The current state of the art in training dataset development for deep learning systems is manual annotation. Training data for AI models is currently annotated by hand around the world, with millions of people involved in the task. Meanwhile, AI scientists wait for data from poor services, and while their work sits in limbo, budgets change, projects get canceled, and new initiatives go unfulfilled because they are too expensive. Only 10% of new AI initiatives are attempted, and only 5% of those come to fruition. The market needs the machine to train the machine.
Early on, Clear Image AI decided to build automation services that reduce manual annotation as much as possible.
We provide the fastest and most cost-effective automated data annotation service for creating training datasets for AI projects.
The auto training pipeline is based on the human-in-the-loop pipeline, but substitutes algorithms for the manual contribution.
Manual teams can integrate our services directly into their own manual editor or, if that is not possible, use our editor, which is fully integrated.
Our pipeline is designed to connect different modules as plug-and-play components. This lets our system express complex client requirements by composing different annotation modules.
There are two things that manual teams are better at than machine learning: they need almost no training, and they can do complex annotation tasks beyond simple object recognition.
Our system is designed to emulate human versatility. First, to create datasets while our system is still being trained on a new object, we use a combination of algorithms that can annotate those objects without prior training on them. Meanwhile, an AutoML pipeline removes the need for data scientists to build each new trained model. That way we circumvent the steep learning curve.
Second, we have created a composable pipeline that lets us program complex combinations of models on the fly to automatically produce context-dependent datasets that would normally take weeks to create.
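As a minimal sketch of what composing annotation modules can look like, the example below chains two stand-in modules so that the second one depends on the output of the first. Every name here (`detect_vehicles`, `tag_license_plates`, `compose`, the annotation dictionary layout) is a hypothetical illustration, not part of our actual API, and the modules return fixed output instead of running real models.

```python
from typing import Callable, Dict, List

# An annotation is a plain dict here, e.g. {"label": "car", "box": (x, y, w, h)}.
Annotation = Dict[str, object]
# A module receives an image identifier and the annotations produced so far,
# and returns the extended list of annotations.
Module = Callable[[str, List[Annotation]], List[Annotation]]


def detect_vehicles(image_id: str, annotations: List[Annotation]) -> List[Annotation]:
    """Stand-in for a pre-trained detector (hypothetical, returns fixed output)."""
    return annotations + [{"label": "car", "box": (10, 20, 100, 50)}]


def tag_license_plates(image_id: str, annotations: List[Annotation]) -> List[Annotation]:
    """Stand-in for a context-dependent module: it only acts on detected cars."""
    plates = [
        {"label": "license_plate", "parent": a["label"], "box": a["box"]}
        for a in annotations
        if a["label"] == "car"
    ]
    return annotations + plates


def compose(*modules: Module) -> Module:
    """Chain modules so each one sees the annotations of the previous ones."""
    def pipeline(image_id: str, annotations: List[Annotation]) -> List[Annotation]:
        for module in modules:
            annotations = module(image_id, annotations)
        return annotations
    return pipeline


pipeline = compose(detect_vehicles, tag_license_plates)
result = pipeline("img_001.jpg", [])
labels = [a["label"] for a in result]
```

Because every module shares the same signature, a new requirement can be expressed by reordering or swapping modules instead of writing a new pipeline.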
We have created an environment that succeeds at emulating human versatility when it comes to dataset creation.
Our dataset annotation pipeline is based on the industry-standard human-in-the-loop approach, where several teams of people produce initial manual datasets to train a model so that the model can then take over their task. There are four steps in the development of a training dataset for deep learning models that usually require a human team: the production of ground truth data, the initial raw data annotations to start training the model, the training of the model itself, and the review of the results from the model once it starts annotating images.
The highest risk in this kind of fully automated setup is controlling the errors that occur in the model as it is being trained. In other words, how do we find the errors that the model has produced without a human review? We have developed a set of algorithms that perform this check automatically; they are highly accurate and have proven reliable.
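The algorithms themselves are not described here. To illustrate the general idea of review without a human, the sketch below uses one common generic technique, which is an assumption and not necessarily our method: a prediction is flagged when its confidence is low or when it does not overlap any box from a second, independent source. The function names, box format and thresholds are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flag_suspects(
    predictions: List[Tuple[Box, float]],
    reference_boxes: List[Box],
    min_confidence: float = 0.6,  # assumed threshold
    min_iou: float = 0.5,         # assumed threshold
) -> List[int]:
    """Return the indices of predictions that look unreliable."""
    suspects = []
    for index, (box, confidence) in enumerate(predictions):
        best_overlap = max((iou(box, ref) for ref in reference_boxes), default=0.0)
        if confidence < min_confidence or best_overlap < min_iou:
            suspects.append(index)
    return suspects


predictions = [
    ((0, 0, 10, 10), 0.95),    # confident and confirmed by the reference
    ((20, 20, 30, 30), 0.40),  # confirmed but low confidence
    ((50, 50, 60, 60), 0.90),  # confident but unconfirmed
]
reference_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
suspects = flag_suspects(predictions, reference_boxes)
```

Only the flagged indices would need further attention, which is what keeps the review step from scaling with the size of the dataset.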
To allow machine learning scientists and manual annotation teams to take advantage of our services, we have created an open API that can be integrated, through a REST service, with their existing manual annotation editor. The services are designed to eliminate the need to annotate most of the images manually, so that the user only has to review them.
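The sketch below shows roughly how an editor might prepare a call to such a REST service using only the Python standard library. The endpoint URL, the payload fields (`images`, `classes`) and the bearer-token header are placeholders invented for illustration; they are not the documented API, so consult the actual API reference for the real routes and parameters.

```python
import json
import urllib.request

# Hypothetical endpoint, used only to illustrate the shape of an integration.
API_URL = "https://api.example.com/v1/annotations"


def build_annotation_request(image_urls, classes, api_key):
    """Build (but do not send) a POST request asking for automatic annotations."""
    payload = {"images": image_urls, "classes": classes}  # assumed field names
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_annotation_request(
    ["https://example.com/img_001.jpg"], ["car"], "YOUR_API_KEY"
)
# urllib.request.urlopen(request) would send it; that call is left out here
# because the endpoint above is a placeholder.
```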
If the manual team does not have its own editor, we have made available an open source editor that they can use to take advantage of the work reduction.
Our system also has a client account manager, a project manager, a results manager and a project request form, so that users experience as little friction as possible.
Use our logic editor to automatically combine pre-trained models and autogenerated models to tackle complex annotation projects. Specify the classes that you want to box or segment, either in a simple editor or in JSON format, and our API will create the complex dataset by chaining together different models or training a new one if necessary.
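To make the JSON route more concrete, here is a minimal sketch of how a class specification could be turned into a plan that reuses an existing model where one is available and requests training otherwise. The JSON schema, the `PRETRAINED` registry and the model names are assumptions made for this example and do not reflect the real format accepted by the API.

```python
import json

# Hypothetical project specification: which classes to box or segment.
SPEC = """
{
  "project": "street-scenes",
  "classes": [
    {"name": "car", "task": "box"},
    {"name": "pedestrian", "task": "segment"},
    {"name": "scooter", "task": "box"}
  ]
}
"""

# Hypothetical registry of models that already exist for a (class, task) pair.
PRETRAINED = {
    ("car", "box"): "vehicle-detector",
    ("pedestrian", "segment"): "person-segmenter",
}


def plan(spec_text):
    """Map each requested class to an existing model or to a training job."""
    spec = json.loads(spec_text)
    steps = []
    for cls in spec["classes"]:
        key = (cls["name"], cls["task"])
        if key in PRETRAINED:
            steps.append({"class": cls["name"], "action": "use", "model": PRETRAINED[key]})
        else:
            # No model covers this class yet, so a new one has to be trained.
            steps.append({"class": cls["name"], "action": "train", "model": None})
    return steps


steps = plan(SPEC)
actions = [step["action"] for step in steps]
```

In this example the car and pedestrian classes reuse existing models, while the scooter class is marked for training.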