4. Add new classes of objects to an existing AI model

Adapting pretrained models to include new custom classes is simple with TAO Toolkit.

4.1 Challenges with adding new classes to an existing model

When you start with a publicly available model, its classes are likely to differ from the ones your application needs. Adding new classes to an existing model, or removing a few, is a common transfer learning task. Doing it by hand requires you to know and understand the model architecture: you have to go through thousands of lines of code and change the model template, dataloader, and loss function.

Take the example of adding a helmet class to an existing model that detects people. The task is to detect both people and helmets. When you train this new model to add a helmet class, you must provide a properly annotated dataset that contains both people and helmets. If you provide only helmet data and no people data, the model performs well on helmets but poorly on people. Generally speaking, when you add new classes through transfer learning, you need representative data that covers the existing classes as well as the new ones.
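
The coverage requirement above can be checked before training. The following is a minimal sketch, assuming the annotations are stored as KITTI-style text in which the first field of each line is the class name. The label format, the function names, and the sample annotations are illustrative assumptions, not part of the TAO Toolkit.

```python
from collections import Counter


def count_classes(label_texts):
    """Count annotated objects per class across KITTI-style label texts.

    Assumes the first whitespace-separated field of each line is the class name.
    """
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            fields = line.split()
            if fields:  # skip empty lines
                counts[fields[0].lower()] += 1
    return counts


def missing_classes(counts, required):
    """Return the required classes that have no annotated instance."""
    return [name for name in required if counts.get(name, 0) == 0]


# Hypothetical annotations for two images that only contain helmet labels.
labels = [
    "helmet 0.00 0 0.00 10.00 12.00 50.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
    "helmet 0.00 0 0.00 30.00 20.00 70.00 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
]

counts = count_classes(labels)
print(missing_classes(counts, ["person", "helmet"]))
```

A non-empty result signals that the dataset lacks annotations for an existing class and would likely degrade detection of that class after retraining.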

4.2 Adding new classes in the TAO Toolkit

NVIDIA pretrained models can run inference on your data and generate labels for the common objects they were trained to detect. This leaves you with only the task of labeling the custom classes needed for your application. The model can then be trained on the entire dataset so that it keeps its original functionality while also handling the custom classes.

Returning to the people and helmet example: if we only have a labeled dataset for helmets, we can use inference from the NVIDIA PeopleNet model to annotate people and faces. The NVIDIA PeopleNet model was trained on millions of face and person images, so these classes do not need to be learned from scratch. We first run inference with PeopleNet to create person and face labels, and then merge the inferred labels with the existing helmet class labels.

Note: If you use PeopleNet to generate ground truth for the person and face classes, watch for false positives and false negatives in these classes. Some manual cleanup might be required.
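
The label-merging step can be sketched as follows. This is only an illustration under stated assumptions: the detections are represented as plain dictionaries with `label`, `bbox`, and `score` keys (a hypothetical structure, not the actual PeopleNet output format), the labels are written as 15-field KITTI-style lines, and the 0.5 confidence threshold is an arbitrary choice meant to reduce false positives before manual cleanup.

```python
def to_kitti_line(label, bbox):
    """Format one object as a 15-field KITTI-style line.

    Only the class name and the 2D box are filled in; other fields are zeroed.
    """
    x1, y1, x2, y2 = bbox
    return (
        f"{label} 0.00 0 0.00 "
        f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
        "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
    )


def merge_labels(helmet_lines, detections, min_score=0.5, keep=("person", "face")):
    """Append confident person/face detections to the existing helmet labels."""
    merged = list(helmet_lines)  # keep the hand-labeled helmet boxes untouched
    for det in detections:
        if det["label"] in keep and det["score"] >= min_score:
            merged.append(to_kitti_line(det["label"], det["bbox"]))
    return merged


# Existing ground truth for one image (hypothetical values).
helmet_lines = [to_kitti_line("helmet", (40, 10, 80, 45))]

# Hypothetical inference results for the same image.
detections = [
    {"label": "person", "bbox": (30, 10, 120, 300), "score": 0.92},
    {"label": "face", "bbox": (50, 40, 75, 70), "score": 0.21},  # below threshold
    {"label": "bag", "bbox": (90, 200, 130, 260), "score": 0.88},  # class not kept
]

merged = merge_labels(helmet_lines, detections)
for line in merged:
    print(line)
```

Filtering by score does not replace the manual review mentioned in the note. It only removes the least reliable boxes, and a higher threshold trades false positives for false negatives.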

Given that the PeopleNet model already detects people and faces with a high degree of accuracy, you only need to train the helmet class. To do this, in the training specification, add a higher class weight for the helmet class and a lower class weight for person and face. This helps the model train more accurately on the new class while also maintaining the model’s ability to detect people and faces.
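
To see why class weights shift the focus of training, consider a loss that is a weighted sum of per-class losses. The sketch below is a conceptual illustration, not the TAO Toolkit loss implementation. The weights are the ones reported in section 4.3, assuming that 0.1 applies to each of the person and face classes, and the per-class loss values are made up.

```python
def weighted_loss(per_class_loss, class_weights):
    """Combine per-class losses into one scalar using per-class weights."""
    return sum(class_weights[name] * loss for name, loss in per_class_loss.items())


def loss_share(per_class_loss, class_weights, name):
    """Fraction of the total weighted loss contributed by one class."""
    total = weighted_loss(per_class_loss, class_weights)
    return class_weights[name] * per_class_loss[name] / total


# Assumed weights: 0.8 for the new class, 0.1 for each existing class.
class_weights = {"helmet": 0.8, "person": 0.1, "face": 0.1}

# Made-up per-class losses of equal size, to isolate the effect of the weights.
per_class_loss = {"helmet": 1.0, "person": 1.0, "face": 1.0}

total = weighted_loss(per_class_loss, class_weights)
helmet_share = loss_share(per_class_loss, class_weights, "helmet")
print(f"total loss: {total:.3f}, helmet share: {helmet_share:.0%}")
```

With equal raw losses, most of the gradient signal comes from the helmet class, while the small weights on person and face still penalize regressions on the classes the model already detects.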

4.3 Results

For this task, we used an open-source helmet detection dataset with 611 images for training and 152 images for validation. The training was configured with a weight of 0.8 on the new helmet class and 0.1 on the person and face classes, which helps the model focus on learning the new helmet class. The PeopleNet model was then trained on the combined person, face, and helmet dataset and reached 80% AP for the helmet class within 100 epochs.

Figure 9. Accuracy of Helmet class after re-training
Figure 10. Inference with person, face, helmet

The general approach of taking a pretrained model, running inference on your dataset with it, and then retraining it to include custom classes along with the original classes is not limited to PeopleNet. You can use the same technique with any pretrained model in TAO.

This task was conducted on the Kaggle Helmet Detection Dataset. The complete task implementation with a step-by-step guide is available in the TAO Tasks GitHub repo.


3 https://www.kaggle.com/andrewmvd/helmet-detection