Object detection training tutorial - SentiSight.ai

Object detection training tutorial

Topics covered:
  • Basics of bounding box labeling
  • Selecting parameters
  • Training object detection model
  • Analyzing learning curve
  • Analyzing statistics and predictions
  • Analyzing precision-recall curve
  • Changing score thresholds
  • Downloading model or using it online

Note: the default training time for object detection models now depends on the number of different classes in the training set (1-2 classes: 2 hours, 3-5 classes: 3 hours, 6-10 classes: 6 hours, 11+ classes: 12 hours).

You can download the video tutorial here.

Object Detection Tutorial Video Transcription

  • Creating a dataset of labeled images

    To train an object detection model, you first need a dataset of labeled images. If you do not have one, upload your images and label them yourself.
  • Train your object detection model

    Once the images are labeled, you can start training your object detection model. To do so, click ‘Train’ in the top menu and select Object Detection.
  • Selecting and understanding the parameters

    Here, you can set the model name, the training time, and the stop time, which determines how long the model continues training when there is no improvement. Improvement is measured by mean Average Precision (mAP), a common metric in object detection.

    The standard training time for an object detection model is significantly longer than for a classification model.

    The default training time for object detection models depends on the number of different classes in the training set (1-2 classes: 2 hours, 3-5 classes: 3 hours, 6-10 classes: 6 hours, 11+ classes: 12 hours).
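
    The class-count tiers above can be written as a small lookup. This is only an illustrative sketch of the rule as stated in this tutorial; the function name `default_training_hours` is hypothetical and is not part of SentiSight.ai.

```python
def default_training_hours(num_classes: int) -> int:
    """Return the default training time in hours for a given class count.

    Hypothetical helper: it only encodes the tiers quoted in this tutorial.
    """
    if num_classes < 1:
        raise ValueError("a training set needs at least one class")
    if num_classes <= 2:
        return 2
    if num_classes <= 5:
        return 3
    if num_classes <= 10:
        return 6
    return 12


if __name__ == "__main__":
    for n in (1, 4, 10, 25):
        print(f"{n} classes: {default_training_hours(n)} hours")
```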

    The label count shows which labels will be used to train the model. Untick any label that you do not want to include in the object detection model. Some images contain multiple labels; in that case, the model is trained only on the labels you have selected.

    If you switch to the advanced view, you can set more advanced parameters. These include:

    • Use user-defined validation set
    • Use unlabeled images as negative samples
    • Change the validation set size percentage
    • Model size. We usually recommend selecting the large model, as the difference in training time between small, medium and large models is negligible, yet accuracy is often higher for larger models. However, inference is faster for smaller models, so if your primary concern is inference speed rather than accuracy, you should choose a smaller model.
  • Understanding and analyzing the model performance

    You can track the progress of your object detection model in the Train models tab. After approximately 20 minutes of training, you can start to view the learning curve. The left-hand graph shows the train loss values in blue and the validation loss values in green. The right-hand graph shows the mean Average Precision values for validation in green.

    We select the best model by choosing the highest mean average precision value for validation. The model selected is represented by the red dashed line. If you are happy with the chosen model at any stage, you can choose to stop training and keep the current model.
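
    As a rough illustration of how the best model and the stop time interact, the sketch below picks the checkpoint with the highest validation mAP and stops scanning once the stop time has passed without improvement. The function name, the `(elapsed_minutes, validation_mAP)` history format, and the sample numbers are all assumptions made for this example; the platform's internal logic is not documented here.

```python
from typing import List, Optional, Tuple


def select_best_checkpoint(
    history: List[Tuple[float, float]],
    stop_minutes: Optional[float] = None,
) -> Tuple[int, float]:
    """Return (index, mAP) of the checkpoint with the highest validation mAP.

    history holds (elapsed_minutes, validation_mAP) pairs in time order.
    If stop_minutes is given, scanning ends once that much time has passed
    since the last improvement (a simple early-stopping rule).
    """
    if not history:
        raise ValueError("history must contain at least one checkpoint")
    best_index = 0
    best_time, best_map = history[0]
    for index, (minutes, map_value) in enumerate(history[1:], start=1):
        if stop_minutes is not None and minutes - best_time > stop_minutes:
            break  # no improvement within the stop time: training would end
        if map_value > best_map:
            best_index, best_time, best_map = index, minutes, map_value
    return best_index, best_map


if __name__ == "__main__":
    # Made-up learning curve for illustration only.
    curve = [(20, 0.41), (40, 0.55), (60, 0.62), (80, 0.60), (100, 0.61), (140, 0.66)]
    print(select_best_checkpoint(curve))                   # full training time
    print(select_best_checkpoint(curve, stop_minutes=60))  # stops early
```

    With the sample curve, the full run selects the last checkpoint, while a 60-minute stop time ends training before that checkpoint is reached and keeps the one from minute 60.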

    Once the model training is finished, you can view the model performance by clicking on View training statistics. These statistics are divided into Train and Validation. Here you can see many statistics such as Precision, Recall, F1 and mAP. The statistics marked by a * represent measures that depend on the selected score threshold.
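
    For reference, Precision, Recall and F1 follow their standard definitions in terms of correct predictions (true positives), incorrect predictions (false positives) and missed objects (false negatives). The sketch below uses those textbook formulas; the function name and the example counts are made up for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from detection counts.

    tp: predictions judged correct
    fp: predictions judged incorrect
    fn: ground truth objects with no correct prediction
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```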

    In basic view, the Optimised score thresholds are automatically calculated for you. In advanced view, you can set these thresholds yourself.

    You can view the actual predictions by clicking on view predictions. Ground truth labels are shown in a black bounding box with the label at the top right of the box, and object detection predictions are shown in a blue bounding box with the label at the top left of the box. Only predictions above the prediction threshold are displayed.

    SentiSight also calculates the overlap between the prediction and the ground truth, called the IoU value (intersection over union). If the IoU value is above a set threshold, typically 50%, the prediction is judged to be correct; otherwise it is judged to be incorrect. Correct predictions have their label text in green, and incorrect predictions have it in red. The colour of the bounding box itself depends on the colour of the label, not on the accuracy of the prediction. Note that labeling errors can sometimes cause correct predictions to be displayed as incorrect.

    To be considered correct, a prediction must also have the same label as the ground truth bounding box. If the labels do not match, the IoU is treated as 0.
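
    The IoU check described above can be sketched as follows. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for this example, not the platform's actual data format or API.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(pred_label: str, pred_box: Box,
               truth_label: str, truth_box: Box,
               iou_threshold: float = 0.5) -> bool:
    """A prediction is correct if labels match and IoU exceeds the threshold."""
    if pred_label != truth_label:
        return False  # mismatched labels: IoU is treated as 0
    return iou(pred_box, truth_box) > iou_threshold


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    print(iou(truth, (2.0, 0.0, 12.0, 10.0)))                # 80 / 120
    print(is_correct("cat", (2.0, 0.0, 12.0, 10.0), "cat", truth))
    print(is_correct("dog", (2.0, 0.0, 12.0, 10.0), "cat", truth))
```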

    You can filter results to show images that are either correct or incorrect, or all, via the Show: filter in the top right. If at least one prediction is incorrect, or a label has been missed, it will count as an incorrect prediction.

  • Advanced parameters and statistics

    In the advanced view for model statistics, you can view the learning curves, set the intersection over union threshold, and choose between optimised and custom score thresholds.

    The higher you set the score threshold, the fewer predictions you will receive, as only the bounding boxes whose scores exceed the threshold are displayed. However, with a higher threshold, the ratio of correct predictions (precision) will be higher. If you lower the threshold, the recall will be higher, but the precision will be lower.
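
    The trade-off can be seen on a toy example. The sketch below filters a made-up list of predictions at several score thresholds and reports how many remain, along with precision and recall. The data and the function name are hypothetical, and it simplifies matters by assuming each correct prediction matches a different ground truth object.

```python
from typing import List, Tuple


def precision_recall_at(
    predictions: List[Tuple[float, bool]],
    num_ground_truth: int,
    threshold: float,
) -> Tuple[int, float, float]:
    """Return (kept, precision, recall) for predictions scoring above threshold.

    predictions holds (score, is_correct) pairs for a single label.
    """
    kept = [correct for score, correct in predictions if score > threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return len(kept), precision, recall


if __name__ == "__main__":
    # Made-up predictions for one label; 5 ground truth objects in total.
    preds = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
             (0.55, False), (0.40, True), (0.30, False)]
    for threshold in (0.25, 0.50, 0.85):
        kept, p, r = precision_recall_at(preds, 5, threshold)
        print(f"threshold={threshold:.2f} kept={kept} precision={p:.2f} recall={r:.2f}")
```

    In this toy data, raising the threshold from 0.25 to 0.85 reduces the number of displayed predictions from 7 to 2, raises precision, and lowers recall.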

  • Analysing precision-recall curve

    The precision-recall curve shows the trade-off between precision and recall for a specific label. You can hover your mouse over any point on the graph to see the trade-off at that point, along with the corresponding F1 value and score threshold.

    You can also decide whether to use the best model or the most recently trained model. The best model is the one with the highest mean Average Precision on the validation set, whilst the last model is always the last checkpoint of the model.

  • Downloading the predictions

    You can download the predictions on both the train and validation sets. The download will be prepared in the background, and you will be notified when it is ready. Once the download is ready, you can download it again at any time by first clicking on the download button in the top right-hand corner of the screen, and then choosing the model you want to download.

    The downloaded zip file will include predictions in JSON format, as well as the images with bounding boxes drawn on them.

  • Using the model online, or downloading for offline use

    Once the model is trained, you have two options for using it. The first option is to download the model and use it offline, free for a 30-day trial. Thereafter, you will require a licence. Please note that the offline model requires the Linux operating system. The second option is to use the model online, either by clicking on ‘make a new prediction’ in the web interface, or by following the instructions to use the model via REST API.

    To use the model, simply upload new images, and predictions will be made automatically. You can choose whether to use the best or the last model, and whether to use optimised or custom thresholds. You can then download the results as images with bounding boxes, or in JSON format.