Collecting Data

The first step toward your goal is to collect a sufficient amount of data - in this case, images. You will use these images to train your model. Note that there is a limit on the free space available for your images. You can check this limit by clicking your user profile in the top right corner of the window.

How to use the SentiSight.ai platform

Upload images to the system:
Press Browse and choose your training images. If you are preparing images for classification, you can label them while uploading. You will find more information about labeling in the second step - Data labeling.

Learn more about the process

  • How many images do I have to collect?

    To train an algorithm, data scientists traditionally use huge databases - tens or hundreds of thousands, or even millions, of images. The more labeled images you have, the better the results you can expect. Fortunately, there are techniques that let you work with a much smaller amount of data and still achieve satisfactory performance.

    We recommend providing at least:
    - 30 images per class for single-label image classification
    - 50 images per class for multi-label image classification.

    Note: the system will not start training a model unless there are more than 15 images per class for any type of model.

    The minimum number of images is given for reference only. In some cases it will be enough to train a reasonably good model, but that depends on your goal. We strongly recommend using as much data as you can collect instead of focusing on the bare minimum required.
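
    Before uploading, you may want to check your per-class image counts against the numbers above. The following is a minimal sketch in Python, not part of the SentiSight.ai platform: the function name, constant names, and status strings are hypothetical, and it assumes you already have a list of labels, one entry per image.

```python
from collections import Counter

# Thresholds quoted in the text above; the names are hypothetical.
MIN_EXCLUSIVE = 15        # training needs MORE than 15 images per class
RECOMMENDED_SINGLE = 30   # recommended for single-label classification
RECOMMENDED_MULTI = 50    # recommended for multi-label classification


def check_class_counts(labels, multi_label=False):
    """Return {class_name: status} for a list of per-image labels.

    Single-label: each item in `labels` is one class name (a string).
    Multi-label: each item is an iterable of class names for one image.
    """
    recommended = RECOMMENDED_MULTI if multi_label else RECOMMENDED_SINGLE
    counts = Counter()
    for item in labels:
        if multi_label:
            counts.update(set(item))  # count each class once per image
        else:
            counts[item] += 1

    report = {}
    for name, n in counts.items():
        if n <= MIN_EXCLUSIVE:
            report[name] = "too few to train"
        elif n < recommended:
            report[name] = "trainable, below recommendation"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Made-up example data: 40 cats, 20 dogs, 15 birds.
    labels = ["cat"] * 40 + ["dog"] * 20 + ["bird"] * 15
    for name, status in check_class_counts(labels).items():
        print(name, "-", status)
```

    In this made-up example, the class with exactly 15 images is flagged as too few, because the note above requires more than 15 images per class.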

  • How about the image size and quality?

    It is tempting to think that higher image quality and larger size will yield superior results. However, this is not necessarily true. AI algorithms are very complex, and the training process may take a long time. Since image size is one of the factors that can make training even longer, it is better to train on pictures that are as small as possible. On the other hand, your pictures should be zoomed in on the target objects as closely as possible, so that it is easier for the AI to identify them. A good rule of thumb is that if you cannot recognize the object in the picture yourself, you should not expect the AI to recognize it either.

    Another thing to know is that all images must be the same size for a model’s training session. However, you do not need to worry about this, because we take care of it for you. During training, the images are automatically rescaled to the appropriate size (we use 299x299 images for the classification models). Additionally, on the SentiSight.ai platform you can change the default image size setting, in which case the images will be rescaled during upload, saving you some disk space.

    Example

    [Image: scaling example] Suppose the size of this image is 860x600.
    • Since we cannot train models on images of this size, it will be rescaled.
    • Suppose we want to detect people in an image that is a bit blurred. Despite the blur, we can still see the people in the original image. However, after rescaling, some of them will no longer be identifiable.
    • Therefore, you must decide what size of images you will use and be aware of rescaling.
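
    To get a feel for how much detail rescaling removes, you can estimate how many pixels an object will occupy afterwards. The sketch below is illustrative only: it assumes the image is simply stretched to 299x299 without preserving the aspect ratio (the text above does not say how the platform rescales), and the function name and the object size used in the example are hypothetical.

```python
def rescaled_size(object_w, object_h, image_w, image_h,
                  target_w=299, target_h=299):
    """Approximate size in pixels of an object after the whole image
    is stretched from (image_w x image_h) to (target_w x target_h).

    Assumption: a plain stretch, with no cropping or padding.
    """
    scale_x = target_w / image_w
    scale_y = target_h / image_h
    return object_w * scale_x, object_h * scale_y


if __name__ == "__main__":
    # Hypothetical person occupying 40x120 pixels in an 860x600 image.
    w, h = rescaled_size(40, 120, 860, 600)
    print(f"{w:.1f} x {h:.1f} pixels")
```

    Under these assumptions, the person shrinks to roughly half their original height and about a third of their original width, which is why small or blurred objects can become unidentifiable.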
  • What images should I choose?

    Think of this as if you were trying to teach another person to recognize some image content. Some images may be of insufficient quality, too small, or even misleading. As previously mentioned, if you can correctly identify an object in the picture without any prior knowledge of the context, then your model will benefit from that image. Otherwise, a large number of misleading images will probably reduce the accuracy.

    Example

    Suppose this is an image from your training data.

    WRONG 1: This is obviously not a real chicken, although we might sometimes call it one. Do not label it as a chicken if you want your algorithm to recognize real birds.

    WRONG 2: You would probably assume that this is a person (because you know it is a costume and that people do dress like that). The algorithm cannot have this kind of contextual knowledge unless you explicitly train it for that, which means even more complex algorithms. So if you want your model to be able to recognize people, avoid giving it this image as an example of a person.

    CORRECT: It is better to label it as a chicken costume, or not to use it at all.
    Note: there may well be specific situations where you choose to call it a person or a chicken, but you must be aware of how that affects your model.