The three most commonly used types of image recognition algorithms are image classification, object detection, and semantic/instance segmentation. All of these are so-called “supervised” algorithms, meaning that to train them you first need a labeled set of images. The type of labeling you need corresponds to the type of algorithm: the information you provide as labels during training is the same type of information you should expect as output once the model is trained.

Below, we briefly look at examples of these three types of algorithms.

  • 1. Image Classification - “What is in the image?”

    Here, each image is labeled with the one class it belongs to. This is called single-label classification.


    You may also want to find more than one object class in the same image. In that case, each image is labeled with every class that appears in it. This is called multi-label classification.
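
    As a rough illustration of how the two labeling schemes differ, the sketch below encodes labels for a hypothetical three-class problem. The class names and the helper functions `encode_single_label` and `encode_multi_label` are assumptions made for this example; real datasets and libraries may use other encodings.

```python
# Hypothetical class list, used for illustration only.
CLASSES = ["cat", "dog", "car"]


def encode_single_label(label):
    """Single-label: the image maps to the index of exactly one class."""
    return CLASSES.index(label)


def encode_multi_label(labels):
    """Multi-label: one 0/1 flag per class, 1 if the class is present."""
    present = set(labels)
    unknown = present - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in CLASSES]


single = encode_single_label("dog")         # one class per image
multi = encode_multi_label(["cat", "car"])  # several classes per image
print(single, multi)
```

    A trained single-label model would output one class per image, while a multi-label model would output one present/absent decision per class, mirroring the label format it was trained on.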


  • 2. Object Detection - “What is in the image and where?”

    Here, the task is not only to predict what kind of object is in the image, but also to estimate the coordinates of a rectangular box around it. In a way, object detection resembles multi-label classification, because a single image may contain several classes of objects, or even several instances of the same class.
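
    A detection label can be sketched as a list of boxes, each carrying a class name and rectangle coordinates. The `BoundingBox` class and the corner-based coordinate layout (`x_min`, `y_min`, `x_max`, `y_max`, in pixels) are assumptions chosen for this example; annotation formats in practice vary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One labeled object: a class name plus a rectangle in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def __post_init__(self):
        # Reject degenerate boxes early.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")

    @property
    def area(self):
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


# Hypothetical annotations for one image: two dogs and one car.
annotations = [
    BoundingBox("dog", 10, 20, 110, 220),
    BoundingBox("dog", 150, 40, 260, 200),
    BoundingBox("car", 300, 100, 500, 250),
]

# Several classes, and several instances of one class, in the same image.
counts = Counter(box.label for box in annotations)
print(counts)
```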


  • 3. Semantic/Instance Segmentation - “What is in the image and where exactly?”

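    Here, the goal is to predict the exact contours of the object. In semantic segmentation, every pixel of the image is labeled with a class, without telling apart individual objects of the same class. In instance segmentation, only the objects we are looking for are labeled, and each object gets its own separate mask.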

    This kind of task is popular in image manipulation and in video special effects, where an object has to be cut out from an image or a video frame.
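
    The difference between the two mask types, and the cut-out use case, can be sketched with a tiny 4x4 grayscale "image" stored as nested lists. The pixel values, the class ids, and the helper `cut_out` are all assumptions made for illustration; real pipelines would typically use array libraries and full-size images.

```python
BACKGROUND, DOG = 0, 1  # hypothetical class ids

# Toy 4x4 grayscale image (values are arbitrary).
image = [
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20],
]

# Instance mask: 0 = not an object of interest, 1 and 2 = two separate dogs.
instance_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 2, 0, 0],
]

# Semantic mask: every pixel gets a class id, so both dogs share the same label.
semantic_mask = [
    [DOG if inst > 0 else BACKGROUND for inst in row]
    for row in instance_mask
]


def cut_out(image, instance_mask, instance_id, fill=0):
    """Keep only the pixels of one instance; replace all others with `fill`."""
    return [
        [pixel if inst == instance_id else fill
         for pixel, inst in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, instance_mask)
    ]


second_dog = cut_out(image, instance_mask, instance_id=2)
print(second_dog)
```

    With only the semantic mask, the two dogs could not be cut out separately, because their pixels carry the same class id.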