Data Labeling 101: An Introduction to Annotation Techniques for Computer Vision

SDT Inc.
Mar 18, 2022
SDT-built Computer Vision Demo Robot for display in the Seoul AI Expo 2022

Computer vision is a field of artificial intelligence (AI) that processes visual data such as digital images and videos to identify and categorize objects and to analyze scenes and activities in real-life environments. Coupled with other Industry 4.0 technologies, computer vision opens up new and interesting applications in the field of IoT, most notably the ability to analyze vast amounts of visual data faster than humans can and to drive real-time decisions.

Computer Vision and IoT

SDT NodeV, an IoT camera that enables computer vision applications

Many IoT-enabled tasks, such as remote visual monitoring and predictive maintenance, rely on computer vision. Consider a computer vision system trained to inspect products: it can analyze thousands of assets or processes every minute, noting defects or issues imperceptible to the human eye. Coupled with IoT connectivity, the camera or application can then send instant notifications when a defect occurs. It’s for this reason that computer vision is being applied in industries like energy, utilities, automotive, healthcare, and manufacturing.

Data Labeling and Annotation Techniques

However, before computer vision solutions can be applied, reliable AI models have to be built, and that begins with data labeling. Data annotation is the process of adding “labels” or tags to raw data such as images and videos. Once data is annotated, the labels and any accompanying metadata give a machine learning model the context it needs to identify objects and learn from them. In other words, data labeling is necessary for training machine learning models so that they can recognize the important parts of an image (known as classes), even in new, never-before-seen images.
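To make the idea of "labels plus metadata" concrete, here is a minimal sketch of what one annotated image record could look like. The class names, field names, and file name are hypothetical and chosen only for illustration; real annotation tools each define their own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """One annotation: a class name plus the geometry that locates it."""
    class_name: str   # the class the model should learn, e.g. "stop_sign"
    shape: str        # e.g. "bounding_box", "polygon", "polyline"
    points: list      # pixel coordinates; their meaning depends on `shape`
    metadata: dict = field(default_factory=dict)  # extra context, e.g. {"occluded": True}


@dataclass
class AnnotatedImage:
    """A raw image reference together with all labels attached to it."""
    file_name: str
    width: int
    height: int
    labels: list = field(default_factory=list)

    def class_names(self):
        """Return the distinct classes present in this image, sorted."""
        return sorted({label.class_name for label in self.labels})


# Hypothetical example: one street image with two labeled objects.
image = AnnotatedImage("street_001.jpg", width=640, height=480)
image.labels.append(Label("stop_sign", "bounding_box", [100, 50, 180, 130]))
image.labels.append(Label("car", "bounding_box", [300, 200, 520, 360], {"occluded": True}))
print(image.class_names())
```

Keeping the geometry, the class, and the metadata in one record is a design choice that makes it easy to mix several annotation shapes on the same image.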

Whether a business is looking to improve its computer vision models for general object detection, facial recognition, or movement prediction, it needs high-quality annotated image data. Every such project therefore requires data labeling, in which image data is tagged using different annotation types and techniques. This blog explores several annotation techniques for data labeling: bounding boxes, polygons, landmarking, polylines, object tracking, and transcription.

● 2D/3D Bounding Boxes

Bounding box annotation on stop sign

Image annotation involves one or more data labeling techniques, and often different annotation shapes are used to annotate a single image. Bounding boxes are one of the most commonly used annotation shapes in computer vision and can be either two-dimensional (2D) or three-dimensional (3D). They are predominantly used to outline the location of a roughly symmetrical object in an image, which is important when creating a machine learning model that identifies what parts of the image to look at. For example, boxes can be drawn around products on a grocery store shelf to train an AI model that supports restocking alerts, employee-less checkout, or security. Smart retail is one of the critical use cases for computer vision as supply chain management has become more complicated.

It’s important for image classification and localization models to be trained with the right amount of data and accurate annotations, and bounding boxes are one important data labeling technique for doing just that (more examples and best practices for bounding boxes can be found here).
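As an illustration of how 2D bounding boxes are stored and checked, the sketch below converts a box between the COCO, Pascal VOC, and YOLO conventions and computes intersection over union (IoU), a common way to measure how closely two annotators' boxes agree. The function names and the sample coordinates are made up for this example.

```python
def coco_to_voc(box):
    """COCO [x_min, y_min, width, height] -> Pascal VOC [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def coco_to_yolo(box, img_w, img_h):
    """COCO pixel box -> YOLO [x_center, y_center, width, height], normalized to 0..1."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


def iou(a, b):
    """Intersection over union of two corner-format boxes [x_min, y_min, x_max, y_max]."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical stop sign box in a 640x480 image, in COCO format.
coco_box = [100, 50, 80, 80]
voc_box = coco_to_voc(coco_box)             # [100, 50, 180, 130]
yolo_box = coco_to_yolo(coco_box, 640, 480)

# A second annotator drew a slightly shifted box around the same sign.
other_box = [110, 60, 190, 140]
print(voc_box, yolo_box, round(iou(voc_box, other_box), 3))
```

A low IoU between two annotators on the same object is a useful signal that the labeling guidelines need tightening before the data is used for training.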

● Polygons

Polygon annotation

Polygons, or polygonal segmentation, can follow complex shapes, so they annotate irregular objects within an image more precisely than bounding boxes. Unlike other shape-based annotation techniques, polygon annotation traces the edges of the target object by marking its vertices, and thus excludes unrelated pixels from the label. Polygons are useful for detecting objects such as symbols and logos, and for detecting abnormalities in manufactured products, improving quality control beyond what the fallible human eye can achieve. Polygon annotation enables computer vision across diverse applications such as those in autonomous driving, drones and satellites, and agriculture.
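The claim that polygons exclude unrelated pixels can be checked with a little arithmetic. This sketch computes a polygon's area with the shoelace formula and compares it with the area of the tightest bounding box around the same vertices. The triangle used here is an invented example.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as [(x, y), ...], using the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the shape
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box(vertices):
    """Tightest axis-aligned box [x_min, y_min, x_max, y_max] around the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical triangular warning sign, vertices in pixel coordinates.
sign = [(0, 0), (100, 0), (50, 80)]
box = enclosing_box(sign)
box_area = (box[2] - box[0]) * (box[3] - box[1])

print(polygon_area(sign))             # 4000.0
print(box_area)                       # 8000
print(polygon_area(sign) / box_area)  # 0.5: half of the box is background
```

In this example a bounding box label would be half background, which is exactly the noise a polygon label avoids.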

● Landmarking

Facial recognition using landmark annotation techniques

As its name suggests, landmarking annotation places “landmarks,” or pivotal key points, within an image or across multiple frames of a video to label and trace an object’s movement. Landmarking has been especially useful in facial recognition since it marks the key points in facial expressions and gestures, for example the pupils and points along the edge of a smiling mouth. It allows for the precise detection of differently sized faces, which is needed to train a facial recognition application. Landmarking is also useful for predicting the motions of pedestrians for computer vision in self-driving cars.
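One way to see how landmarks cope with differently sized faces is to express each key point relative to the face's bounding box. The sketch below is an illustration under that assumption, not a description of any particular facial recognition system; the landmark names and coordinates are invented.

```python
import math


def normalize_landmarks(landmarks, face_box):
    """Map pixel landmarks into 0..1 coordinates relative to the face box.

    landmarks: {name: (x, y)} in pixels
    face_box:  [x_min, y_min, x_max, y_max] in pixels
    """
    x_min, y_min, x_max, y_max = face_box
    width = x_max - x_min
    height = y_max - y_min
    return {
        name: ((x - x_min) / width, (y - y_min) / height)
        for name, (x, y) in landmarks.items()
    }


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# The same hypothetical face annotated at two sizes.
small_face = {"left_pupil": (130, 140), "right_pupil": (170, 140),
              "mouth_left": (135, 175), "mouth_right": (165, 175)}
large_face = {"left_pupil": (120, 160), "right_pupil": (280, 160),
              "mouth_left": (140, 300), "mouth_right": (260, 300)}

small_norm = normalize_landmarks(small_face, [100, 100, 200, 200])
large_norm = normalize_landmarks(large_face, [0, 0, 400, 400])

# After normalization, the pupil spacing is the same for both face sizes.
print(distance(small_norm["left_pupil"], small_norm["right_pupil"]))
print(distance(large_norm["left_pupil"], large_norm["right_pupil"]))
```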

● Polylines

Polylines annotate images with continuous lines or edges (i.e., open shapes). This annotation technique is often used in autonomous vehicle applications to identify lane lines on the road or the edge of the sidewalk, and to determine routes and roads on maps. It trains a machine learning model to identify these physical boundaries. Real-world examples include city, town, and uneven-ground scenarios, where polylines help maintain accurate lane detection and help autonomous vehicles understand drivable areas.
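A polyline label is simply an ordered list of points that is not closed back on itself. As a rough sketch of how such a label might be used, the code below measures a polyline's length and the distance from a point to the nearest part of the line, for instance to judge how far something is from a lane boundary. The lane coordinates are invented, and the helper names are assumptions for this example.

```python
import math


def polyline_length(points):
    """Total length of an open polyline given as [(x, y), ...]."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:          # a and b are the same point
        return math.dist(p, a)
    # Project p onto the segment and clamp so the result stays between a and b.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_polyline(p, points):
    """Shortest distance from point p to any segment of the polyline."""
    return min(distance_to_segment(p, a, b) for a, b in zip(points, points[1:]))


# Hypothetical lane boundary annotated with three points.
lane_edge = [(0, 0), (3, 4), (3, 10)]
print(polyline_length(lane_edge))               # 11.0
print(distance_to_polyline((5, 7), lane_edge))  # 2.0
```

Unlike the polygon case, the last point is not joined back to the first, which is what makes the shape "open".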

● Object Tracking

Object tracking is a data labeling technique that plots an object’s movement across multiple frames of a video. Often, the process of identifying an object frame by frame is automated. Video annotation, which includes object tracking and activity tracking, supports many security and safety measures that are too important to be left to the error rate of human tracking. On a construction site, object tracking enables notifications about heavy equipment movements to prevent accidents, and can verify that employees keep wearing safety gear as they move from place to place. It can also be used to detect pedestrian and traffic movement for smart city infrastructure, or as a security backup for loss prevention in stores.
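One simple way frame-by-frame labeling can be partly automated is keyframe interpolation: an annotator draws the box on a few frames and the frames in between are filled in. The sketch below uses plain linear interpolation as an assumption; it is not a description of how any specific tool or tracker works, and the frame numbers and boxes are invented.

```python
def interpolate_box(box_a, box_b, t):
    """Blend two boxes [x_min, y_min, x_max, y_max]; t=0 gives box_a, t=1 gives box_b."""
    return [a + (b - a) * t for a, b in zip(box_a, box_b)]


def fill_track(keyframes):
    """Fill in a box for every frame between the first and last keyframe.

    keyframes: {frame_index: box} with at least one entry, drawn by a human.
    Returns {frame_index: box} covering every frame in that range.
    """
    frames = sorted(keyframes)
    track = {}
    for start, end in zip(frames, frames[1:]):
        for frame in range(start, end):
            t = (frame - start) / (end - start)
            track[frame] = interpolate_box(keyframes[start], keyframes[end], t)
    # The last keyframe is not covered by the loop above, so copy it directly.
    track[frames[-1]] = list(keyframes[frames[-1]])
    return track


# Hypothetical forklift labeled by hand on frames 0 and 4 only.
keyframes = {0: [10, 20, 50, 60], 4: [50, 20, 90, 60]}
track = fill_track(keyframes)
print(track[2])  # [30.0, 20.0, 70.0, 60.0]
```

Linear interpolation only suits objects moving at a roughly steady pace between keyframes; anything that turns or accelerates needs more keyframes or a real tracking algorithm.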

● Transcription

Transcription is used when annotating text in images or videos. For example, when annotating a company logo that has text inscribed on it, transcription is the ideal data labeling technique. A recent SDT customer case used transcription to read labels in a shipyard. At the time, crane operators were too far away to see the labels on the stacks, so they had to manually record each pallet in a notebook every time one was moved, leading to delays and errors. Automatically detecting and transcribing the labels allowed them to access all that data on a centralized online system and get rid of the constant paper note-taking. Another enterprise came to SDT asking for help building COVID-19 vaccine fridges with computer vision transcription. They wanted the fridge to read the labels on the vaccine vials automatically in order to track the rapidly approaching expiration dates and reduce wastage.
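A transcription label usually pairs a region of the image with the text found in it. The sketch below shows how such a label could feed an expiration check like the vaccine fridge example. The label layout, the "EXP: YYYY-MM-DD" text format, and the function names are all assumptions made for illustration; they do not describe the actual SDT system.

```python
import re
from datetime import date

# Assumed label format for this example: "... EXP: YYYY-MM-DD"
EXP_PATTERN = re.compile(r"EXP[:\s]*(\d{4})-(\d{2})-(\d{2})")


def parse_expiry(transcribed_text):
    """Extract an expiration date from transcribed label text, or None if absent."""
    match = EXP_PATTERN.search(transcribed_text.upper())
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. a misread digit produced month 13
        return None


def is_expired(annotation, today):
    """True if the annotation's text holds a readable expiry date before `today`."""
    expiry = parse_expiry(annotation["text"])
    return expiry is not None and expiry < today


# Hypothetical transcription annotation: where the text is, and what it says.
vial_label = {"box": [40, 60, 220, 110], "text": "Lot A123 EXP: 2022-06-30"}

print(parse_expiry(vial_label["text"]))
print(is_expired(vial_label, today=date(2022, 7, 1)))
```

Returning None for unreadable dates, rather than guessing, leaves room for a human to review labels the transcription could not parse.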

Conclusion

Computer vision applications require large and diverse sets of images to train machine learning algorithms and models, which often makes data labeling and annotation challenging to produce. In practice, it makes sense to use a variety of annotation techniques when labeling data. For example, this license plate detector uses a combination of the techniques outlined in this blog.

Data for computer vision is typically annotated by humans, so the annotation process can get complicated in real life and is not without its own challenges. Want to learn more?

Check out these resources we’ve curated:

● Top open-source annotation tools: https://www.toolbox.com/tech/artificial-intelligence/articles/top-open-source-data-annotation-tools/

● Polygon Annotation: https://github.com/zhong110020/labelme

● Open-source Object Tracking and Computer Vision Mega-Library: https://opencv.org/

● COCO, Pascal VOC, and YOLO Annotation Formats: https://towardsdatascience.com/image-data-labelling-and-annotation-everything-you-need-to-know-86ede6c684b1

● Open source image annotator: https://www.robots.ox.ac.uk/~vgg/software/via/

● Object Detection and Tracking: https://github.com/yehengchen/Object-Detection-and-Tracking

Contact us for more information on Smart Factory and Smart Retail computer vision solutions with smart cameras, data annotation, and automation at sdt.inc or LinkedIn.

SDT Inc.

Inspire & Connect. SDT is a provider of industrial digital transformation solutions and quantum-grade devices.