APPROACH TO THE AUTOMATIC CREATION OF AN ANNOTATED DATASET FOR THE DETECTION, LOCALIZATION AND CLASSIFICATION OF BLOOD CELLS IN AN IMAGE

S M Kovalenko, S V Kovalenko, A S Kovalenko, O S Kutsenko

doi:10.15588/1607-3274-2024-1-12

Abstract

Context. The paper considers the problem of automating the creation of an annotated dataset for use in a system that detects, localizes and classifies blood cells in an image using deep learning. The subject of the research is digital image processing for object detection and localization.

Objective. The aim of this study is to build a pipeline of digital image processing methods that automatically generates an annotated set of blood smear images. This set is then used to train and validate deep learning models, significantly reducing the time machine learning specialists spend on annotation.

Method. The proposed approach to object detection and localization is based on digital image processing methods such as filtering, thresholding, binarization, contour detection, and filling. The pipeline for detection and localization includes the following steps: noise reduction; conversion to the HSV color model; defining a mask for white blood cells and platelets; detecting the contours of white blood cells and platelets; determining the coordinates of the upper left and lower right corners of their bounding boxes; calculating the area of the region inside each bounding box; saving the obtained data; determining the most common color in the image; filling the contours of leukocytes and platelets with that color; defining a mask for red blood cells; detecting the contours of red blood cells; determining the coordinates of the upper left and lower right corners of their bounding boxes; calculating the area of the region inside each bounding box; entering data about the found objects into a dataframe; and saving it to a .csv file for future use. Given the unlabeled image dataset and the generated .csv file, any researcher should be able to recreate the labeled dataset using image processing libraries.

Results. The developed approach was implemented in software for creating an annotated dataset of blood smear images.

Conclusions. The study proposes and justifies an approach to the automatic creation of an annotated dataset. The pipeline was tested on a set of unlabeled data and produced a labeled dataset consisting of cell images and a .csv file with the attributes “file name”, “type”, “xmin”, “ymin”, “xmax”, “ymax”, “area”, which give, for each detected object, the source image, the cell type, the bounding box coordinates, and the bounding box area. The numbers of correctly recognized, incorrectly recognized, and unrecognized objects were counted manually, and metrics were calculated to assess the accuracy and quality of object detection and localization.
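
The detection and localization steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' software: it uses only the Python standard library, represents an image as a nested list of RGB tuples, omits noise reduction, and swaps contour detection for a simple flood fill over the mask. The HSV bounds, the function names, and the single label `WBC_or_platelet` (the abstract does not say how the two types are told apart) are assumptions made for illustration. Bounding box corners are treated as inclusive pixel coordinates.

```python
import colorsys
import csv
from collections import Counter, deque

FIELDS = ["file name", "type", "xmin", "ymin", "xmax", "ymax", "area"]

# Hypothetical HSV bounds (each channel scaled to 0..1); tune for real stains.
WBC_RANGE = ((0.65, 0.35, 0.20), (0.85, 1.00, 1.00))  # purple-stained cells
RBC_RANGE = ((0.00, 0.15, 0.00), (1.00, 1.00, 1.00))  # any saturated pixel


def make_mask(image, lower, upper):
    """Binarize: True where a pixel's HSV value lies within [lower, upper]."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            hsv = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(all(lo <= c <= hi for lo, c, hi in zip(lower, hsv, upper)))
        mask.append(mask_row)
    return mask


def find_regions(mask):
    """Flood-fill 4-connected regions (stand-in for contour detection)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(pixels)
    return regions


def to_rows(file_name, cell_type, regions):
    """One annotation row per region: bounding box corners and box area."""
    rows = []
    for pixels in regions:
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
        area = (xmax - xmin + 1) * (ymax - ymin + 1)  # inclusive corners
        rows.append({"file name": file_name, "type": cell_type, "xmin": xmin,
                     "ymin": ymin, "xmax": xmax, "ymax": ymax, "area": area})
    return rows


def annotate(file_name, image):
    """Detect WBC/platelet regions, paint them over, then detect RBC regions."""
    wbc_regions = find_regions(make_mask(image, *WBC_RANGE))
    rows = to_rows(file_name, "WBC_or_platelet", wbc_regions)
    # The most common color stands in for the background.
    background = Counter(px for row in image for px in row).most_common(1)[0][0]
    cleaned = [list(row) for row in image]  # leave the input image untouched
    for pixels in wbc_regions:
        for x, y in pixels:
            cleaned[y][x] = background
    rows += to_rows(file_name, "RBC", find_regions(make_mask(cleaned, *RBC_RANGE)))
    return rows


def write_csv(rows, stream):
    """Write annotation rows with the attribute names given in the abstract."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

In this sketch the red blood cell mask is deliberately broad (any sufficiently saturated pixel), so it would also pick up the white blood cell regions if they were not first filled with the background color. Calling `annotate("smear_001.png", image)` returns a list of row dictionaries that `write_csv` can save to an open file.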
