Semi-automatic data annotation based on feature-space projection and local quality metrics: An application to cerebral emboli characterization.

Yamil Vindas, Emmanuel Roux, Philippe Delachartre, Blaise Kévin Guépié, Marilys Almar

doi:10.1016/j.media.2022.102437

Abstract

We propose a semi-supervised learning approach to annotate a dataset with reduced manual annotation effort and a controlled annotation error. The method is based on feature-space projection and label propagation using local quality metrics. First, an auto-encoder extracts features from the samples in an unsupervised manner. Then, a t-distributed stochastic neighbor embedding (t-SNE) algorithm projects the extracted features into a two-dimensional (2D) space, and the best 2D projection is selected using the silhouette score. The expert annotator uses the resulting 2D representation to manually label samples. Finally, the labels of the labeled samples are propagated to the unlabeled samples using a K-nearest neighbor strategy and local quality metrics. We compare our method against semi-supervised optimum-path forest and K-nearest neighbor label propagation (without local quality metrics). Our method achieves state-of-the-art results on three different datasets, labeling more than 96% of the samples with an annotation error between 7% and 17%. Additionally, our method makes it possible to control the trade-off between annotation error and number of labeled samples. Moreover, we combine our method with robust loss functions to compensate for the label noise introduced by automatic label propagation. With this combination, our method achieves classification performance similar to, and in some cases better than, that obtained with a fully manually labeled dataset, with a gain of up to 6% in classification accuracy.
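
To make the final propagation step and its error/coverage trade-off concrete, here is a minimal sketch of K-nearest neighbor label propagation in a 2D projection. It is an illustration under stated assumptions, not the authors' implementation: the function name `propagate_labels`, the parameter `min_agreement`, and the toy coordinates are hypothetical, and the fraction of agreeing neighbors is used only as a simple stand-in for the local quality metrics mentioned in the abstract, which are not defined in this section.

```python
import math
from collections import Counter


def propagate_labels(points, labels, k=3, min_agreement=1.0):
    """Propagate labels to unlabeled 2D points from their k nearest labeled neighbors.

    points: list of (x, y) coordinates in the 2D projection.
    labels: list of the same length; None marks an unlabeled sample.
    min_agreement: hypothetical stand-in for a local quality criterion.
        A label is propagated only if at least this fraction of the
        k nearest labeled neighbors share it.
    """
    labeled = [(p, lab) for p, lab in zip(points, labels) if lab is not None]
    result = list(labels)
    for i, (p, lab) in enumerate(zip(points, labels)):
        if lab is not None or not labeled:
            continue  # already labeled, or nothing to propagate from
        # k nearest manually labeled samples by Euclidean distance
        nearest = sorted(labeled, key=lambda pl: math.dist(p, pl[0]))[:k]
        winner, count = Counter(l for _, l in nearest).most_common(1)[0]
        if count / len(nearest) >= min_agreement:
            result[i] = winner  # confident enough: propagate the label
    return result


# Toy 2D projection: two labeled clusters and three unlabeled samples.
points = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (5, 4),
          (0.4, 0.3), (4.6, 4.4), (2.6, 2.1)]
labels = ["a", "a", "a", "b", "b", "b", None, None, None]

strict = propagate_labels(points, labels, k=3, min_agreement=1.0)
loose = propagate_labels(points, labels, k=3, min_agreement=0.6)
```

With the strict threshold, the sample lying between the two clusters stays unlabeled; lowering the threshold labels it at the cost of a higher risk of error. This mirrors, in a simplified form, the trade-off between annotation error and number of labeled samples described in the abstract.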

Full Text