Active semi-supervised learning for biological data classification.

Guilherme Camargo, Pedro H Bugatti, Priscila T M Saito

doi:10.1371/journal.pone.0237428

Open Access

https://doi.org/10.1371/journal.pone.0237428

Abstract

As datasets continue to grow, considerable effort has gone into addressing the large amount of unlabeled data relative to the scarcity of labeled data. Another important issue is the trade-off between the difficulty of obtaining annotations from a specialist and the need for a significant amount of annotated data to obtain a robust classifier. In this context, combining active learning with semi-supervised learning is attractive: a small number of highly informative samples, selected by the active learning strategy and labeled by a specialist, can be used to propagate labels to a set of unlabeled data through the semi-supervised strategy. However, most works in the literature neglect the interactive response times that certain real applications require. We propose a more effective and efficient active semi-supervised learning framework, including a new active learning method. An extensive experimental evaluation was performed in the biological context (using the ALL-AML, Escherichia coli and PlantLeaves II datasets), comparing our proposals with state-of-the-art works and with different supervised (SVM, RF, OPF) and semi-supervised (YATSI-SVM, YATSI-RF and YATSI-OPF) classifiers. The results show the benefits of our framework, which allows the classifier to achieve higher accuracies more quickly with a reduced number of annotated samples. Moreover, the selection criterion adopted by our active learning method, based on diversity and uncertainty, prioritizes the most informative boundary samples for the learning process. We obtained a gain of up to 20% against other learning techniques. The active semi-supervised learning approaches presented a better trade-off (accuracies and competitive, viable computational times) than the active supervised learning ones.
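To make the idea of a selection criterion "based on diversity and uncertainty" concrete, the following is a minimal sketch, not the authors' RDBS method. It assumes a toy setting in which each class is summarized by a centroid, uncertainty is measured as the gap between the distances to the two closest centroids, and diversity is enforced with a minimum spacing between selected samples. The names `margin`, `select_queries`, `min_gap` and the toy data are hypothetical.

```python
import math

def margin(x, centroids):
    """Gap between the two closest class centroids; a small gap means x lies near a boundary."""
    d = sorted(math.dist(x, c) for c in centroids.values())
    return d[1] - d[0]

def select_queries(unlabeled, centroids, k, min_gap):
    """Pick up to k boundary samples that are at least min_gap apart from each other."""
    # Uncertainty: rank candidates from most to least ambiguous.
    ranked = sorted(unlabeled, key=lambda x: margin(x, centroids))
    chosen = []
    for x in ranked:
        # Diversity: skip candidates that sit too close to an already chosen sample.
        if all(math.dist(x, c) >= min_gap for c in chosen):
            chosen.append(x)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical toy data: two class centroids and a small unlabeled pool.
centroids = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
pool = [(2.0, 0.0), (2.1, 0.0), (2.5, 3.0), (0.1, 0.0), (3.9, 0.0)]
queries = select_queries(pool, centroids, k=2, min_gap=1.0)
print(queries)
```

In this toy pool, `(2.1, 0.0)` is nearly as ambiguous as `(2.0, 0.0)` but is skipped because it is almost a duplicate of it, so the two queries sent to the specialist are `(2.0, 0.0)` and `(2.5, 3.0)`.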

Highlights

The amount of information available has been increasing, due to new means of acquisition, increased storage capacity and speed of communication, producing large datasets
This paper proposes a more effective and efficient learning approach to cope with: i) a higher proportion of unlabeled data; ii) scarcity of labeled data; iii) the need for a significant amount of data labeled by a specialist to obtain high accuracies by the classifiers; iv) difficulty in obtaining annotations made by a specialist; v) the need for interactive response times for the learning process
We evaluate the performance of the classifiers under active learning, comparing the selection strategies described in Section 1.2 (Rand, Cluster Rand (Clu), Increasing Boundary Edges (IBE), Root Distance-based Sampling (RDS)) with our proposed Root Distance Boundary Sampling (RDBS) selection strategy

Summary

Introduction

The amount of available information has been increasing due to new means of acquisition, increased storage capacity and faster communication, producing large datasets. By combining active learning (AL) and semi-supervised learning (SSL) techniques, it is possible to select the most significant samples from the dataset. These samples compose the labeled training set, and their labels are propagated to the unlabeled training set, yielding a more robust classifier. A smaller number of more informative samples, selected by our active learning strategy and labeled by the specialist, can propagate labels to a set of unlabeled data more effectively (i.e. with fewer errors) through the semi-supervised strategy. The specialist therefore does not need to spend time and effort labeling a large dataset.
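The propagation step described above can be illustrated with a minimal sketch. It assumes the simplest possible rule, in which each unlabeled sample receives the label of its nearest specialist-labeled sample (1-nearest-neighbor); this is an illustrative stand-in, not the YATSI-based classifiers evaluated in the paper. The function name `propagate_labels`, the class names and the toy points are hypothetical.

```python
import math

def propagate_labels(labeled, unlabeled):
    """Give each unlabeled point the label of its nearest labeled point (1-NN rule)."""
    pseudo = {}
    for x in unlabeled:
        nearest = min(labeled, key=lambda p: math.dist(p, x))
        pseudo[x] = labeled[nearest]
    return pseudo

# Hypothetical toy data: two samples labeled by the specialist, three unlabeled ones.
labeled = {(0.0, 0.0): "class_a", (5.0, 5.0): "class_b"}
unlabeled = [(0.5, 0.2), (4.6, 5.1), (1.0, 1.5)]

pseudo = propagate_labels(labeled, unlabeled)
for point, label in pseudo.items():
    print(point, label)
```

Here two specialist annotations produce a training set of five labeled samples. The quality of the propagated labels depends on how well the two annotated samples represent their classes, which is why the choice of samples to annotate matters.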

Active semi-supervised learning paradigm

Active learning strategies

Proposed framework and implementation

Datasets

Scenarios

Results and discussion

Conclusion

Journal: PLOS ONE	Publication Date: Aug 19, 2020
Citations: 20	License type: CC BY 4.0
