A novel logistic regression model combining semi-supervised learning and active learning for disease classification

Hua Chai, Sai Wang, Hai-Wei Shen, Yong Liang

doi:10.1038/s41598-018-31395-5

Open Access

Journal: Scientific Reports
Publication Date: Aug 29, 2018
Citations: 17
License type: open-access

Affiliation: Macau University of Science and Technology

Abstract

Traditional supervised classifiers need many labeled samples to perform well, but many biological datasets contain only a small number of labeled samples, and the remaining samples are unlabeled. Labeling these samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to use the unlabeled samples to improve model performance. However, active learning models can be short-sighted or biased, and some manual labeling is still needed, while semi-supervised learning methods are easily affected by noisy samples. In this paper we propose a novel logistic regression model based on the complementarity of active learning and semi-supervised learning, which uses the unlabeled samples at the least cost to improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of falsely pseudo-labeled samples. The experimental results show that the new model achieves better performance than widely used semi-supervised learning and active learning methods in disease classification and gene selection.
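The abstract does not spell out how the pseudo-label update works, so the following is only a minimal sketch of one plausible reading: after each retraining round, pseudo-labels are re-derived from the current model's probabilities, so that a label assigned earlier can be flipped or withdrawn. The function name `update_pseudo_labels`, the dictionary layout, and the `threshold` value are assumptions made for illustration and are not taken from the paper.

```python
def update_pseudo_labels(pseudo, probs, threshold=0.9):
    """Re-derive pseudo-labels from the current model (illustrative assumption).

    pseudo: dict sample_id -> 0/1, pseudo-labels from the previous round.
    probs:  dict sample_id -> current estimate of P(y = 1) for each unlabeled sample.
    Returns (updated, flipped, dropped).
    """
    updated = {}
    for sid, p in probs.items():
        # Keep a pseudo-label only while the model is confident about it.
        if max(p, 1.0 - p) >= threshold:
            updated[sid] = 1 if p >= 0.5 else 0
    # Labels that changed class since the previous round.
    flipped = sorted(s for s in updated if s in pseudo and pseudo[s] != updated[s])
    # Labels withdrawn because confidence fell below the threshold.
    dropped = sorted(s for s in pseudo if s not in updated)
    return updated, flipped, dropped
```

Under this reading, a sample that was pseudo-labeled wrongly in an early round is not locked in: it is relabeled or removed as soon as the retrained model disagrees or becomes unsure.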

Highlights

Identifying disease-related genes and classifying disease types from gene expression data is an active research topic in machine learning
The experiments show that our method achieves better accuracy than the active learning (AL) and semi-supervised learning (SSL) logistic regression models
The novel logistic regression model is designed based on the complementarity of semi-supervised learning and active learning

Summary

Introduction

Identifying disease-related genes and classifying disease types from gene expression data is an active research topic in machine learning. Many different models, such as the logistic regression model[1] and support vector machines (SVM)[2], have been applied in this area. Active learning (AL) tries to train an accurate prediction model at minimum cost of manually labeling the unlabeled samples. It selects the most uncertain or informative unlabeled samples and has them annotated by human experts. These labeled samples are then added to the training dataset to improve model performance. Although AL reduces manual effort, labeling the selected samples in biological experiments is still costly. Semi-supervised learning (SSL), in contrast, uses unlabeled data together with labeled data during training without any manual labeling. A recent study by Lin[15] designed an active self-paced learning mechanism that combines AL and SSL for face recognition.
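To make the division of labor between AL and SSL concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes an unregularized logistic regression trained by batch gradient descent, uncertainty measured as the distance of the predicted probability from 0.5, and a fixed confidence threshold for pseudo-labeling. The names `al_ssl_loop`, `oracle`, `n_query`, and `conf` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent for unregularized logistic regression."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def al_ssl_loop(X_lab, y_lab, X_unl, oracle, rounds=3, n_query=2, conf=0.9):
    """Each round: ask the oracle about the most uncertain samples (AL) and
    pseudo-label the confidently predicted ones (SSL). Illustrative only."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    n_queries = 0
    model = fit_logistic(X_lab, y_lab)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(model, x) for x in pool]
        # AL step: the smallest |p - 0.5| marks the most uncertain samples.
        order = sorted(range(len(pool)), key=lambda i: abs(probs[i] - 0.5))
        to_query = order[:n_query]
        queried = set(to_query)
        for i in to_query:
            X_lab.append(pool[i])
            y_lab.append(oracle(pool[i]))  # the only step that costs manual work
            n_queries += 1
        # SSL step: pseudo-labels are recomputed every round, never stored.
        X_ps, y_ps = [], []
        for i, p in enumerate(probs):
            if i not in queried and max(p, 1.0 - p) >= conf:
                X_ps.append(pool[i])
                y_ps.append(1 if p >= 0.5 else 0)
        pool = [x for i, x in enumerate(pool) if i not in queried]
        model = fit_logistic(X_lab + X_ps, y_lab + y_ps)
    return model, n_queries
```

In this sketch the expert is consulted only for the few samples nearest the decision boundary, while samples the model is already sure about contribute to training at no labeling cost.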

Methods

Results

Conclusion
