Ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

Ana Stanescu, Doina Caragea

doi:10.1109/bibm.2014.6999196

Abstract

Producing accurate classifiers depends on the quality and quantity of labeled data. Labeled data is expensive to generate, and its scarcity critically affects the application of machine learning algorithms to biological problems. Unlabeled data, however, can be acquired faster and in larger quantities thanks to current biochemical technologies known as Next Generation Sequencing. When unlabeled instances far outnumber labeled instances, semi-supervised learning represents a cost-effective alternative that can improve supervised classifiers by utilizing unlabeled data. In practice, data often exhibits imbalanced class distributions, which are an obstacle for both supervised and semi-supervised learning. The problem of supervised learning from imbalanced datasets has been studied extensively, and various solutions have been proposed to produce classifiers with optimal performance on highly skewed class distributions. For semi-supervised learning, far fewer efforts have addressed the imbalanced data problem. In this paper, we study several ensemble-based semi-supervised learning approaches for predicting splice sites, a problem for which the imbalance ratio is very high. We run experiments on five imbalanced datasets with the goal of identifying which variants are the most effective.
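
The abstract does not say which ensemble or semi-supervised variants the paper evaluates, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one plausible combination: self-training on top of an ensemble whose members each see all minority-class instances plus an equal-sized random subsample of the majority class. The nearest-centroid base learner, the function names (`train_ensemble`, `positive_score`, `self_train`), the parameters, and the toy data are all hypothetical choices made for illustration.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_ensemble(pos, neg, n_members, rng):
    """Each member is trained on all minority (pos) rows plus an equal-sized
    random subsample of majority (neg) rows, i.e. on balanced data.
    A member is just a pair of class centroids (assumed base learner)."""
    members = []
    for _ in range(n_members):
        neg_sample = rng.sample(neg, min(len(pos), len(neg)))
        members.append((centroid(pos), centroid(neg_sample)))
    return members

def positive_score(members, x):
    """Fraction of ensemble members that vote 'positive' for x."""
    votes = sum(sq_dist(x, cp) < sq_dist(x, cn) for cp, cn in members)
    return votes / len(members)

def self_train(pos, neg, unlabeled, n_members=5, rounds=3, per_round=2, seed=0):
    """Repeatedly pseudo-label the most confidently scored unlabeled rows,
    add them to the labeled sets, and retrain the ensemble."""
    rng = random.Random(seed)
    pos, neg, pool = list(pos), list(neg), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        members = train_ensemble(pos, neg, n_members, rng)
        # Confidence = how far the ensemble vote is from an even split.
        pool.sort(key=lambda x: abs(positive_score(members, x) - 0.5),
                  reverse=True)
        confident, pool = pool[:per_round], pool[per_round:]
        for x in confident:
            target = pos if positive_score(members, x) >= 0.5 else neg
            target.append(x)
    return train_ensemble(pos, neg, n_members, rng)

# Toy, made-up data: 2 labeled positives versus 6 labeled negatives.
pos = [[5.0, 5.0], [6.0, 5.5]]
neg = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
       [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
unlabeled = [[5.5, 5.0], [0.3, 0.3], [6.0, 6.0], [0.9, 0.1]]

model = self_train(pos, neg, unlabeled)
near_pos = positive_score(model, [5.8, 5.2])
near_neg = positive_score(model, [0.1, 0.2])
```

Undersampling the majority class separately for each member keeps every base learner balanced while the ensemble as a whole still sees most of the majority data. On real splice site data the base learner, the confidence threshold, and the number of instances added per round would all need to be tuned.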

Similar Papers

An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.
Ana Stanescu ... Doina Caragea
BMC Systems Biology | VOL. 9, Suppl 5
01 Jan 2015

Cost-Sensitive Learning for Imbalanced Bad Debt Datasets in Healthcare Industry
Donghui Shi ... Jian Guan
01 Jul 2015

A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica.
Saul Calderon-Ramirez ... David Elizondo
Medical & Biological Engineering & Computing | VOL. 60
03 Mar 2022

Noise-adaptive synthetic oversampling technique
Minh Thanh Vo ... Trang Nguyen
Applied Intelligence | VOL. 51
18 Mar 2021
