Imbalanced Binary Classification Research Articles

For imbalanced classification, data-level methods can balance the classes, but the samples they generate contain no new information and cannot avoid introducing noise. Algorithm-level methods may cause the model to overfit, and their classification performance depends heavily on the specific dataset and classification task, so they lack generality. In addition, how to mine the distribution differences in regions where the classes overlap, and how to mine the differences between classes when the absolute number of minority samples is small, remain important challenges in imbalanced classification. This paper proposes an imbalanced binary classification method using multi-label confidence comparisons based on contrastive learning. Rather than learning distribution characteristics directly from the minority samples, as previous approaches do, the method draws on contrastive learning and redefines the classification task as a multi-label matching task by mining deep features that represent the commonalities and differences between neighboring samples. For each sample, multiple distinct contrastive sample groups are obtained by random sampling from its neighbor sample pool. The sample is combined with its contrastive sample groups to form multiple sample-neighbor pairs, which serve as training samples for the multi-label matching task. This multiplies the original dataset without introducing noise and lays a foundation for mining class differences when the absolute number of minority-class samples is small. Based on the reconstruction error generated by a Variational AutoEncoder (VAE) for the sample-neighbor pairs, a multi-label matching loss between target samples and contrastive sample groups is designed that integrates the idea of contrastive learning. A robust classifier is obtained by jointly and iteratively learning the reconstruction error and the multi-label matching loss, which better mines the distribution differences in overlapping regions. In the testing phase, multiple different contrastive sample groups are obtained for each sample to be classified, together with the corresponding predictions; the sample's category is then inferred in reverse by integrating the predictions of each group. Experimental results on 38 public datasets show that the method outperforms typical imbalanced classification methods in both F1-measure and G-mean.
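The abstract reports results in F1-measure and G-mean, two metrics that stay informative when one class is rare. The following is a minimal sketch of how they are commonly computed from a confusion matrix, not the authors' evaluation code. The function names and toy labels are hypothetical, and label 1 is assumed to be the minority (positive) class.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
one_hit_one_false_alarm = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(f1_measure(y_true, always_negative), g_mean(y_true, always_negative))
print(f1_measure(y_true, one_hit_one_false_alarm),
      g_mean(y_true, one_hit_one_false_alarm))
```

In this toy example, a classifier that always predicts the majority class reaches 80% accuracy yet scores zero on both metrics, which is why they are preferred over accuracy for imbalanced data.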

Small ribonucleic acid (sRNA) sequences are 50–500 nucleotide long, noncoding RNA (ncRNA) sequences that play an important role in regulating transcription and translation within a bacterial cell. As such, identifying sRNA sequences within an organism’s genome is essential to understanding the impact of these RNA molecules on cellular processes. Recently, numerous machine learning models have been applied to predict sRNAs within bacterial genomes. In this study, we treated sRNA prediction as an imbalanced binary classification problem, distinguishing the minority of positive sRNAs from the majority of negative instances, and performed a comparative study with six learning algorithms and seven assessment metrics. First, we collected numerical feature groups extracted from known sRNAs previously identified in the Salmonella typhimurium LT2 (SLT2) and Escherichia coli K12 (E. coli K12) genomes. Second, as a preliminary study, we characterized the sRNA-size distribution with a conformity test for Benford’s law. Third, we applied six traditional classification algorithms to the sRNA features and assessed classification performance with seven metrics, varying the positive-to-negative instance ratio and using stratified 10-fold cross-validation. We revisited important individual features and feature groups and found that classification with combined features performs better than classification with either an individual feature or a single feature group in terms of the Area Under the Precision-Recall curve (AUPR). We reconfirmed that AUPR properly measures classification performance on imbalanced data with varying imbalance ratios, which is consistent with previous studies on classification metrics for imbalanced data. Overall, eXtreme Gradient Boosting (XGBoost), even without optimized hyperparameter values, performed better than the other five algorithms with their specific optimal parameter settings. As future work, we plan to apply XGBoost to the large number of published sRNAs in bacterial genomes and compare its classification performance with that of recent machine learning models.
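The abstract relies on AUPR as its main metric. As a rough illustration, AUPR is often estimated by average precision: the mean of the precision values at the rank of each true positive when instances are sorted by decreasing score. The sketch below is an assumption about one common estimator, not the study's implementation. The function name and toy scores are hypothetical, and ties between scores are simply broken by input order.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Instances are ranked by decreasing score; the precision at the rank
    of each positive instance is averaged over all positives.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 2 positives among 8 instances.
y_true = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # expected AUPR of a random scorer
print(round(ap, 4), prevalence)
```

Because the chance-level value of AUPR is the positive-class prevalence rather than a fixed 0.5, comparing a score against that baseline shows how much a model gains at a given imbalance ratio.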

Articles published on Imbalanced Binary Classification

Fixing imbalanced binary classification: An asymmetric Bayesian learning approach.

Calibration methods in imbalanced binary classification

Limitations in Evaluating Machine Learning Models for Imbalanced Binary Outcome Classification in Spine Surgery: A Systematic Review.

Cost-Sensitive Online Adaptive Kernel Learning for Large-Scale Imbalanced Classification

DynaQ: online learning from imbalanced multi-class streams through dynamic sampling

Advanced Genetic Programming vs. State-of-the-Art AutoML in Imbalanced Binary Classification

Multimodal Classification of Anxiety Based on Physiological Signals

Switching synthesizing-incorporated and cluster-based synthetic oversampling for imbalanced binary classification

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Global reliable data generation for imbalanced binary classification with latent codes reconstruction and feature repulsion

An imbalanced binary classification method via space mapping using normalizing flows with class discrepancy constraints

An improved GEV boosting method for imbalanced data classification with application to short-term rainfall prediction

Class-specific extreme learning machine based on overall distribution for addressing binary imbalance problem

Imbalanced binary classification under distribution uncertainty

Affinity based fuzzy kernel ridge regression classifier for binary class imbalance learning

An imbalanced binary classification method based on contrastive learning using multi-label confidence comparisons within sample-neighbors pair

Predicting the likelihood of airspace user rerouting to mitigate air traffic flow management delay

Equity‐weighted bootstrapping: Examples and analysis

Prediction of Bacterial sRNAs Using Sequence-Derived Features and Machine Learning.

Density Weighted Twin Support Vector Machines for Binary Class Imbalance Learning
