Abstract

Oversampling is the most popular data preprocessing technique for making traditional classifiers applicable to imbalanced data. Through an overall review of oversampling techniques (oversamplers), we find that some can be regarded as danger-information-based oversamplers (DIBOs), which create samples near danger areas so that these positive examples can be correctly classified, while others are safe-information-based oversamplers (SIBOs), which create samples near safe areas to increase the precision of positive predictions. However, DIBOs cause the misclassification of too many negative examples in overlapped areas, and SIBOs cause the misclassification of too many borderline positive examples. Based on their respective advantages and disadvantages, a boundary-information-based oversampler (BIBO) is proposed. First, a concept of boundary information that considers safe information and danger information at the same time is introduced, so that the created samples lie near decision boundaries. The experimental results show that DIBOs and BIBO outperform SIBOs on the basic metrics of recall and negative-class precision; SIBOs and BIBO outperform DIBOs on the basic metrics of specificity and positive-class precision; and BIBO is better than both DIBOs and SIBOs in terms of integrated metrics.
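To make the contrast concrete, the sketch below illustrates one plausible boundary-biased oversampling step in Python (assuming NumPy and scikit-learn). It is a hypothetical illustration rather than the paper's exact BIBO procedure, which is detailed in the Procedure section; the function name boundary_biased_oversample and its parameters are invented for this example:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def boundary_biased_oversample(X_min, X_maj, n_new, rng=None):
        """Create n_new synthetic minority samples pulled toward the class boundary."""
        rng = np.random.default_rng(rng)
        # The nearest majority neighbour approximates the local boundary direction.
        nn_maj = NearestNeighbors(n_neighbors=1).fit(X_maj)
        seeds = X_min[rng.integers(0, len(X_min), size=n_new)]
        _, idx = nn_maj.kneighbors(seeds)
        targets = X_maj[idx[:, 0]]
        # Step a random fraction of the gap, capped at 0.5 so each synthetic
        # point stays on the minority side of the midpoint between the seed
        # and the boundary, rather than in a purely safe or danger area.
        gap = rng.uniform(0.0, 0.5, size=(n_new, 1))
        return seeds + gap * (targets - seeds)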

Highlights

  • Data is said to be imbalanced when one of its classes has many more examples than the other classes

  • It can be deduced that the virtual samples created by the boundary-information-based oversampler (BIBO) lie near the real decision nodes of the decision tree

  • The comparison experiments show that safe-information-based oversamplers (SIBOs), which generate synthetic samples near safe areas, improve specificity (spec) and positive-class precision (pre P), and that danger-information-based oversamplers (DIBOs), which generate synthetic samples near dangerous areas, improve recall (rec) and negative-class precision (pre N); a sketch of one common way to separate safe from danger samples follows these highlights
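
One common way to operationalise the safe/danger distinction these highlights refer to is the k-nearest-neighbour rule of Borderline-SMOTE: a minority sample is in danger when at least half of its k neighbours belong to the majority class. The sketch below (Python, assuming NumPy and scikit-learn; the paper may define its own variant) shows this split:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def split_safe_danger(X, y, minority_label, k=5):
        """Split minority samples into safe and danger sets by their k-NN."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: first hit is the sample itself
        X_min = X[y == minority_label]
        _, idx = nn.kneighbors(X_min)
        # Count majority-class neighbours among the k true neighbours.
        maj = (y[idx[:, 1:]] != minority_label).sum(axis=1)
        safe = maj < k / 2                    # mostly-minority neighbourhood
        danger = (maj >= k / 2) & (maj < k)   # borderline neighbourhood
        return X_min[safe], X_min[danger]     # maj == k is treated as noise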

Introduction

Data is said to be imbalanced when one of its classes (the majority, or negative, class) has many more examples than the other classes (the minority, or positive, class). Sun et al. [18] turned an imbalanced dataset into multiple balanced sub-datasets and used them to train base classifiers. Another very common type of ensemble learning combines ensembles with resampling techniques, such as SMOTEBagging [19], random balance boost [20], and the synthetic oversampling ensemble [21]. SMOTE_IPE [27] is another combined resampling method: it uses an iterative-partitioning filter [28] to remove noisy samples from both the majority and minority classes, cleaning up the boundaries and making them more regular.
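
For concreteness, the following minimal sketch shows the resampling-plus-ensemble pattern in the spirit of SMOTEBagging, assuming scikit-learn and imbalanced-learn are available. It is illustrative only; the cited methods [19], [20], [21] differ in their sampling schedules and base learners:

    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.tree import DecisionTreeClassifier

    def smote_bagging_fit(X, y, n_estimators=10, seed=0):
        """Train one decision tree per SMOTE-rebalanced bootstrap sample."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_estimators):
            boot = rng.integers(0, len(X), size=len(X))  # bootstrap indices
            # Each bag is rebalanced before training; SMOTE needs more minority
            # samples in the bag than its k_neighbors setting (5 by default).
            X_res, y_res = SMOTE(
                random_state=int(rng.integers(1 << 31))
            ).fit_resample(X[boot], y[boot])
            models.append(DecisionTreeClassifier().fit(X_res, y_res))
        return models

    def smote_bagging_predict(models, X):
        """Majority vote over the ensemble (assumes non-negative integer labels)."""
        votes = np.stack([m.predict(X) for m in models])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)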

Oversampling Techniques
Motivation
Boundary
Boundary Information
Procedure for the Boundary-Information-Based Oversampler
Procedure Begin
Strengths Analysis
Experiment
Evaluation
Dataset Description
The Simulated Datasets
The Real-World Datasets
Oversampler Performance Evaluation
Comparative Strengths Results
Performance Results
Comparative Results of Computational Complexity
An Example of Using the Proposed BIBO Method
Conclusions