Abstract
Class imbalance learning (CIL) is an important branch of machine learning: classification models generally struggle to learn from imbalanced data, yet skewed class distributions arise frequently in real-world applications. In this paper, we introduce a novel CIL solution called the Probability Density Machine (PDM). First, in the context of the Gaussian Naive Bayes (GNB) predictive model, we theoretically analyze why an imbalanced data distribution degrades the performance of the predictive model, and conclude that the harm of class imbalance is associated only with the prior probability, not with the conditional probability of the training data. Then, in this context, we show the rationale behind several traditional CIL techniques, and further indicate the drawback of combining GNB with them. Next, drawing on the idea of K-nearest neighbors probability density estimation (KNN-PDE), we propose the PDM, an improved GNB-based CIL algorithm. Finally, experiments on a large number of class-imbalanced data sets show that the proposed PDM algorithm yields promising results.
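To make the idea concrete, the following is a minimal sketch of a PDM-style classifier, not the authors' implementation: the class-conditional density p(x|c) is approximated with the classic k-NN density estimator (k over n times the volume of the ball reaching the k-th nearest neighbor), and the prediction deliberately ignores the skewed prior P(c). The function names (knn_density, pdm_predict) and the choice of k are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi

def knn_density(x, class_samples, k=5):
    """k-NN density estimate: p(x | c) ~ k / (n * V_k(x)), where V_k(x)
    is the volume of the d-ball whose radius is the distance from x to
    its k-th nearest neighbor among the n samples of class c."""
    n, d = class_samples.shape
    k = min(k, n)
    dists = np.sort(np.linalg.norm(class_samples - x, axis=1))
    r = dists[k - 1] + 1e-12                            # guard against r == 0
    v = (pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d     # volume of a d-ball
    return k / (n * v)

def pdm_predict(x, X_train, y_train, k=5):
    """Label x by the largest class-conditional density,
    deliberately neglecting the (skewed) prior P(c)."""
    classes = np.unique(y_train)
    dens = [knn_density(x, X_train[y_train == c], k) for c in classes]
    return classes[int(np.argmax(dens))]

# toy imbalanced data: 95 majority vs. 5 minority samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)), rng.normal(3.0, 1.0, (5, 2))])
y = np.array([0] * 95 + [1] * 5)
print(pdm_predict(np.array([3.0, 3.0]), X, y, k=3))     # likely prints 1
```

Because the density estimate is computed per class, the minority class is judged on its own sample geometry rather than being drowned out by the majority prior, which is the intuition the abstract ascribes to PDM.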
Highlights
Motivated by (4), we observe a potential new class imbalance learning (CIL) solution, i.e., neglecting the prior probability and directly estimating the conditional probability of each class to make the decision (see the decision rules sketched after these highlights). This solution avoids the tedious procedure of balancing prior probabilities and solves the CIL problem at its root. Although the problem seems to become easier, it is still difficult to provide an accurate estimate of the conditional probability
From a theoretical perspective, we analyze why a class-imbalanced distribution hurts the performance of the predictive model in the context of the Gaussian Naive Bayes classifier
It is deduced that the harm of an imbalanced data distribution is associated only with the prior probability, not with the conditional probability density
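As an illustration of these highlights (the paper's own equation (4) is not reproduced here, so the notation below is assumed), the standard maximum-a-posteriori rule of (Gaussian) Naive Bayes and the prior-free variant the highlights describe can be written as:

```latex
% MAP decision rule of (Gaussian) Naive Bayes: under imbalance the
% skewed prior P(c) biases the decision toward the majority class.
\hat{y}(x) = \arg\max_{c} \, P(c)\, p(x \mid c)

% Prior-free variant alluded to in the highlights: decide by the
% class-conditional density alone, so the skewed prior cannot bias it.
\hat{y}(x) = \arg\max_{c} \, p(x \mid c)
```

For example, under a heavy imbalance with P(c_maj) = 0.95 and P(c_min) = 0.05, the first rule predicts the minority label only when p(x | c_min) exceeds p(x | c_maj) by a factor of 19; this prior-induced bias is exactly what the second rule removes.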
Summary
Learning from imbalanced data is an important and active topic in machine learning, as it has been widely applied to diagnose and classify diseases [1, 2], detect software defects [3, 4], analyze biological and pharmacological data [5, 6], evaluate credit risk [7], predict actionable revenue change and bankruptcy [8, 9], diagnose faults in industrial processes [10, 11], classify soil types [12, 13], and even predict crash injury severity [14] or analyze crime linkages [15]. In the past two decades, hundreds of CIL algorithms have been proposed to address the imbalanced classification problem [18, 19]. These CIL methods can be roughly divided into three categories: data-level [20,21,22,23,24,25,26,27], algorithmic-level [28,29,30,31,32,33,34,35], and ensemble learning [36,37,38,39,40,41,42]. Ensemble learning combines either data-level or algorithmic-level approaches with the Bagging or Boosting paradigm to improve the accuracy and robustness of CIL.