Abstract

Many real-world datasets, such as those in medical diagnostics, fraud detection, biological classification, and risk analysis, suffer from class imbalance and class overlap. These problems seriously affect classifier learning because minority instances in the overlapped region are not visible to the learner, and performance becomes biased towards the majority class. Undersampling-based methods are the most commonly used techniques for handling these problems, but their major weakness is excessive elimination and information loss; that is, they fail to retain potentially informative majority instances. Most existing methods also improve sensitivity significantly but not other performance measures. We propose a novel entropy and neighborhood-based undersampling (ENU) method that removes only those majority instances in the overlapped region whose informativeness (entropy) score falls below a threshold entropy. ENU first computes an entropy score and threshold for the majority instances, then applies a local density-based improved KNN search to identify overlapped majority instances. To tackle the problem effectively, ENU defines four improved KNN-based procedures (ENUB, ENUT, ENUC, and ENUR) for undersampling. In average ranking on sensitivity, G-mean, and F1-score, ENU outperforms existing state-of-the-art methods while reducing information loss.
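The abstract's core idea, removing only low-entropy majority instances from the overlapped region identified via a KNN search, can be illustrated with a minimal sketch. This is not the paper's ENU algorithm (whose entropy definition, threshold rule, and improved KNN search are not given in the abstract); it assumes a simple proxy in which each majority instance's entropy is computed over the class labels of its k nearest neighbors, the overlapped region is the set of majority instances with at least one minority neighbor, and the threshold is the mean entropy of that set.

```python
import numpy as np
from collections import Counter

def knn_entropy_undersample(X, y, k=3, majority_label=0):
    """Sketch of entropy- and neighborhood-based undersampling.

    Assumptions (not from the paper): entropy of each majority
    instance is the Shannon entropy of its k-NN class labels;
    "overlapped" means at least one minority neighbor; the
    threshold is the mean entropy over overlapped instances.
    """
    keep = np.ones(len(X), dtype=bool)
    maj_idx = np.where(y == majority_label)[0]

    # Local class entropy of each majority instance's neighborhood.
    entropies = {}
    for i in maj_idx:
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]  # skip the point itself
        counts = Counter(y[nn])
        probs = np.array([c / k for c in counts.values()])
        entropies[i] = -np.sum(probs * np.log2(probs))

    # Overlapped majority instances: mixed-class neighborhood.
    overlapped = [i for i, h in entropies.items() if h > 0]
    if overlapped:
        threshold = np.mean([entropies[i] for i in overlapped])
        for i in overlapped:
            # Remove only low-informativeness (low-entropy) instances.
            if entropies[i] < threshold:
                keep[i] = False
    return X[keep], y[keep]
```

Because removal is restricted to the overlapped region and gated by the entropy threshold, majority instances far from the decision boundary (and all minority instances) are always retained, which is the information-loss reduction the abstract emphasizes.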
