Abstract

Classification of imbalanced data has been studied extensively over the past two decades and remains important, because data are central to modern applications and the problem becomes critical when data are distributed across several classes. The term imbalance refers to an uneven distribution of data across classes, which severely degrades the performance of traditional classifiers: they become biased toward the class with the larger amount of data. Data generated by wireless sensor networks tend to exhibit several imbalances. This review article analyzes the imbalance issue for wireless sensor networks and other application domains, to help the community understand the WHAT, WHY, and WHEN of imbalance in data and its remedies.

Highlights

  • One of the important challenges in data mining is the handling of imbalanced data in classification.[1,2,3,4] Classification is an important data mining technique in which samples of unknown class are assigned to a class based on prior knowledge from training samples.[5,6] Imbalance appears when data are unequally distributed across classes: some classes, called majority classes, hold a large quantity of data, while others, called minority classes, hold only a few instances.

  • Review article and proposed ‘‘ROSE’’, based on a smoothed bootstrap of resampled data.

  • Feature-based similarity is used for generating synthetic examples among minority class instances.

  • A weighted distribution of different minority class examples is used on the basis of how difficult they are to learn.

  • Mean and standard deviation are used to generate synthetic samples for the minority class.

  • Hybridization of the synthetic minority oversampling technique (SMOTE), the optimization technique particle swarm optimization (PSO), and a classifier to improve learning from imbalanced data.

  • Borderline examples and noise generated during SMOTE processing are handled using an IPF filter.

  • This review article presents a thorough review of the imbalance problem.
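
Several of the highlights above describe generating synthetic examples among minority class instances. A minimal sketch of the interpolation idea behind SMOTE-style oversampling is shown below. It is an illustration under stated assumptions, not the procedure of any particular cited paper: the function name `smote_like_oversample`, its parameters, and the toy data are hypothetical, and only the Python standard library is used.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; needs at least two minority points.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (values are made up).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies. This is also why noisy or borderline minority points can produce poor synthetic samples, which motivates the filtering step mentioned in the last highlight.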


Summary

Introduction

One of the important challenges in data mining is the handling of imbalanced data in classification.[1,2,3,4] Classification is an important data mining technique in which samples of unknown class are assigned to a class based on prior knowledge from training samples.[5,6] Imbalance appears when data are unequally distributed across classes: some classes, called majority classes, hold a large quantity of data, while others, called minority classes, hold only a few instances.

The approaches covered include the following. Review article and proposed ‘‘ROSE’’, based on a smoothed bootstrap of resampled data. Feature-based similarity is used for generating synthetic examples among minority class instances. A weighted distribution of different minority class examples is used on the basis of how difficult they are to learn. Mean and standard deviation are used to generate synthetic samples for the minority class. SMOTE, the optimization technique PSO, and a classifier are hybridized to improve learning from imbalanced data. Borderline examples and noise generated during SMOTE processing are handled using an IPF filter.

The fuzzy concept improves the performance of nearest neighbor classifiers on normal or balanced data by finding the membership of an instance in a class, so for the imbalance issue it could be helpful to combine the fuzzy membership concept with some strategy for dealing with imbalance. Such strategies could be an alteration in K or some weighting scheme. An optimal fuzzy weighted nearest neighbor concept was proposed by Patel and Thakur,[59] who took into consideration the advantages of both the optimal weights and the embedded fuzzy concept to achieve better classification results on imbalanced data.
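
To make the idea of fuzzy membership in a nearest neighbor classifier concrete, the sketch below computes inverse-distance-weighted class memberships for a query point, with an optional per-class weight as one possible "weighting" strategy for imbalance. It is a generic illustration, not the optimal fuzzy weighted method of Patel and Thakur: the function `fuzzy_knn_membership`, the fuzzifier `m`, the `class_weight` argument, and the toy data are all assumptions introduced here.

```python
import math


def fuzzy_knn_membership(train, query, k=3, m=2.0, class_weight=None):
    """Return {label: membership} for query; memberships sum to 1.

    train is a list of (point, label) pairs with crisp labels.
    m > 1 is the fuzzifier; class_weight is a hypothetical per-class
    multiplier that can favour the minority class.
    """
    class_weight = class_weight or {}
    # Keep the k training points closest to the query (Euclidean distance).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = {}
    for point, label in nearest:
        d = math.dist(point, query)
        # Closer neighbours contribute more; epsilon avoids division by zero.
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)
        scores[label] = scores.get(label, 0.0) + class_weight.get(label, 1.0) * w
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Toy data: four majority points and two minority points (values are made up).
train = [
    ((0.0, 0.0), "majority"), ((0.1, 0.2), "majority"),
    ((0.3, 0.1), "majority"), ((0.2, 0.3), "majority"),
    ((1.0, 1.0), "minority"), ((1.1, 0.9), "minority"),
]
query = (0.8, 0.8)
plain = fuzzy_knn_membership(train, query, k=3)
weighted = fuzzy_knn_membership(train, query, k=3, class_weight={"minority": 2.0})
```

Instead of a hard vote, each class receives a degree of membership, so a few close minority neighbors can outweigh a farther majority neighbor. Raising the minority entry in `class_weight` shifts membership further toward the minority class, which is one simple way such a classifier could be adapted to imbalanced data.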

A Hybrid Weighted Nearest Neighbor Approach to Mine Imbalanced Data
H Han and B Mao[55]
Conclusion and future direction
