Abstract

Class imbalance is an important problem in many domains, such as disease classification, network intrusion detection, fraud detection, and spam filtering. On imbalanced datasets, traditional supervised machine learning algorithms often fail to provide acceptable results. Several approaches are used to handle the class imbalance problem. Of these, undersampling approaches, which reduce the number of instances in the majority class, are the most commonly adopted by researchers. Selecting instances from the majority class can be treated as an optimization problem. To this end, we present an undersampling approach based on the widely used Particle Swarm Optimization (PSO) algorithm. The majority class samples are first clustered to form the initial undersampled set. The selection of samples is then optimized using PSO to obtain the best-performing model. The parameters of PSO are fine-tuned using Learning Automata. Metrics suited to class imbalance problems are used to construct the fitness function for optimizing the undersampled training set. The proposed method achieves a 2% to 10% performance improvement over most contemporary methods on various datasets with imbalance ratios ranging from 5 to 130, indicating that the method is robust and useful in practical scenarios. The code for the proposed method is available at https://github.com/kkg1999/Undersampling.
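To make the idea of treating majority-class selection as an optimization problem concrete, the following is a minimal sketch of a binary PSO that searches over keep/drop masks for the majority samples. It is not the authors' implementation (see the repository linked above for that). The 1-nearest-neighbour classifier, the G-mean fitness, the toy Gaussian data, and all parameter values (`w`, `c1`, `c2`, swarm size) are assumptions chosen for illustration, and the clustering-based initialization and Learning Automata tuning described in the abstract are omitted.

```python
import math
import random

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = minority, 0 = majority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))

def predict_1nn(train_x, train_y, x):
    """Return the label of the nearest training point (squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

def fitness(mask, majority, minority, val_x, val_y):
    """Score a mask: train 1-NN on the kept majority samples plus all minority samples."""
    kept = [m for m, bit in zip(majority, mask) if bit]
    if not kept:
        return 0.0  # an empty majority set is not a usable training set
    train_x = kept + minority
    train_y = [0] * len(kept) + [1] * len(minority)
    preds = [predict_1nn(train_x, train_y, x) for x in val_x]
    return g_mean(val_y, preds)

def binary_pso(fit, n_bits, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: velocities pass through a sigmoid to give per-bit keep probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fit(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid well-behaved
                vel[i][d] = v
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fit(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy data (hypothetical): 30 majority points, 5 minority points, small validation set.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
minority = [(data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)) for _ in range(5)]
val_x = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]
         + [(data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)) for _ in range(5)])
val_y = [0] * 20 + [1] * 5

mask, score = binary_pso(
    lambda m: fitness(m, majority, minority, val_x, val_y),
    n_bits=len(majority),
)
print("majority samples kept:", sum(mask), "of", len(majority))
print("validation G-mean:", round(score, 3))
```

Each particle encodes one candidate undersampled set as a bit mask, so the swarm's best position directly names which majority samples to keep. G-mean is used here only because it penalizes ignoring the minority class; the abstract does not state which metrics the fitness function combines.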
