Abstract

This paper addresses the problem of learning from class-imbalanced data. The class imbalance problem arises when the number of instances differs substantially between classes. Instance selection decides which instances from the training set should be retained and used during learning. Over-sampling duplicates minority class instances. The paper proposes a hybrid approach to imbalanced data learning that combines over-sampling with instance selection. Instance selection reduces the number of instances belonging to the majority class, while over-sampling expands the number of instances belonging to the minority class. Instance selection is based on clustering, using the authors’ approach to clustering and instance selection with an agent-based population learning algorithm. As a result, a more balanced distribution of instances across classes is obtained and the dataset size is reduced. The proposed approach is validated experimentally on several benchmark datasets.
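The general idea of combining cluster-based instance selection with over-sampling can be illustrated with a minimal sketch. It is not the authors' method: plain k-means stands in for the agent-based population learning algorithm, which the abstract does not describe in detail, and the function names (`dist2`, `kmeans_assign`, `balance`), the parameter `k`, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import Counter


def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_assign(points, k, iters=20, seed=0):
    """Plain k-means (a stand-in, not the authors' algorithm).

    Returns the cluster index assigned to each point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k instances as initial centroids
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def balance(X, y, majority, minority, k, seed=0):
    """Reduce the majority class by clustering, expand the minority class by duplication."""
    rng = random.Random(seed)
    maj = [x for x, label in zip(X, y) if label == majority]
    mino = [x for x, label in zip(X, y) if label == minority]

    # Instance selection: keep one representative per non-empty majority cluster,
    # namely the member closest to the cluster mean.
    assign = kmeans_assign(maj, k, seed=seed)
    reps = []
    for c in range(k):
        members = [p for p, a in zip(maj, assign) if a == c]
        if members:
            center = tuple(sum(col) / len(members) for col in zip(*members))
            reps.append(min(members, key=lambda p: dist2(p, center)))

    # Over-sampling: duplicate randomly chosen minority instances until the
    # minority class is as large as the reduced majority class.
    extra = [rng.choice(mino) for _ in range(max(0, len(reps) - len(mino)))]
    mino_out = mino + extra

    X_new = reps + mino_out
    y_new = [majority] * len(reps) + [minority] * len(mino_out)
    return X_new, y_new


# Synthetic, illustrative data: 40 majority (label 0) and 4 minority (label 1) instances.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(4)]
y = [0] * 40 + [1] * 4

X_new, y_new = balance(X, y, majority=0, minority=1, k=8)
print("before:", Counter(y), "after:", Counter(y_new))
```

In this sketch the number of clusters `k` controls how far the majority class is reduced, and the minority class is duplicated up to that size, so the resulting set is both smaller and more balanced than the original.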
