Abstract

The class-imbalance problem is one of the research topics in machine learning and data mining. To address it, traditional oversampling algorithms use only the information in the positive (minority) instances, generating synthetic instances with characteristics similar to the minority instances; the information in the majority instances goes unused. When the minority instances are too few and too concentrated, such methods suffer from the small-disjuncts problem, which leads to overfitting of the training data. To solve this problem, we draw on the genetic process of three-line hybrid rice and propose a new positive-instance augmentation algorithm, Three-line Hybrid Positive Instance Augmentation (THPIA). THPIA uses the genetic process of three-line hybrid rice to mix up the features of majority-class and minority-class instances and construct unlabeled instances. Positive instances are then randomly selected from the pool of positive instances and hybridized with randomly selected unlabeled instances to obtain enhanced seed instances of the positive class. Finally, a distance constraint prevents the augmentation from generating noisy instances in the negative region. Experimental results on 20 open datasets show that THPIA can effectively use the information in the majority instances to enhance the minority instances. In a comparison with 7 state-of-the-art methods using the Friedman test and Holm's post-hoc test, THPIA is comparable to CDSMOTE and SMOTE-LOF and outperforms the remaining 5 algorithms.
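
For illustration only, the three steps described above (mixing majority and minority features into unlabeled instances, hybridizing positive instances with them, and filtering by distance) might be sketched as follows. The abstract does not specify the mixing coefficients, the hybridization operator, or the form of the distance constraint, so this sketch makes its own assumptions: uniform random convex combinations for both mixing steps, and a nearest-neighbour rule that keeps a candidate only if it is closer to some minority instance than to any majority instance. The function name `augment_minority` and its parameters are hypothetical and are not the authors' THPIA implementation.

```python
import numpy as np

def augment_minority(X_min, X_maj, n_new, seed=0, max_tries=10_000):
    """Hypothetical sketch of mixup-style minority augmentation.

    X_min, X_maj: 2-D arrays of minority and majority feature vectors.
    Returns up to n_new synthetic minority instances.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_new:
            break
        # Step 1 (assumed form): mix one minority and one majority
        # instance into an "unlabeled" instance.
        x_min = X_min[rng.integers(len(X_min))]
        x_maj = X_maj[rng.integers(len(X_maj))]
        lam = rng.uniform(0.0, 1.0)
        unlabeled = lam * x_min + (1.0 - lam) * x_maj

        # Step 2 (assumed form): hybridize a randomly chosen positive
        # instance with the unlabeled instance.
        x_pos = X_min[rng.integers(len(X_min))]
        mu = rng.uniform(0.0, 1.0)
        candidate = mu * x_pos + (1.0 - mu) * unlabeled

        # Step 3 (assumed constraint): keep the candidate only if its
        # nearest minority instance is closer than its nearest
        # majority instance.
        d_min = np.linalg.norm(X_min - candidate, axis=1).min()
        d_maj = np.linalg.norm(X_maj - candidate, axis=1).min()
        if d_min < d_maj:
            accepted.append(candidate)
    return np.array(accepted).reshape(-1, X_min.shape[1])
```

Unlike interpolation between minority instances alone, every candidate here carries some majority-class information through the unlabeled instance, and the final filter discards candidates that drift into the majority region.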
