Abstract

In recent years, many researchers have focused on the classification of imbalanced data sets, a problem that arises widely in industrial production and medical research. For highly imbalanced data sets, ensemble methods based on over-sampling are among the most competitive techniques in current research. However, an inappropriate sampling strategy can easily degrade model performance, increasing training complexity and causing over-fitting. This article proposes an equilibrium ensemble method (DCI-ISSA) with two novel techniques to overcome these shortcomings. First, we introduce an over-sampling approach, Data Center Interpolation (DCI), that provides a balanced data set for each single learner, which shields the base learners from the impact of class imbalance. Second, we provide a parameter optimization method for Random Forest (RF) that uses the Improved Sparrow Search Algorithm (ISSA) to dynamically find the optimal parameters for different imbalanced data sets. These parameters improve the classification performance of the base classifiers and adapt to imbalanced data sets of different sizes. Experimental results show that the DCI-ISSA-RF model outperforms other well-known approaches on imbalanced data sets of various dimensions.
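
The abstract does not give the DCI formula, so the following is only a minimal sketch of one plausible reading of "data center interpolation", not the authors' method. It assumes a binary problem and creates each synthetic minority sample on the line segment between a randomly chosen minority sample and the minority class centroid, until both classes have the same size. The function name `center_interpolation_oversample` and its parameters are hypothetical.

```python
import random


def center_interpolation_oversample(X, y, minority_label, seed=0):
    """Hypothetical sketch: balance a binary data set by interpolating
    minority samples toward the minority class centroid.

    This is an assumed interpretation, not the published DCI procedure.
    """
    rng = random.Random(seed)

    # Collect the minority-class rows.
    minority = [row for row, label in zip(X, y) if label == minority_label]

    # Number of synthetic samples needed to match the majority class size.
    n_needed = (len(X) - len(minority)) - len(minority)
    if not minority or n_needed <= 0:
        return [list(row) for row in X], list(y)

    # Centroid ("data center") of the minority class, one mean per feature.
    dim = len(minority[0])
    center = [sum(row[j] for row in minority) / len(minority) for j in range(dim)]

    X_bal = [list(row) for row in X]
    y_bal = list(y)
    for _ in range(n_needed):
        base = rng.choice(minority)   # random minority sample
        lam = rng.random()            # interpolation weight in [0, 1)
        # New point lies between the chosen sample and the centroid.
        X_bal.append([b + lam * (c - b) for b, c in zip(base, center)])
        y_bal.append(minority_label)
    return X_bal, y_bal
```

A balanced set produced this way could then be passed to each base learner of an ensemble. Because every synthetic point is pulled toward the centroid, this variant never places samples outside the region spanned by the minority class, at the cost of reducing its variance.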

