Abstract

The imbalanced classification problem turns out to be one of the important and challenging problems in data mining and machine learning. The performances of traditional classifiers will be severely affected by many data problems, such as class imbalanced problem, class overlap and noise. When the number of one class in the data set is larger than other classes, class imbalanced problem will inevitably occur. Therefore, many researchers are committed to solving the problem of category imbalance and improving the overall classification performances of the classifier. The Tomek-Link algorithm was only used to clean data when it was proposed. In recent years, there have been reports of combining Tomek-Link algorithm with sampling technique. The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy. However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances. When the number of minority instances is small, the under-sampling effect is not satisfactory, and the performance improvement of the classification model is not obvious. Therefore, on the basis of Tomek-Link, a multi-granularity relabeled under-sampling algorithm (MGRU) is proposed. This algorithm fully considers the local information of the data set in the local granularity subspace, and detects the local potential overlapping instances in the data set. Then, the overlapped majority instances are eliminated according to the global relabeled index value, which effectively expands the detection range of Tomek-Links. The simulation results show that when we select the optimal global relabeled index value for under-sampling, the classification accuracy and generalization performance of the proposed under-sampling algorithm are significantly better than other baseline algorithms.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.