Abstract

This article presents a general sampling method, called granular-ball sampling (GBS), for classification problems by introducing the idea of granular computing. The GBS method uses some adaptively generated hyperballs to cover the data space, and the points on the hyperballs constitute the sampled data. GBS is the first sampling method that not only reduces the data size but also improves the data quality in noisy label classification. In addition, because the GBS method can be used to exactly describe the boundary, it can obtain almost the same classification accuracy as the results on the original datasets, and it can obtain an obviously higher classification accuracy than random sampling. Therefore, for the data reduction classification task, GBS is a general method that is not especially restricted by any specific classifier or dataset. Moreover, the GBS can be effectively used as an undersampling method for imbalanced classification. It has a time complexity that is close to O( N ), so it can accelerate most classifiers. These advantages make GBS powerful for improving the performance of classifiers. All codes have been released in the open source GBS library at http://www.cquptshuyinxia.com/GBS.html.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call