Abstract

Linear classifiers are generally simpler and more explainable than their nonlinear variants. They can achieve satisfactory classification performance on linearly separable data, but not on data that is not linearly separable. Linear classifiers are therefore usually extended by modifying their algorithms, which yields their nonlinear variants. In this paper we present a general method, cluster-based data relabelling (CBDR), that allows linear classifiers to work effectively on nonlinear data without such modification. CBDR partitions the data set into several non-overlapping class-specific clusters and relabels each data point with its cluster. A linear classifier can then be applied to the relabelled data to seek cluster-based linear decision boundaries instead of class-based decision boundaries. Extensive experimentation has demonstrated that CBDR can significantly enhance the classification performance of linear classifiers, and can even make them outperform their nonlinear variants. Further experimentation has demonstrated that CBDR can also improve the classification performance of nonlinear classifiers. In both cases, the largest improvements were observed on imbalanced data.
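
The relabelling idea can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or linear classifier is used, so the choices below are assumptions made only for illustration: plain k-means with a fixed number of clusters per class, a least-squares linear classifier, and synthetic XOR-style data. All function names (`kmeans`, `relabel_by_cluster`, `fit_linear`, `predict_linear`) are hypothetical, and accuracy is measured on the training data only.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation (assumed choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centre.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def relabel_by_cluster(X, y, k):
    """Cluster each class separately; return cluster labels and a cluster-to-class map."""
    new_y = np.empty(len(y), dtype=int)
    cluster_to_class = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        # Offset keeps cluster ids of different classes from overlapping.
        new_y[idx] = kmeans(X[idx], k) + len(cluster_to_class)
        cluster_to_class.extend([c] * k)
    return new_y, np.array(cluster_to_class)

def fit_linear(X, labels, n_labels):
    """Least-squares fit of one linear discriminant per label (a simple linear classifier)."""
    A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    T = np.eye(n_labels)[labels]               # one-hot targets
    W, *_ = np.linalg.lstsq(A, T, rcond=None)
    return W

def predict_linear(W, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A @ W).argmax(axis=1)

# Synthetic XOR-style data: each class consists of two well-separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
blob_classes = np.array([0, 0, 1, 1])
X = np.vstack([c + 0.1 * rng.standard_normal((50, 2)) for c in blob_centers])
y = np.repeat(blob_classes, 50)

# Baseline: the linear classifier trained on the original class labels.
baseline_acc = (predict_linear(fit_linear(X, y, 2), X) == y).mean()

# Relabel by class-specific clusters, train on cluster labels, map predictions back.
cluster_y, cluster_to_class = relabel_by_cluster(X, y, k=2)
W = fit_linear(X, cluster_y, len(cluster_to_class))
cbdr_acc = (cluster_to_class[predict_linear(W, X)] == y).mean()

print("class labels:  ", baseline_acc)
print("cluster labels:", cbdr_acc)
```

In this toy setting no single line separates the two classes, but the four clusters can be separated from one another by linear boundaries, so mapping each predicted cluster back to its class recovers the class labels. The number of clusters per class is fixed by hand here; how it should be chosen in general is not stated in the abstract.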
