Abstract

Scaling up the support vector machine (SVM) to large data sets remains one of its main challenges. One way to achieve this is to break the problem down into smaller ones using clustering techniques and to construct a local SVM model for each cluster. Although this approach is considerably faster than the standard SVM, its performance is sometimes inferior, even when a local kernel SVM is used, as in k-local SVM (KSVM). This often occurs because some local models overfit when the corresponding clusters are unbalanced, i.e., most of their patterns belong to one class. To alleviate this problem for KSVM, a new supervised clustering technique is proposed that partitions the data around the decision boundary into nearly balanced clusters. For a binary classification problem, this is accomplished as follows. First, one of the class regions (e.g., the denser one) is clustered into k clusters. Then, the clusters that are closest to the decision boundary are determined. Finally, these clusters are expanded to include the closest patterns from all classes. In this way, each cluster includes a reasonable number of patterns from each class, which helps to mitigate the overfitting problem. The proposed approach was compared with KSVM, clustered SVM (CSVM), the standard SVM, and two of the top ensemble classification tree methods, namely Random Forest and AdaBoost, on several large benchmark data sets. The experimental results showed that the proposed approach outperforms KSVM and CSVM and is competitive with the best model for most data sets, especially when the radial basis function (RBF) kernel is used.
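The three clustering steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how closeness to the decision boundary is measured or how many patterns are added during expansion, so the code makes its own assumptions. The majority class stands in for the denser class region, a center's distance to the nearest pattern of the other class stands in for its distance to the decision boundary, and a fixed number of nearest patterns per class is added to each cluster. The names `kmeans`, `balanced_boundary_clusters`, `n_boundary`, and `per_class` are hypothetical, and training the local SVM on each cluster is omitted.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns the centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old center if empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def balanced_boundary_clusters(X, y, k=4, n_boundary=2, per_class=5, seed=0):
    """Return lists of sample indices, one list per boundary cluster."""
    classes = sorted(set(y))

    # Step 1: cluster one class region.
    # Assumption: the majority class is used as a stand-in for "the denser one".
    base = max(classes, key=y.count)
    base_pts = [x for x, label in zip(X, y) if label == base]
    centers = kmeans(base_pts, k, seed=seed)

    # Step 2: find the clusters closest to the decision boundary.
    # Assumption: distance from a center to the nearest pattern of another
    # class is used as a proxy for distance to the boundary.
    other_pts = [x for x, label in zip(X, y) if label != base]
    boundary = sorted(
        centers, key=lambda c: min(math.dist(c, p) for p in other_pts)
    )[:n_boundary]

    # Step 3: expand each boundary cluster with the closest patterns of all classes.
    # Assumption: a fixed number (per_class) of nearest patterns per class.
    clusters = []
    for c in boundary:
        members = []
        for cls in classes:
            idx = [i for i, label in enumerate(y) if label == cls]
            idx.sort(key=lambda i: math.dist(X[i], c))
            members.extend(idx[:per_class])
        clusters.append(members)
    return clusters


# Toy binary data: 40 patterns of class 0 on the left, 20 of class 1 on the right.
rng = random.Random(1)
X = [(rng.uniform(0, 4), rng.uniform(0, 4)) for _ in range(40)]
X += [(rng.uniform(5, 7), rng.uniform(0, 4)) for _ in range(20)]
y = [0] * 40 + [1] * 20

clusters = balanced_boundary_clusters(X, y)
for members in clusters:
    print(sorted(y[i] for i in members))
```

In this sketch every returned cluster holds the same number of patterns from each class by construction, which is a stricter form of the "nearly balanced" property the abstract describes. A local SVM would then be trained on each cluster's patterns.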
