Abstract
Support vector machines (SVMs) are powerful classifiers, but their high computational complexity in the training phase can limit their applicability to large datasets. An effective way to address this limitation is to select a small subset of the most representative training samples from which comparable results can still be obtained. In this study, a novel instance selection method called border point extraction based on locality-sensitive hashing (BPLSH) is designed. BPLSH preserves instances that are near the decision boundaries and eliminates nonessential ones. The performance of BPLSH is benchmarked against four approaches on different classification problems. The experimental results indicate that BPLSH outperforms the other methods in terms of classification accuracy, preservation rate, and execution time. The source code of BPLSH is available at https://github.com/mohaslani/BPLSH.
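The general idea of hash-based instance selection can be illustrated with a small sketch. The code below is not the BPLSH algorithm (the authors' repository is the reference for that); it is a simplified heuristic written under stated assumptions: points are hashed by the sign pattern of a few random hyperplanes, buckets that contain more than one class are treated as lying near a class border and kept whole, and buckets with a single class are reduced to one representative. The names `lsh_key`, `select_instances`, and the parameter `n_planes` are hypothetical.

```python
import random
from collections import defaultdict


def lsh_key(point, planes):
    """Hash a point to the sign pattern it produces against random hyperplanes."""
    return tuple(
        sum(w * x for w, x in zip(normal, point)) + offset >= 0
        for normal, offset in planes
    )


def select_instances(points, labels, n_planes=4, seed=0):
    """Return sorted indices of retained samples (illustrative heuristic, not BPLSH)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: Gaussian random hyperplanes stand in for the hash family.
    planes = [
        ([rng.gauss(0, 1) for _ in range(dim)], rng.gauss(0, 1))
        for _ in range(n_planes)
    ]

    # Group sample indices by hash key, so nearby points tend to share a bucket.
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_key(point, planes)].append(idx)

    kept = []
    for members in buckets.values():
        if len({labels[i] for i in members}) > 1:
            kept.extend(members)     # mixed bucket: treated as near a class border
        else:
            kept.append(members[0])  # single-class bucket: keep one representative
    return sorted(kept)


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0),
          (1.1, 0.9), (0.9, 1.2), (0.5, 0.5), (0.55, 0.5)]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
kept = select_instances(points, labels)
```

Because every bucket contributes at least one sample, each class present in the input remains represented in `kept`; how many samples are dropped depends on the random hyperplanes and on `n_planes`.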
Highlights
Support vector machines (SVMs) are effective classifiers with a definite theoretical foundation and have been extensively used in various applications in different fields, such as data mining [42], remote sensing [32], and geoscience [40]
For an unbiased and reliable evaluation of the instance selection methods, a repeated stratified q-fold cross-validation scheme is used
A given dataset is partitioned into q exclusive folds, and each time, q-1 folds are utilized to train the SVM after an instance selection method is applied to them
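The evaluation scheme described in these highlights can be sketched as follows. This is a minimal standard-library illustration, not the authors' evaluation code: the function names, the round-robin fold assignment, and the placeholder `keep_all` selector are assumptions. The point it shows is that instance selection is applied only to the q-1 training folds, while the held-out fold is left untouched.

```python
import random
from collections import defaultdict


def stratified_folds(labels, q, seed=0):
    """Assign sample indices to q disjoint folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(q)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % q].append(idx)
    return folds


def repeated_stratified_cv(labels, q, repeats, select, seed=0):
    """Yield (selected_train_indices, test_indices) for every fold of every repeat."""
    for r in range(repeats):
        folds = stratified_folds(labels, q, seed=seed + r)
        for k in range(q):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            # Instance selection sees the training folds only, never the test fold.
            yield select(train), test


def keep_all(train_indices):
    """Placeholder selector that retains every training sample."""
    return train_indices


labels = [0] * 6 + [1] * 4
splits = list(repeated_stratified_cv(labels, q=2, repeats=3, select=keep_all))
```

In an actual experiment, `keep_all` would be replaced by an instance selection method, and an SVM would be trained on the selected indices and scored on `test`.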
Summary
Support vector machines (SVMs) are effective classifiers with a definite theoretical foundation and have been extensively used in various applications in different fields, such as data mining [42], remote sensing [32], and geoscience [40]. SVMs come with a minimal structural risk because they search for a separating hyperplane that represents the maximum margin between classes. This feature makes SVMs more effective than other classifiers. Training an SVM, in which support vectors are obtained, requires solving a quadratic programming optimization problem, which poses a computational complexity of $O(n^3)$, where n is the number of training samples. This computational cost inhibits the applicability of SVMs to tasks involving large datasets, such as feature extraction from high-resolution aerial images. The instances with great potential to contribute to the classification and construction of demarcation hyperplanes are preserved. These patterns, called support vector candidates, lie close to the border of classes, and they have been