Abstract

The widespread application of support vector machines (SVMs) calls for efficient methods of constructing an SVM classifier with high classification ability. The performance of an SVM crucially depends on whether an optimal feature subset and optimal SVM parameters can be obtained efficiently. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to simultaneously optimize the feature subset and the parameters of the SVM. The distributed topology and migration policy of the CGPGA help find the optimal feature subset and parameters in significantly less time, thereby increasing the quality of the solution found. In addition, a new fitness function, which combines the classification accuracy estimated by the bootstrap method, the number of chosen features, and the number of support vectors, is proposed to steer the search of the CGPGA toward minimal generalization error. Experimental results on 12 benchmark datasets show that the proposed approach outperforms a genetic algorithm (GA) based method and grid search in terms of classification accuracy, number of chosen features, number of support vectors, and running time.
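The abstract's fitness function combines three terms: bootstrap-estimated accuracy, the number of chosen features, and the number of support vectors. The sketch below shows one plausible weighted combination; the weights `w_a`, `w_f`, `w_s` and the exact form are assumptions for illustration, since the paper's formula is not given in this excerpt.

```python
def combined_fitness(accuracy, n_features, n_total_features,
                     n_sv, n_samples,
                     w_a=0.8, w_f=0.1, w_s=0.1):
    """Hypothetical weighted combination: reward high bootstrap
    accuracy, penalize large feature subsets and many support
    vectors. The weights w_a, w_f, w_s are illustrative, not
    values taken from the paper."""
    return (w_a * accuracy
            + w_f * (1 - n_features / n_total_features)
            + w_s * (1 - n_sv / n_samples))

# Example: 95% accuracy, 5 of 30 features, 40 SVs on 200 samples.
print(combined_fitness(0.95, 5, 30, 40, 200))
```

Penalizing the support-vector count is a common proxy for generalization error, since fewer support vectors typically indicate a simpler decision boundary.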

Highlights

  • The overwhelming amount of data now available in virtually every field gives researchers great opportunities to obtain knowledge that was previously out of reach

  • Despite all the promising results that SVMs have provided, it remains a challenge to efficiently construct an SVM classifier that gives accurate predictions on unseen samples. This so-called generalization ability crucially depends on two tasks, namely feature selection and parameter optimization [2,3,4]

  • Feature selection identifies the subset of available features that is most essential for classification

Summary

Introduction

The overwhelming amount of data now available in virtually every field gives researchers great opportunities to obtain knowledge that was previously out of reach. The trend in recent years has been to turn the two tasks of feature selection and parameter optimization into a single multiobjective optimization problem, so that global search algorithms such as the genetic algorithm (GA) [2, 14, 15], particle swarm optimization (PSO) [3], and ant colony optimization (ACO) [4] can perform them jointly. Jointly performing these two tasks results in a greatly enlarged solution space and demands strong search ability to find the optimal feature subset and parameters for the SVM. Moreover, since even a single SVM training run requires a great deal of computation, applying these global search algorithms becomes computationally infeasible in practice as the number of training samples grows.
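The joint search described above is commonly implemented by encoding both decisions in one chromosome: a bitstring whose first bits mask the features and whose remaining bits encode discretized SVM parameters (C and gamma for an RBF kernel). The following minimal, self-contained sketch shows that encoding with a plain serial GA; the fitness is a toy stand-in (rewarding smaller feature subsets), not the paper's bootstrap-based criterion, and all constants are illustrative assumptions.

```python
import random

random.seed(0)

N_FEATURES = 10   # bits for the feature mask (illustrative size)
PARAM_BITS = 8    # bits each for discretized C and gamma
CHROM_LEN = N_FEATURES + 2 * PARAM_BITS

def decode(chrom):
    """Split a chromosome into (feature_mask, C, gamma)."""
    def bits_to_float(bits, lo, hi):
        v = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * v / (2 ** len(bits) - 1)
    mask = chrom[:N_FEATURES]
    c = bits_to_float(chrom[N_FEATURES:N_FEATURES + PARAM_BITS], 0.1, 100.0)
    gamma = bits_to_float(chrom[N_FEATURES + PARAM_BITS:], 1e-4, 1.0)
    return mask, c, gamma

def fitness(chrom):
    """Toy stand-in: reward chromosomes that keep few features
    (but at least one). A real implementation would train an SVM
    with (mask, C, gamma) and combine bootstrap accuracy, feature
    count, and support-vector count, as the paper proposes."""
    mask, _, _ = decode(chrom)
    n_sel = sum(mask)
    return 1.0 / n_sel if n_sel else 0.0

def evolve(pop_size=20, generations=30, p_mut=0.02):
    """Plain serial GA: truncation selection, one-point crossover,
    bit-flip mutation. A CGPGA would run several such populations
    in parallel and migrate individuals between them."""
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, CHROM_LEN)       # one-point crossover
            child = [1 - g if random.random() < p_mut else g
                     for g in a[:cut] + b[cut:]]       # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
mask, c, gamma = decode(best)
print(sum(mask), round(c, 2), round(gamma, 4))
```

Because every fitness evaluation in the real setting means training an SVM, the population-level parallelism of a CGPGA directly attacks the computational bottleneck noted above.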

Support Vector Machines
Parallel Genetic Algorithms
Method
Experiments
Limitations and Conclusions