Research on SVM environment performance of parallel computing based on large data set of machine learning

Yunlu Gong, Lianguo Jia

doi:10.1007/s11227-019-02894-7

Abstract

The support vector machine (SVM) algorithm is widely used in many fields because it classifies well and is simple and practical. However, an SVM finds its support vectors by solving a quadratic programming problem, and solving that problem requires computing a matrix of order n, where n is the number of training samples. When the data set is large, computing and storing this matrix makes the optimization very slow and can even cause memory overflow and abort the run. Implementing the SVM algorithm on the big data computing platform Spark can solve these problems, but this alone does not handle multi-class classification. Therefore, this paper starts from the construction of multiple classifiers and combines the Spark big data programming model with the classification characteristics of the SVM to realize a parallel one-to-many (one-versus-rest) SVM optimization algorithm for large data sets, which is then evaluated on UCI data sets. In the experiments, the one-to-many SVM implemented on Spark clearly outperforms the one-to-many SVM running in a single-machine environment. The simulation results show that the proposed algorithm has better performance.
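To illustrate the one-to-many scheme mentioned in the abstract, the following minimal Python sketch trains one binary linear SVM per class and predicts by taking the highest decision score. It is not the authors' implementation. The function names (train_binary_svm, train_one_vs_rest, predict), the batch subgradient training on the hinge loss, the hyperparameters, and the use of a thread pool as a stand-in for distributing the per-class tasks on Spark are all assumptions made for illustration. The point it shows is that the per-class training problems are independent of one another, which is what makes the scheme a natural fit for a parallel map.

```python
from concurrent.futures import ThreadPoolExecutor


def train_binary_svm(X, y, lam=0.01, eta=0.05, iters=500):
    """Linear soft-margin SVM via batch subgradient descent on the hinge loss.

    X is a list of feature lists, y holds labels in {-1, +1}.
    Returns the weight vector w and the bias b.
    (Assumed training method; the abstract describes a QP-based solver.)
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0                     # the bias is left unregularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # margin violated, so the hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def train_one_vs_rest(X, labels, workers=4):
    """Train one binary SVM per class: that class is +1, all others are -1."""
    classes = sorted(set(labels))

    def fit(c):
        y = [1 if lab == c else -1 for lab in labels]
        return c, train_binary_svm(X, y)

    # The per-class problems do not depend on each other, so they can be
    # mapped in parallel. The thread pool only shows that structure; in
    # CPython it gives no real speed-up for this pure-Python workload.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit, classes))


def predict(models, x):
    """Return the class whose binary SVM gives x the largest decision score."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    return max(models, key=score)
```

Given a list of feature vectors X and a list of class labels, train_one_vs_rest(X, labels) returns a dictionary mapping each class to its (w, b) pair, and predict(models, x) classifies a new point. On a cluster, the fit step for each class would be the unit of work handed to the distributed framework.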

Full Text