A Parallel Grid Optimization of SVM Hyperparameter for Big Data Classification using Spark Radoop

Ahmed Hussein Ali, Mahmood Zaki Abdullah

doi:10.33640/2405-609x.1270

Open Access

Journal: Karbala International Journal of Modern Science
Publication Date: Mar 26, 2020
Citations: 6
License type: CC BY-NC-ND

Abstract

The big data phenomenon challenges the extraction of relevant knowledge with classical machine learning techniques, because such extraction requires efficient data reduction and new, fast distributed machine learning algorithms. The widespread use of the support vector machine (SVM) calls for efficient methods of constructing a classifier that is suitable for big data and retains high classification capability. In practice, the performance of an SVM depends on efficiently deriving the optimal feature subset and the algorithmic parameters. Grid search optimization usually finds the global optimum and achieves higher learning accuracy than particle swarm optimization (PSO) and genetic algorithms (GA), but its heavier computation takes much longer. Grid search is nevertheless attractive because the SVMs trained for the different parameter settings do not depend on one another and can therefore be trained simultaneously. This paper proposes a novel parallel implementation of grid optimization using Spark Radoop to reduce the heavy computational load and make the method suitable for big data processing. The major contributions of this study are a significant reduction in computation time compared with the serial version of gridSVM and high classification accuracy compared with other parallel optimization techniques.
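The independence property described in the abstract can be illustrated with a minimal sketch. It is not the Spark Radoop implementation proposed in the paper: a local thread pool stands in for the cluster, a linear SVM trained by Pegasos-style subgradient descent stands in for the paper's SVM, and the toy dataset, the function names, and the grid over `lam` (regularisation strength) and `epochs` are all hypothetical choices made for illustration. The only point shown is that each grid point is evaluated without reference to any other, so the evaluations can be dispatched concurrently and give the same result as the serial loop.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def make_data(n, seed):
    """Hypothetical toy data: 2-D points labelled by the sign of x0 + x1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > 0 else -1))
    return data


def train_linear_svm(data, lam, epochs, seed=0):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)  # private generator, so runs are independent
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = data[i]
            margin = y * (w[0] * x[0] + w[1] * x[1])
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def accuracy(w, data):
    hits = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1]) > 0)
    return hits / len(data)


def evaluate(params, train, test):
    """Train and score one SVM; depends only on its own grid point."""
    lam, epochs = params
    w = train_linear_svm(train, lam, epochs)
    return (accuracy(w, test), lam, epochs)


def serial_grid_search(grid, train, test):
    return [evaluate(p, train, test) for p in grid]


def parallel_grid_search(grid, train, test, workers=4):
    # Each grid point is submitted as a separate task; map preserves order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: evaluate(p, train, test), grid))


TRAIN = make_data(200, seed=1)
TEST = make_data(100, seed=2)
GRID = list(itertools.product([1e-3, 1e-2, 1e-1, 1.0], [1, 5, 20]))

if __name__ == "__main__":
    results = parallel_grid_search(GRID, TRAIN, TEST)
    best = max(results, key=lambda r: r[0])
    print("best (accuracy, lam, epochs):", best)
```

In CPython, threads do not speed up CPU-bound training of this kind; the thread pool only demonstrates the task structure. A real speed-up requires separate processes or, as in the paper, a distributed engine.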

Full Text