Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Yao Huimin

doi:10.1155/2021/7998417

Abstract

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in both capacity and value, and machine learning has received unprecedented attention in recent years. Traditional machine learning algorithms are difficult to parallelize effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines considers not only the support vectors but also the “special points”, that is, the nonsupport vectors in one subset that violate the training result of the other subset; on this basis, a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and its parallelization is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
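The merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: models are hypothetical linear `(w, b)` pairs, a point counts as a support vector when its functional margin under its own subset's model is at most 1, and "violating the training result of the other subset" is read as having a functional margin below 1 under the other model. All function names and the toy data are invented for the example.

```python
def margin(model, x, y):
    """Functional margin y * f(x) for a linear model given as (w, b)."""
    w, b = model
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def support_vectors(model, subset, tol=1e-9):
    """Points on or inside the margin of the model trained on their own subset."""
    return [(x, y) for x, y in subset if margin(model, x, y) <= 1 + tol]


def special_points(own_model, other_model, subset, tol=1e-9):
    """Nonsupport vectors of `subset` that violate the other subset's model."""
    return [
        (x, y)
        for x, y in subset
        if margin(own_model, x, y) > 1 + tol       # not a support vector here
        and margin(other_model, x, y) < 1 - tol    # but inside the other margin
    ]


def cross_validation_merge(subset_a, model_a, subset_b, model_b):
    """Build the training set for the merged SVM (assumed reading of the abstract):
    support vectors of both subsets plus each subset's special points."""
    return (
        support_vectors(model_a, subset_a)
        + special_points(model_a, model_b, subset_a)
        + support_vectors(model_b, subset_b)
        + special_points(model_b, model_a, subset_b)
    )


# Toy 1-D data: each sample is (features, label) with label in {+1, -1}.
# The two sub-models are given by hand and deliberately disagree.
subset_a = [((1.0,), 1), ((3.0,), 1), ((-1.0,), -1), ((-4.0,), -1)]
subset_b = [((4.0,), 1), ((6.0,), 1), ((2.0,), -1), ((0.0,), -1)]
model_a = ((1.0,), 0.0)    # f_a(x) = x
model_b = ((1.0,), -3.0)   # f_b(x) = x - 3

merged = cross_validation_merge(subset_a, model_a, subset_b, model_b)
print(merged)
```

In this toy run, a merge that kept only support vectors would drop `((3.0,), 1)` and `((0.0,), -1)`, even though each lies inside the margin of the other subset's model. Keeping such points is the idea the abstract attributes to the cross-validation merging algorithm; the merged set would then be used to retrain a single SVM.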

Highlights

As the mainstream part of today’s media industry, images and videos are rich in information and easy to understand, which makes them an indispensable part of life
Xu et al. [19] proposed an incremental algorithm similar to the block algorithm: it takes the largest training set that a single training run can handle as the increment and trains on it together with the support vectors from the previous increment, repeating until all the training samples are processed
Without changing the overall architecture of Cascade SVM, this paper studies the impact of the merging algorithm on the accuracy of the final model and proposes a parallelized support vector machine model based on cross-validation
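The incremental scheme attributed to Xu et al. [19] in the highlights can be illustrated with a small sketch. It is an assumption-laden toy, not their algorithm: the hypothetical `train_1d_svm` is a stand-in trainer that solves a hard-margin SVM on separable one-dimensional data, and `incremental_train` and the sample data are likewise made up for the example.

```python
def train_1d_svm(samples):
    """Stand-in trainer: hard-margin SVM on separable 1-D data.

    Returns (threshold, support_vectors). The support vectors are the
    closest positive and negative points, and the threshold lies midway
    between them. Requires at least one sample of each class.
    """
    pos = min(x for x, y in samples if y > 0)
    neg = max(x for x, y in samples if y < 0)
    return (pos + neg) / 2.0, [(neg, -1), (pos, 1)]


def incremental_train(samples, chunk_size):
    """Train chunk by chunk, carrying only the previous support vectors forward."""
    threshold, support = None, []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        # Each round sees the new increment plus the earlier support vectors.
        threshold, support = train_1d_svm(support + chunk)
    return threshold, support


# Toy data: (x, label); the first chunk must contain both classes.
samples = [(5.0, 1), (-4.0, -1), (3.0, 1), (-2.0, -1), (1.0, 1), (-1.0, -1)]

print(incremental_train(samples, chunk_size=2))
print(train_1d_svm(samples))
```

On this separable toy data the chunked run ends with the same threshold and support vectors as training on everything at once, while each round handles only `chunk_size` new samples plus two carried-over points. That equality is a property of this simple stand-in trainer and should not be assumed for general SVMs.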

Summary

Introduction

As the mainstream part of today’s media industry, images and videos are rich in information and easy to understand, which makes them an indispensable part of life. Computer vision analysis is currently a key development direction of the Internet communication industry. Character recognition has great application value in many scenarios, such as vehicle license plate detection, image-text conversion, image content translation, and image search. Because the precision of text recognition technology is not yet ideal, its application scenarios remain relatively simple, such as content search in images [1–6].

Literature Review

Overall Architecture of Machine Learning Platform

Result processing module

Experimental Results and Analysis


Journal: Scientific Programming
Publication Date: Dec 17, 2021
Citations: 3
License type: CC BY 4.0


Similar Papers

Big Data Classification: A Combined Approach Based on Parallel and Approx SVM
Walid Ksiaâ ... Fahmi Ben Rejab
28 May 2017

Research of food safety risk assessment methods based on big data
Yongjun Ma ... Yonghao Xue
01 Mar 2016

A Confident Majority Voting Strategy for Parallel and Modular Support Vector Machines
Yi-Min Wen ... Bao-Liang Lu
03 Jun 2007

Parallel SVM Algorithms in Big Data Environments
Shuai Zhang
18 Nov 2022
