Abstract

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in both capacity and value, and machine learning has received unprecedented attention in recent years. Traditional machine learning algorithms are difficult to parallelize effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: the Batch Layer, the Serving Layer, and the Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines considers not only the support vectors but also the “special points”, that is, the nonsupport vectors in one subset that violate the training result of the other subset; on this basis, a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and its parallel training process is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. The experimental results show that the proposed parallelized support vector machine performs strongly in speed-up ratio, training time, and prediction accuracy.
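
The merging step described in the abstract can be illustrated with a small single-machine sketch. It is not the paper's implementation: the linear kernel, the dual coordinate descent trainer, and the names `train_linear_svm`, `margin`, and `cross_validation_merge` are all assumptions made for illustration, and "violating the training result of the other subset" is interpreted here as a nonsupport vector whose functional margin under the other subset's model is below 1.

```python
from typing import List, Tuple

# A point is (features, label) with label in {-1, +1}; features must be a tuple.
Point = Tuple[Tuple[float, ...], int]


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(data: List[Point], C: float = 1.0, epochs: int = 100):
    """Stand-in trainer (an assumption, not the paper's solver): dual coordinate
    descent for a linear hinge-loss SVM, with the bias folded in as a constant
    extra feature. Returns (w, alphas); alphas[i] > 0 marks a support vector."""
    xs = [tuple(x) + (1.0,) for x, _ in data]
    w = [0.0] * len(xs[0])
    alphas = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x, (_, y)) in enumerate(zip(xs, data)):
            grad = y * dot(w, x) - 1.0
            # Projected Newton step on one dual variable, clipped to [0, C].
            new_alpha = min(max(alphas[i] - grad / dot(x, x), 0.0), C)
            step = (new_alpha - alphas[i]) * y
            w = [wi + step * xi for wi, xi in zip(w, x)]
            alphas[i] = new_alpha
    return w, alphas


def margin(w, point: Point) -> float:
    """Functional margin y * (w . [x, 1]) of a point under a model."""
    x, y = point
    return y * dot(w, tuple(x) + (1.0,))


def cross_validation_merge(subset_a: List[Point], subset_b: List[Point]):
    """Merge two sub-SVMs: keep both sets of support vectors plus the
    'special points', then retrain on the reduced set."""
    w_a, alphas_a = train_linear_svm(subset_a)
    w_b, alphas_b = train_linear_svm(subset_b)
    sv_a = [p for p, al in zip(subset_a, alphas_a) if al > 0]
    sv_b = [p for p, al in zip(subset_b, alphas_b) if al > 0]
    # Special points: nonsupport vectors of one subset that fall inside the
    # margin of the model trained on the other subset (assumed criterion).
    special_a = [p for p, al in zip(subset_a, alphas_a)
                 if al == 0 and margin(w_b, p) < 1.0]
    special_b = [p for p, al in zip(subset_b, alphas_b)
                 if al == 0 and margin(w_a, p) < 1.0]
    merged = list(dict.fromkeys(sv_a + sv_b + special_a + special_b))
    return merged, train_linear_svm(merged)


# Toy usage: each subset only "sees" one axis, so each model misjudges
# the other subset's nonsupport vectors, which then become special points.
subset_a = [((2.0, 0.0), 1), ((3.0, 0.0), 1), ((-2.0, 0.0), -1), ((-3.0, 0.0), -1)]
subset_b = [((0.0, 2.0), 1), ((0.0, 3.0), 1), ((0.0, -2.0), -1), ((0.0, -3.0), -1)]
merged_points, (weights, _) = cross_validation_merge(subset_a, subset_b)
print(len(merged_points), weights)
```

In a Cascade-style arrangement, a merge of this kind would be applied pairwise to the models trained on each data partition; how the paper schedules those merges on Spark is not covered by this sketch.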

Highlights

  • As the mainstream part of today’s media industry, images and videos are rich in information and easy to understand, which makes them an indispensable part of life

  • Xu et al. [19] proposed an incremental algorithm similar to the block algorithm: it takes the training scale that a single training run can tolerate as the increment and trains each increment together with the support vectors of the previous samples, until all the training samples are processed

  • Without changing the overall architecture of Cascade SVM, this paper studies the impact of the merging algorithm on the accuracy of the final model and proposes a parallelized support vector machine model based on cross-validation


Summary

Introduction

As a mainstream part of today’s media industry, images and videos are rich in information and easy to understand, which makes them an indispensable part of life. Computer vision analysis is currently a key development direction of the Internet communication industry. Character recognition has great application value in many scenarios, such as vehicle license plate detection, image-to-text conversion, image content translation, and image search. Because the precision of text recognition technology is not yet ideal, its application scenarios remain relatively simple, such as content search in images [1,2,3,4,5,6].

Literature Review
Overall Architecture of Machine Learning Platform
Result processing module
Experimental Results and Analysis