A parallel and balanced SVM algorithm on spark for data-intensive computing

Jianjiang Li, Jinliang Shi, Zhiguo Liu, Can Feng

doi:10.3233/ida-226774

Abstract

Support Vector Machine (SVM) is a machine learning method with excellent classification performance that has been widely used in fields such as data mining, text classification, and face recognition. However, once the data volume grows beyond a certain scale, the computational time becomes too long and efficiency drops. To address this issue, we propose a parallel balanced SVM algorithm based on Spark, named PB-SVM, which is optimized on the basis of the traditional Cascade SVM algorithm. PB-SVM consists of three parts, namely Clustering Equal Division, Balancing Shuffle, and Iteration Termination, which together address two problems of Cascade SVM: data skew and the large difference between local support vectors and global support vectors. We implement PB-SVM on an AliCloud Spark distributed cluster and evaluate it on five public datasets. In the binary classification test on the covtype dataset, PB-SVM improves efficiency by 38.9% and 75.4% and accuracy by 7.16% and 8.38% compared with MLlib-SVM and Cascade SVM on Spark, respectively. In the multi-class classification test on covtype, PB-SVM improves efficiency by 94.8% and accuracy by 18.26% compared with Cascade SVM on Spark.
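
The data-skew problem mentioned in the abstract can be made concrete with a toy example. The abstract does not describe how Clustering Equal Division works, so the sketch below is not PB-SVM's method. It substitutes a plain class-wise round-robin split (the helper names `balanced_partitions` and `class_counts` are hypothetical) only to show what equal-sized, class-balanced partitions look like next to a naive contiguous split.

```python
from collections import Counter, defaultdict


def balanced_partitions(labels, num_partitions):
    """Deal sample indices round-robin, class by class, into partitions.

    Hypothetical helper, not taken from the paper: partition sizes end up
    within one sample of each other, and each class is spread so that its
    per-partition count also differs by at most one.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions = [[] for _ in range(num_partitions)]
    slot = 0  # running position in the round-robin deal
    for label in sorted(by_class):
        for index in by_class[label]:
            partitions[slot % num_partitions].append(index)
            slot += 1
    return partitions


def class_counts(labels, partition):
    """Count how many samples of each class a partition holds."""
    return Counter(labels[i] for i in partition)


# Toy data: 12 samples sorted by label, a bad case for contiguous splitting.
labels = [0] * 9 + [1] * 3

# Naive split: three contiguous blocks of four samples each.
naive = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

# Balanced split: same block size, but classes are spread evenly.
balanced = balanced_partitions(labels, 3)

for name, parts in (("naive", naive), ("balanced", balanced)):
    for i, part in enumerate(parts):
        print(name, i, dict(class_counts(labels, part)))
```

On this toy input, the contiguous split leaves two partitions that contain a single class, on which a binary SVM cannot be trained, while the round-robin split gives every partition three samples of class 0 and one of class 1.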

Journal: Intelligent Data Analysis
Publication Date: Jul 20, 2023
Citations: 1
