Abstract

The cost-sensitive support vector machine (CSVM) is one of the most popular tools for class-imbalanced problems such as fault diagnosis. However, such data often come with huge numbers of examples as well as features. Aiming at class-imbalanced problems on big data, this paper proposes a cost-sensitive support vector machine using a randomized dual coordinate descent method (CSVM-RDCD). The solution of the subproblem at each iteration is derived in closed form, and the computational cost is reduced through an accelerating strategy and cheap per-iteration computation. The four constrained conditions of CSVM-RDCD are derived. Experimental results show that the proposed method increases the recognition rate of the positive class and reduces the average misclassification cost on real, large class-imbalanced data.

Highlights

  • The most popular strategy for the design of classification algorithms is to minimize the probability of error, assuming that all misclassifications have the same cost and the classes of the dataset are balanced [1,2,3,4,5,6]

  • Experiments on large-scale data sets show that the cost-sensitive support vector machine using the randomized dual coordinate descent method runs more efficiently than both the parallel cost-sensitive support vector machine (PCSVM) and CSSVM; in particular, the randomized dual coordinate descent algorithm has a training-time advantage on large-scale data sets

  • The randomized dual coordinate descent method (RDCD) is the optimization algorithm that updates the global solution by solving each subproblem analytically in closed form


Summary

Introduction

The most popular strategy for the design of classification algorithms is to minimize the probability of error, assuming that all misclassifications have the same cost and the classes of the dataset are balanced [1,2,3,4,5,6]. The cost-sensitive support vector machine (CSVM) [2] is one of the most popular tools for class-imbalanced problems and problems with unequal misclassification costs. CSVM usually maps training vectors into a high-dimensional space via a nonlinear function. When the data already lie in a rich, high-dimensional feature space, performance is similar with or without the nonlinear mapping. Dual coordinate descent methods for the dual problem of CSVM are among the popular approaches to large-scale convex optimization, but existing work does not focus on big-data learning for CSVM.
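The dual coordinate descent idea referenced above can be sketched for a linear L1-loss cost-sensitive SVM: the dual is min_a 0.5*aᵀQa − eᵀa subject to 0 ≤ a_i ≤ C_{y_i}, each single-variable subproblem has a closed-form clipped-Newton solution, and the weight vector is maintained incrementally so each update is cheap. This is a minimal illustrative sketch, not the paper's exact CSVM-RDCD algorithm; the function name `csvm_rdcd` and the per-class costs `C_pos`/`C_neg` are assumptions for the example.

```python
import numpy as np

def csvm_rdcd(X, y, C_pos=1.0, C_neg=1.0, epochs=20, seed=0):
    """Randomized dual coordinate descent for a linear L1-loss
    cost-sensitive SVM dual:
        min_a  0.5 * a'Qa - e'a,   s.t. 0 <= a_i <= C_{y_i},
    where Q_ij = y_i y_j x_i'x_j and C_{y_i} is the class-dependent cost.
    (Illustrative sketch, not the paper's exact CSVM-RDCD.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                       # maintained as w = sum_i alpha_i y_i x_i
    Qii = np.einsum('ij,ij->i', X, X)     # diagonal of Q (squared row norms)
    C = np.where(y > 0, C_pos, C_neg)     # per-example upper bound from class cost
    for _ in range(epochs):
        for i in rng.permutation(n):      # randomized coordinate order
            if Qii[i] <= 0:
                continue
            G = y[i] * (w @ X[i]) - 1.0                          # partial gradient
            new_ai = np.clip(alpha[i] - G / Qii[i], 0.0, C[i])   # closed-form subproblem
            delta = new_ai - alpha[i]
            if delta != 0.0:
                w += delta * y[i] * X[i]  # cheap incremental update of w
                alpha[i] = new_ai
    return w, alpha
```

Keeping `w` up to date after each coordinate step is what makes the per-iteration cost O(d) rather than O(nd), which is the key to scaling this family of methods to large data sets.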

Basic Theory of Cost-Sensitive Support Vector Machine
The Modified Proposed Method
Description of Cost-Sensitive Support Vector Machine
Experiments and Analysis
Conclusions