Abstract

Data mining technology can extract potentially valuable information from massive data, and the support vector machine (SVM) is one of the most widely used and most efficient classification methods in data mining. However, training data often contain sensitive attributes, and the traditional SVM training procedure can reveal individuals' private information. Existing privacy-preserving SVM classifiers suffer from low prediction accuracy and poor generality, so this paper proposes a new SVM training method with differential privacy protection. The algorithm first solves the dual problem of the SVM with the sequential minimal optimization (SMO) method and records, for each support vector, the difference $E_{i}$ between its estimated value and its true value. It then computes the ratio of each support vector's $E_{i}$ to the sum of the $E_{i}$ over all support vectors. Finally, according to these ratios, Laplace noise of different magnitudes is added to the dual variable $\alpha _{i}$ of each support vector before release. By the principles of differential privacy, the algorithm satisfies $\epsilon $-differential privacy and can therefore effectively protect individual privacy. Experimental results on real datasets show that the proposed algorithm can be used for classification prediction under a reasonable privacy budget.
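The training pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal, hypothetical illustration, not the authors' DPSVMDVP implementation: it assumes an SMO solver has already produced the dual variables `alpha`, the bias `b`, and a symmetric kernel matrix `K`, and the function names `smo_errors` and `perturb_dual_variables` are ours. The abstract does not say exactly how the ratios become noise levels, so the budget split below (each support vector gets ratio_i times $\epsilon$, with ratios built from absolute errors plus a small smoothing term) and the caller-supplied `sensitivity` are assumptions. Because the noise scales here depend on the data, this sketch alone should not be read as a proof of $\epsilon$-differential privacy.

```python
import numpy as np


def smo_errors(alpha, y, K, b):
    """Return E_i = f(x_i) - y_i for every training point.

    f(x_i) = sum_j alpha_j * y_j * K[j, i] + b. K is assumed symmetric
    (a Gram matrix), so K @ (alpha * y) + b gives f(x_i) for all i at once.
    """
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.asarray(K, dtype=float) @ (alpha * y) + b
    return f - y


def perturb_dual_variables(alpha, y, K, b, epsilon, sensitivity,
                           sv_tol=1e-8, smoothing=1e-6, rng=None):
    """Hypothetical sketch: add per-support-vector Laplace noise to alpha.

    Assumptions (not taken from the paper):
      * ratios use |E_i| plus a small `smoothing` term, so margin support
        vectors (where E_i is ~0) still receive a positive share;
      * the budget is split as eps_i = ratio_i * epsilon, so the
        per-vector budgets sum to epsilon (sequential composition);
      * `sensitivity` is a caller-supplied bound on how much one alpha_i
        can change; the paper derives its own global sensitivity.
    Returns (noisy_alpha, support_vector_indices, laplace_scales).
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, dtype=float)
    sv = np.flatnonzero(alpha > sv_tol)             # support vectors: alpha_i > 0
    E = smo_errors(alpha, y, K, b)[sv]              # SMO errors of the support vectors
    ratio = np.abs(E) + smoothing
    ratio = ratio / ratio.sum()                     # smoothed |E_i| / sum_j |E_j|
    eps_i = ratio * epsilon                         # per-support-vector budget
    scale = sensitivity / eps_i                     # Laplace scale Delta / eps_i
    noisy = alpha.copy()
    noisy[sv] += rng.laplace(loc=0.0, scale=scale)  # non-support vectors stay at 0
    return noisy, sv, scale


# Toy 1-D separable data with a hand-computed hard-margin solution (w = 1, b = 0).
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = np.array([0.0, 0.5, 0.5, 0.0])
K = X @ X.T                                         # linear kernel
noisy, sv, scale = perturb_dual_variables(alpha, y, K, b=0.0, epsilon=1.0,
                                          sensitivity=1.0,
                                          rng=np.random.default_rng(0))
print(sv, scale)  # -> [1 2] [2. 2.]
```

The smoothing term matters in practice: at an SMO optimum, support vectors lying exactly on the margin satisfy $y_{i} f(x_{i}) = 1$, so $E_{i} = 0$, and an unsmoothed ratio would give them zero budget and an unbounded noise scale. In the toy run both support vectors are on the margin, so each receives half of $\epsilon$.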

Highlights

  • Differential privacy (DP) is a privacy definition proposed by Dwork in 2006 [9] to address privacy disclosure in statistical databases. Compared with traditional privacy protection models, the differential privacy model rests on a solid mathematical foundation and allows the level of privacy protection to be controlled.

  • To overcome the shortcomings of existing privacy-preserving SVM methods, such as low accuracy and overly strong restrictions on the objective function, a differential privacy SVM based on dual variable perturbation (DPSVMDVP) was proposed.
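The highlights rely on standard notions from Dwork's framework, which can be stated as follows. The symbols $D$, $D'$, $\mathcal{M}$, $f$ and $\Delta f$ are generic notation, not necessarily the paper's. A randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-differential privacy if, for all datasets $D, D'$ that differ in a single record and every set of outputs $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S].$$

The global sensitivity of a query $f$ is

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_{1},$$

and the Laplace mechanism, which releases

$$\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\epsilon}\right)$$

(with independent noise in each coordinate when $f$ is vector-valued), satisfies $\epsilon$-differential privacy. Under sequential composition, releasing the outputs of mechanisms with budgets $\epsilon_{1}, \dots, \epsilon_{k}$ computed on the same data satisfies $\left(\sum_{i} \epsilon_{i}\right)$-differential privacy.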


Summary

INTRODUCTION

With the rapid development and wide application of Internet technology, massive amounts of application data are accumulating at an "explosive" rate. Data mining technology can find and extract potential, regular, and understandable patterns or knowledge from these data and provide feedback and guidance for business and daily life [1]. Because such data often contain sensitive personal attributes, mining them also risks disclosing individual privacy. Traditional privacy protection techniques include k-anonymity [4], l-diversity [5], m-invariance [6], and t-closeness [7], which researchers have applied to a variety of data mining methods. These methods, however, assume that the attacker has no background knowledge, so they cannot provide adequate security [8]. A support vector machine classification algorithm based on differential privacy protection is therefore a valuable research topic: the goal is to keep the classification accuracy of the SVM high while protecting the private information of individuals in the training set.

RELATED WORK
SUPPORT VECTOR MACHINE
DIFFERENTIAL PRIVACY SVM CLASSIFIER BASED ON DUAL VARIABLE PERTURBATION
OUTLINE
GLOBAL SENSITIVITY OF DPSVMDVP ALGORITHM
DESIGN OF DPSVMDVP ALGORITHM
ALGORITHM PRIVACY ANALYSIS
EXPERIMENTAL RESULTS AND ANALYSIS
CONCLUSION
