Abstract

Support vector machines (SVMs) are among the most robust and accurate of the well-known machine learning algorithms, especially for classification. An SVM trains a classification model by solving an optimization problem that decides which instances in the training dataset become support vectors (SVs). However, SVs are intact instances taken from the training dataset, so directly releasing the classification model poses a significant privacy risk to individuals when the training data contain sensitive information. In this paper, we study how to release the classification model of kernel SVMs while preventing privacy leakage of the SVs and satisfying the requirements of privacy protection. We propose a new differentially private algorithm for kernel SVMs, named DPKSVMEL, based on a hybrid of the exponential and Laplace mechanisms. The DPKSVMEL algorithm has two major advantages over existing private SVM algorithms. First, it protects the privacy of the SVs by post-processing, so the training process of the non-private kernel SVM is unchanged. Second, the scoring-function values are derived directly from the symmetric kernel matrix generated during training, requiring neither additional storage nor a complex sensitivity analysis. In the DPKSVMEL algorithm, we define a similarity parameter that measures the correlation (or distance) between each non-SV and every SV; each non-SV is then assigned to the group of the SV with which its similarity is maximal. For a given similarity parameter value, if a group contains more than k non-SVs, we replace its SV with the mean of k similar non-SVs selected at random via the exponential mechanism; otherwise, we add random noise to the SV via the Laplace mechanism. We theoretically prove that the DPKSVMEL algorithm satisfies differential privacy. Extensive experiments on real datasets show the effectiveness of the DPKSVMEL algorithm for kernel SVMs, and it achieves higher classification accuracy than existing private SVM algorithms.
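The following is a minimal sketch of the hybrid release step described above, assuming a generic similarity function (e.g., the kernel value between two instances). The function names (`dp_release_svs`, `exp_mechanism_select`) and the way the privacy budget and sensitivity are passed in are illustrative assumptions for exposition, not the paper's reference implementation.

```python
# Sketch of the exponential/Laplace hybrid release of support vectors.
# All names and the budget/sensitivity handling are illustrative assumptions.
import numpy as np

def exp_mechanism_select(scores, epsilon, sensitivity, k, rng):
    """Pick k candidate indices, each drawn via the exponential mechanism."""
    chosen = []
    avail = list(range(len(scores)))
    for _ in range(min(k, len(avail))):
        s = np.array([scores[i] for i in avail])
        # Selection probability proportional to exp(eps * score / (2 * sensitivity));
        # subtracting the max only improves numerical stability.
        w = np.exp(epsilon * (s - s.max()) / (2.0 * sensitivity))
        p = w / w.sum()
        idx = rng.choice(len(avail), p=p)
        chosen.append(avail.pop(idx))
    return chosen

def dp_release_svs(svs, non_svs, similarity, k, epsilon, sensitivity, rng=None):
    """Replace each SV by the mean of k exponentially selected similar non-SVs,
    or perturb it with Laplace noise when its group has too few non-SVs."""
    rng = rng or np.random.default_rng()
    # Assign every non-SV to the SV with which its similarity is maximal.
    groups = {j: [] for j in range(len(svs))}
    for i, x in enumerate(non_svs):
        j = int(np.argmax([similarity(x, sv) for sv in svs]))
        groups[j].append(i)

    released = np.empty_like(np.asarray(svs, dtype=float))
    for j, sv in enumerate(svs):
        members = groups[j]
        if len(members) > k:
            # Exponential mechanism: score each group member by its similarity to the SV.
            scores = [similarity(non_svs[i], sv) for i in members]
            picked = exp_mechanism_select(scores, epsilon, sensitivity, k, rng)
            released[j] = np.mean([non_svs[members[i]] for i in picked], axis=0)
        else:
            # Laplace mechanism: perturb the SV itself.
            released[j] = np.asarray(sv, dtype=float) + rng.laplace(
                0.0, sensitivity / epsilon, size=np.shape(sv))
    return released
```

In practice the privacy budget would be split between the exponential and Laplace steps according to the composition analysis in the paper; the sketch leaves that allocation to the caller.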

Highlights

  • In recent years, with the rapid development of computing devices' capabilities for collecting, storing, and processing data, data sharing and analysis have become easier and more practical [1]

  • (iv) We theoretically prove that the DPKSVMEL algorithm satisfies differential privacy (DP). Extensive experiments show the effectiveness of the DPKSVMEL algorithm for kernel support vector machines (SVMs) on real datasets; it achieves higher classification accuracy than existing private SVM algorithms

  • The privacy of support vectors (SVs) is protected by post-processing the non-private classification model with DP to prevent privacy leakage of the SVs. The DPKSVMEL algorithm is theoretically proved to satisfy DP and overcomes some shortcomings of existing private SVM algorithms


Summary

Introduction

With the rapid development of computing devices' capabilities for collecting, storing, and processing data, data sharing and analysis have become easier and more practical [1]. Data mining and machine learning techniques have been gaining a great deal of attention for extracting useful information. The classification algorithm, one of the important data mining tasks, trains a classification model from labeled training datasets to classify unknown data in the future [2]. The support vector machine (SVM) [3, 4] is one of the most widely used machine learning algorithms for classification in practice [5]. Like most other classification algorithms, SVMs raise privacy issues when the training datasets contain sensitive information such as user behavior records or electronic health records. In SVMs, support vectors (SVs) are an important component of the classification model, and they are intact instances taken from the training datasets. Releasing the classification model of an SVM therefore poses a significant privacy risk to individuals, especially for kernel SVMs [2].

Methods
Results
Conclusion
