Private data leakage and nonlinear classification are two challenges in big data mining, yet few studies address these issues in support vector machines (SVMs). To solve them effectively, this paper proposes a novel framework based on differential privacy (DP) and kernel functions. The framework can allocate privacy budgets and add artificial noise at different locations in the SVM simultaneously, which makes the perturbation process more flexible and fine-grained. Under this framework, we propose three DP SVM algorithms: one that perturbs the training data set (DPSVM-TDP), one that perturbs the kernel function (DPSVM-KFP), and one that uses mixed perturbation (DPSVM-MP). All three can achieve accurate classification while ensuring that users' privacy is not violated. We conduct a privacy analysis of these algorithms and prove that they all satisfy (ε, 0)-DP. Finally, we evaluate the algorithms experimentally from several aspects and compare them with the DPSVM with dual-variable perturbation (DVP) algorithm (DPSVM-DVP) to determine the optimal perturbation method. The results show that DPSVM-KFP achieves the highest data utility and the strictest privacy protection with the shortest running time.
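To make the idea of splitting a privacy budget across several perturbation locations more concrete, the following is a minimal sketch and not the paper's algorithms. It assumes the Laplace mechanism (a standard way to obtain (ε, 0)-DP, i.e. pure ε-DP), an RBF kernel, and simple sequential composition of the two budgets. All function names, the `data_share` parameter, and the sensitivity values are hypothetical placeholders rather than derived bounds, so the sketch makes no formal privacy claim.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_budget(epsilon, data_share):
    # Hypothetical allocation: a fraction of the total budget goes to the
    # training data, the remainder to the kernel values.
    if not 0.0 < data_share < 1.0:
        raise ValueError("data_share must lie strictly between 0 and 1")
    eps_data = epsilon * data_share
    return eps_data, epsilon - eps_data


def perturb_rows(rows, epsilon, sensitivity, rng):
    # Training-data perturbation: add Laplace noise to every feature value.
    # `sensitivity` is an assumed placeholder, not a derived bound.
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in row] for row in rows]


def rbf_kernel(a, b, gamma):
    # Standard RBF kernel: exp(-gamma * ||a - b||^2).
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def noisy_kernel_matrix(rows, gamma, epsilon, sensitivity, rng):
    # Kernel perturbation: add Laplace noise to each kernel value, mirroring
    # the upper triangle so the matrix stays symmetric.
    scale = sensitivity / epsilon
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = rbf_kernel(rows[i], rows[j], gamma) + laplace_noise(scale, rng)
            kernel[i][j] = kernel[j][i] = value
    return kernel


def mixed_perturbation(rows, gamma, epsilon, data_share, rng):
    # Mixed perturbation (assumed form): noise the data with one share of the
    # budget, then noise the kernel of the noisy data with the rest.
    eps_data, eps_kernel = split_budget(epsilon, data_share)
    noisy_rows = perturb_rows(rows, eps_data, sensitivity=1.0, rng=rng)
    return noisy_kernel_matrix(noisy_rows, gamma, eps_kernel, sensitivity=1.0, rng=rng)
```

A noisy kernel matrix of this kind could then be passed to any SVM solver that accepts a precomputed kernel. Note that a smaller ε yields a larger noise scale, which is the usual trade-off between privacy strength and data utility.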