Abstract

With the widespread application of big data, privacy-preserving data analysis has become a topic of increasing significance. Current research mainly focuses on privacy-preserving classification and regression. However, principal component analysis (PCA) is also an effective data analysis method: it reduces data dimensionality and is commonly used in data processing, machine learning, and data mining. To implement approximate PCA while preserving data privacy, we apply the Laplace mechanism and propose two differentially private PCA algorithms: Laplace input perturbation (LIP) and Laplace output perturbation (LOP). We evaluate the performance of LIP and LOP in terms of noise magnitude and approximation error, both theoretically and experimentally. In addition, we explore how the performance of the two algorithms varies with parameters such as the number of samples, the target dimension, and the privacy parameter. Theoretical and experimental results show that LIP adds less noise and has lower approximation error than LOP. To verify the effectiveness of LIP, we compare it with other algorithms. The experimental results show that LIP provides a strong privacy guarantee and good data utility.
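To make the two constructions concrete, the following is a minimal sketch, not the paper's exact algorithms: LIP perturbs the covariance matrix with Laplace noise before the eigendecomposition, while LOP computes the exact top-k eigenvectors and perturbs them afterwards. The sensitivity bounds used to calibrate the noise scale below are placeholder assumptions; the paper derives its own.

    import numpy as np

    def lip_pca(X, k, epsilon):
        # Laplace input perturbation (sketch): noise the covariance
        # matrix, then eigendecompose the perturbed matrix.
        n, d = X.shape
        cov = X.T @ X / n                     # data assumed centered and bounded
        sensitivity = 2.0 * d / n             # assumed L1 sensitivity of cov
        noise = np.random.laplace(scale=sensitivity / epsilon, size=(d, d))
        noise = (noise + noise.T) / 2         # keep the perturbed matrix symmetric
        eigvals, eigvecs = np.linalg.eigh(cov + noise)
        return eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # noisy top-k components

    def lop_pca(X, k, epsilon):
        # Laplace output perturbation (sketch): eigendecompose first,
        # then noise the top-k eigenvectors directly.
        n, d = X.shape
        eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
        V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
        sensitivity = 2.0 * d * k / n         # assumed L1 sensitivity of the output
        return V + np.random.laplace(scale=sensitivity / epsilon, size=(d, k))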

Highlights

  • In many modern information systems, the amount of data is very large

  • We propose two algorithms, Laplace input perturbation (LIP) and Laplace output perturbation (LOP), for differentially private principal component analysis

  • We compare the performance of LIP and LOP in terms of noise magnitude and approximation error via theoretical analysis. Then we conduct experiments to verify the performance of the two algorithms on five data sets (one plausible error metric is sketched after this list)
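The paper's exact definition of approximation error is not reproduced here; one plausible utility metric, given purely as an assumption, is the Frobenius-norm reconstruction error of the data projected onto the (possibly noisy) principal components:

    import numpy as np

    def approximation_error(X, V):
        # Assumed metric: how much of X is lost when projecting onto the
        # k columns of V and reconstructing (smaller is better).
        Xc = X - X.mean(axis=0)
        return np.linalg.norm(Xc - Xc @ V @ V.T, "fro")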

Introduction

In many modern information systems, the amount of data is very large. Massive data increase the difficulty of data analysis and processing. Principal component analysis (PCA) is a standard data analysis method that can be used to reduce data dimensionality. More specifically, it projects the original high-dimensional data onto the space spanned by the eigenvectors of the data's covariance matrix (the principal components) to obtain low-dimensional data that retain most of the information in the original data. PCA simplifies the data, making them easier to use while reducing the computational cost of downstream algorithms. Face recognition, for example, is much faster when the data are first projected into a lower dimension.
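As a concrete illustration of this projection step, here is a minimal, standard (non-private) PCA sketch in Python with NumPy, not code from the paper:

    import numpy as np

    def pca_reduce(X, k):
        # Project n samples in d dimensions onto the top-k eigenvectors
        # of the sample covariance matrix.
        Xc = X - X.mean(axis=0)                 # center each feature
        cov = Xc.T @ Xc / len(X)                # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
        V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k components
        return Xc @ V                           # n x k low-dimensional representation

For instance, projecting 64x64 face images (d = 4096) onto a few hundred principal components makes downstream recognition substantially cheaper.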
