Abstract

With the widespread application of big data, privacy-preserving data analysis has become a topic of increasing significance. Current research mainly focuses on privacy-preserving classification and regression. However, principal component analysis (PCA) is also an effective data analysis method: it reduces data dimensionality and is commonly used in data processing, machine learning, and data mining. To implement approximate PCA while preserving data privacy, we apply the Laplace mechanism and propose two differentially private PCA algorithms: Laplace input perturbation (LIP) and Laplace output perturbation (LOP). We evaluate the performance of LIP and LOP in terms of noise magnitude and approximation error, both theoretically and experimentally. In addition, we explore how the performance of the two algorithms varies with parameters such as the number of samples, the target dimension, and the privacy parameter. Theoretical and experimental results show that LIP adds less noise and has lower approximation error than LOP. To verify the effectiveness of LIP, we compare it with other algorithms. The experimental results show that LIP provides a strong privacy guarantee and good data utility.
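To make the two constructions concrete, the following is a minimal sketch, not the paper's exact algorithms: LIP perturbs the covariance matrix with Laplace noise before the eigendecomposition, while LOP computes the exact top-k eigenvectors and perturbs them afterwards. The sensitivity bounds used to calibrate the noise scale below are placeholder assumptions; the paper derives its own.

    import numpy as np

    def lip_pca(X, k, epsilon):
        # Laplace input perturbation (sketch): noise the covariance
        # matrix, then eigendecompose the perturbed matrix.
        n, d = X.shape
        cov = X.T @ X / n                     # data assumed centered and bounded
        sensitivity = 2.0 * d / n             # assumed L1 sensitivity of cov
        noise = np.random.laplace(scale=sensitivity / epsilon, size=(d, d))
        noise = (noise + noise.T) / 2         # keep the perturbed matrix symmetric
        eigvals, eigvecs = np.linalg.eigh(cov + noise)
        return eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # noisy top-k components

    def lop_pca(X, k, epsilon):
        # Laplace output perturbation (sketch): eigendecompose first,
        # then noise the top-k eigenvectors directly.
        n, d = X.shape
        eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
        V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
        sensitivity = 2.0 * d * k / n         # assumed L1 sensitivity of the output
        return V + np.random.laplace(scale=sensitivity / epsilon, size=(d, k))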

Highlights

  • In many modern information systems, the amount of data is very large

  • We propose two algorithms, Laplace input perturbation (LIP) and Laplace output perturbation (LOP), for differentially private principal component analysis

  • We compare the performance of LIP and LOP in terms of noise magnitude and approximation error via theoretical analysis. Then we conduct experiments to verify the performance of the two algorithms on five data sets (one plausible error metric is sketched after this list)
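The paper's exact definition of approximation error is not reproduced here; one plausible utility metric, given purely as an assumption, is the Frobenius-norm reconstruction error of the data projected onto the (possibly noisy) principal components:

    import numpy as np

    def approximation_error(X, V):
        # Assumed metric: how much of X is lost when projecting onto the
        # k columns of V and reconstructing (smaller is better).
        Xc = X - X.mean(axis=0)
        return np.linalg.norm(Xc - Xc @ V @ V.T, "fro")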

Introduction

In many modern information systems, the amount of data is very large. Massive data increase the difficulty of data analysis and processing. Principal component analysis (PCA) is a standard data analysis method that can be used to reduce data dimensionality. More specifically, it projects the original high-dimensional data onto the space spanned by the eigenvectors of the data's covariance matrix (the principal components) to obtain low-dimensional data that retain most of the information in the original data. PCA simplifies the data, making them easier to use while reducing the computational cost of downstream algorithms. Face recognition, for example, is much faster when the data are first projected into a lower dimension.
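As a concrete illustration of this projection step, here is a minimal, standard (non-private) PCA sketch in Python with NumPy, not code from the paper:

    import numpy as np

    def pca_reduce(X, k):
        # Project n samples in d dimensions onto the top-k eigenvectors
        # of the sample covariance matrix.
        Xc = X - X.mean(axis=0)                 # center each feature
        cov = Xc.T @ Xc / len(X)                # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
        V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k components
        return Xc @ V                           # n x k low-dimensional representation

For instance, projecting 64x64 face images (d = 4096) onto a few hundred principal components makes downstream recognition substantially cheaper.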
