Abstract

With the rapid development of information technology, people benefit more and more from big data. At the same time, how to obtain optimal outputs from big data publishing and sharing while protecting privacy has become a major concern. Many researchers seek to achieve differential privacy protection for massive high-dimensional datasets using principal component analysis. However, these algorithms process data inefficiently and do not account for the different privacy protection needs of individual attributes in high-dimensional datasets. To address these problems, we design a Divided-block Sparse Matrix Transformation Differential Privacy Data Publishing Algorithm (DSMT-DP). The algorithm assigns a different privacy budget parameter to each attribute according to the level of privacy protection that attribute requires. In addition, the divided-block scheme and the sparse matrix transformation scheme improve the computational efficiency of principal component analysis when handling large amounts of high-dimensional sensitive data, and we prove that the proposed algorithm satisfies differential privacy. Our experimental results show that, with the same privacy parameters, the proposed algorithm has a smaller mean square error than the traditional differential privacy algorithm and improves computational efficiency. Further, we combine this algorithm with blockchain and propose an Efficient Privacy Data Publishing and Sharing Model based on the blockchain. Publishing and sharing private data on this model not only resists strong background knowledge attacks from adversaries outside the system but also prevents not-completely-honest participants inside the system from stealing or tampering with data.

Highlights

  • With the arrival of the era of big data and cloud computing, the data center of each city is full of all kinds of high-dimensional data

  • The data publisher processes the original data using the DSMT-DP data publishing algorithm to form the intermediate data; the intermediate data is then transmitted through a secure channel and recorded on the blockchain. Then, the data requester initiates a request to obtain the key from the publisher, and the corresponding intermediate data can be recovered for data analysis and mining

  • Mean square error (MSE) is selected as the indicator to evaluate the impact of sparse matrix transform on data availability in the dimensionality reduction stage; we compare the performance of the algorithm proposed in this paper with other traditional classical algorithms in terms of mean square error

Summary

Introduction

With the arrival of the era of big data and cloud computing, the data center of each city is full of all kinds of high-dimensional data. Thus, a principal component analysis algorithm incorporating differential privacy perturbations can transform the original variables into low-dimensional variables that reflect the majority of the information in the original variables, and adding even a small perturbation to the data matrix of a low-dimensional variable can trigger a large change in its overall variables [4]. This paper addresses two problems. (1) Existing principal component analysis differential privacy algorithms do not take into account the different privacy protection needs of each attribute in high-dimensional datasets. (2) Existing principal component analysis differential privacy algorithms are inefficient in processing high-dimensional datasets. We use a divided-block scheme and a sparse matrix transformation scheme to significantly improve the computational efficiency of the principal component analysis differential privacy data publishing algorithm compared with the original algorithms. Publishing and sharing data on the proposed blockchain-based model resists strong background knowledge attacks from adversaries outside the system and prevents not-completely-honest participants inside the system from stealing or tampering with data.
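
The idea of giving each attribute its own privacy budget can be illustrated with a minimal sketch. This is not the DSMT-DP algorithm itself: it omits the principal component analysis and sparse matrix transformation steps and only adds Laplace noise to the data, one block of columns at a time, with scale Δf / ε per attribute. The function names `perturb_by_attribute` and `mse`, the `block_size` parameter, and the sensitivity value of 1.0 (data assumed scaled to [0, 1]) are all assumptions made for illustration.

```python
import numpy as np


def perturb_by_attribute(data, epsilons, sensitivity=1.0, block_size=2, seed=0):
    """Add Laplace noise to the columns of `data`, one column block at a time.

    data:      (n_samples, n_attributes) array, assumed scaled to [0, 1].
    epsilons:  one privacy budget per attribute; a smaller value means
               stronger protection and therefore more noise.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    epsilons = np.asarray(epsilons, dtype=float)
    if epsilons.shape != (data.shape[1],):
        raise ValueError("need exactly one epsilon per attribute")

    noisy = np.empty_like(data)
    for start in range(0, data.shape[1], block_size):
        cols = slice(start, start + block_size)
        # Laplace scale b = sensitivity / epsilon, one value per column.
        scale = sensitivity / epsilons[cols]
        block = data[:, cols]
        noisy[:, cols] = block + rng.laplace(0.0, scale, size=block.shape)
    return noisy


def mse(original, published):
    """Mean square error between original and published values."""
    original = np.asarray(original, dtype=float)
    published = np.asarray(published, dtype=float)
    return float(np.mean((original - published) ** 2))


if __name__ == "__main__":
    X = np.random.default_rng(1).random((2000, 5))
    budgets = [0.1, 0.1, 1.0, 10.0, 10.0]  # assumed per-attribute budgets
    Y = perturb_by_attribute(X, budgets)
    for j, eps in enumerate(budgets):
        print(f"attribute {j}: epsilon={eps}, MSE={mse(X[:, j], Y[:, j]):.4f}")
```

In this sketch, attributes with a small budget receive much larger noise and therefore a larger mean square error, which is the trade-off between protection level and data availability that the mean square error indicator is meant to capture.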

Related Work
Preliminaries
Experimental Results and Analysis
Comparison of Computing Time Efficiency
Conclusions