Abstract

Privacy protection in data publishing is an important issue that has been the focus of extensive research in recent years. However, existing methods have many limitations, especially for high-dimensional data publishing. To address the poor availability of published results caused by “the curse of dimensionality” in high-dimensional data publishing, we present the PPDP-PCAO (Privacy Preserving Data Publishing with Principal Component Analysis Optimization) method, which mitigates the loss of utility caused by the high noise that the curse of dimensionality introduces. PPDP-PCAO improves the Principal Component Analysis (PCA) algorithm by incorporating attribute importance, and reduces the dimension of the data with the improved PCA, which lowers the time and space cost. PPDP-PCAO introduces a mutual-information-based evaluation mechanism into data release, which evaluates the data generated with different numbers of principal components to determine the optimal number. High-dimensional data often contain multiple sensitive attributes, and traditional privacy budget allocation methods cannot provide personalized privacy protection for them. PPDP-PCAO therefore introduces the sensitivity preference, combines it with optimal matching theory, and designs a hierarchical protection strategy for sensitive attributes. Extensive experiments on different real datasets demonstrate that PPDP-PCAO not only guarantees the privacy of the published dataset, but also significantly improves accuracy and data utility compared with other high-dimensional data publishing methods.

Highlights

  • Many data collection agencies need to publish the collected raw data for data analysis and data mining to generate more effective decision support from the released data

  • The main contributions of this paper are summarized as follows: 1) We present PPDP-PCAO, a high-dimensional data releasing solution based on principal component analysis optimization under differential privacy protection, which reduces the time and space cost of processing the data and improves the availability of the published data

Summary

INTRODUCTION

Many data collection agencies need to publish the raw data they collect (such as medical and financial data) so that data analysis and data mining on the released data can produce more effective decision support.

W. Li et al.: PPDP-PCAO: Efficient High-Dimensional Data Releasing Method With Differential Privacy Protection

[...] information per unit is very small.

The main contributions of this paper are summarized as follows: 1) We present PPDP-PCAO, a high-dimensional data releasing solution based on principal component analysis optimization under differential privacy protection, which reduces the time and space cost of processing the data and improves the availability of the published data. In the process of dimensionality reduction, we design a personalized Laplace mechanism, which ensures that PPDP-PCAO satisfies differential privacy and makes privacy protection more flexible. Different levels of noise are added to attributes with different sensitivity preferences, giving a personalized noise-adding method that improves the availability of the published data. Through the evaluation mechanism based on mutual information, the optimal number of principal components k is determined, so that the published data has better availability.
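The contributions above combine three ideas: per-attribute Laplace noise, PCA-based dimensionality reduction, and a mutual-information score for choosing k. The following is a minimal sketch of how such pieces could fit together, not the authors' algorithm. It assumes every attribute is scaled to [0, 1] so that a sensitivity of 1.0 can be used, applies the noise before PCA (so PCA acts as post-processing), and uses a simple histogram estimate of mutual information. The function names, the budget values, and the order of the steps are all hypothetical. Scoring against the raw data is itself data-dependent, and the sketch does not account for that privacy cost.

```python
import numpy as np


def personalized_laplace(data, epsilons, sensitivity=1.0, rng=None):
    """Add Laplace noise column by column; a smaller epsilon means more noise.

    `epsilons` holds one (hypothetical) privacy budget per attribute.
    `sensitivity=1.0` is an assumption that holds for attributes in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    scales = sensitivity / np.asarray(epsilons, dtype=float)
    return data + rng.laplace(loc=0.0, scale=scales, size=data.shape)


def pca_reconstruct(data, k):
    """Project onto the top-k principal components and map back."""
    mean = data.mean(axis=0)
    centered = data - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    return centered @ components.T @ components + mean


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of the mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def choose_k(original, noisy, bins=16):
    """Return the k whose reconstruction shares the most information
    with the original attributes, together with all scores."""
    d = original.shape[1]
    scores = []
    for k in range(1, d + 1):
        released = pca_reconstruct(noisy, k)
        per_attribute = [
            mutual_information(original[:, j], released[:, j], bins)
            for j in range(d)
        ]
        scores.append(float(np.mean(per_attribute)))
    return int(np.argmax(scores)) + 1, scores


# Usage on synthetic data: four correlated attributes in [0, 1].
rng = np.random.default_rng(0)
latent = rng.random((2000, 2))
raw = np.clip(latent @ rng.random((2, 4)) / 2.0, 0.0, 1.0)
budgets = [0.5, 0.5, 2.0, 2.0]  # hypothetical: first two attributes are more sensitive
noisy = personalized_laplace(raw, budgets, rng=rng)
best_k, scores = choose_k(raw, noisy)
released = pca_reconstruct(noisy, best_k)
```

Keeping the noise step separate from the PCA step makes the privacy argument for this sketch easier to follow, because everything after `personalized_laplace` only touches already-perturbed data, apart from the k-selection score noted above.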

FILTER ATTRIBUTES
22. Restore data set S
EXPERIMENTAL EVALUATION
EXPERIMENTAL ENVIRONMENT
Findings
CONCLUSION