Abstract

When differential privacy is used to publish high-dimensional data, the large number of dimensions leads to greater noise. High-dimensional binary data are especially vulnerable, because the true values are easily masked by excessive noise. Most existing methods cannot handle real high-dimensional data appropriately because of their high time complexity. To address these problems, we propose PrivABN, a differentially private adaptive Bayesian network algorithm for publishing high-dimensional binary data. The algorithm uses a new greedy algorithm to accelerate the construction of the Bayesian network, reducing the time complexity of the GreedyBayes algorithm from $O\left(nk\,C_{m+1}^{k+2}\right)$ to $O(nm^4)$. In addition, it uses an adaptive algorithm to adjust the structure and the exponential mechanism of differential privacy to preserve privacy, so as to generate a high-quality protected Bayesian network. We then use the Bayesian network to compute noisy conditional distributions and generate a synthetic dataset for publication. This synthetic dataset satisfies ε-differential privacy. Lastly, we carry out experiments on three real-life high-dimensional binary datasets to evaluate the performance of the algorithm.
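As a rough illustration of the exponential mechanism mentioned in the abstract, the Python sketch below selects one candidate with probability proportional to exp(ε · score / (2 · sensitivity)). It shows the generic mechanism only. The function names, candidate labels, and scores are hypothetical, and the scoring function and sensitivity that PrivABN actually uses are not stated in this text.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity):
    """Return the exponential-mechanism probabilities for a list of scores.

    Assumption: `sensitivity` is the global sensitivity of the score function.
    """
    # Subtracting the maximum score keeps exp() from overflowing and does not
    # change the normalized probabilities.
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity)) for s in scores
    ]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=None):
    """Pick one candidate at random according to the probabilities above."""
    rng = rng or random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates (e.g. possible parent sets) and quality scores.
    candidates = ["A", "B", "C"]
    scores = [0.10, 0.35, 0.30]
    print(selection_probabilities(scores, epsilon=1.0, sensitivity=1.0))
    print(exponential_mechanism(candidates, scores, epsilon=1.0, sensitivity=1.0))
```

A smaller ε flattens the probabilities toward uniform (more privacy, less utility), while a larger ε concentrates them on the highest-scoring candidate.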

Highlights

  • Various data are continuously collected and stored in different information systems with the continuous development of information technology

  • The algorithm uses the conditional distribution of each node given its parent nodes to generate synthetic data in the topological order of the Bayesian network

  • Measured by the L1 and L2 errors between the generated synthetic dataset and the original dataset, PrivABN needs only a very small privacy budget to match the performance of NoPrivABN
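The sampling step described in the highlights can be sketched as follows: visit the attributes in topological order and draw each binary value from its conditional distribution given the parent values that were already sampled. This is a minimal sketch, not the paper's SDG algorithm. The data layout (`order`, `parents`, `cond_prob`) and the example network are assumptions made for illustration, and `l1_l2_error` is only one common way to compare two distributions, which may differ from the error definition used in the experiments.

```python
import math
import random


def sample_synthetic(order, parents, cond_prob, n_rows, rng=None):
    """Generate binary rows by sampling attributes in topological order.

    order:     attribute names, parents always before children
    parents:   attribute -> tuple of parent attribute names
    cond_prob: attribute -> {tuple of parent values: P(attribute = 1 | parents)}
    """
    rng = rng or random.Random()
    rows = []
    for _ in range(n_rows):
        row = {}
        for attr in order:
            # All parents were sampled earlier because of the topological order.
            parent_values = tuple(row[p] for p in parents[attr])
            p_one = cond_prob[attr][parent_values]
            row[attr] = 1 if rng.random() < p_one else 0
        rows.append(row)
    return rows


def l1_l2_error(p, q):
    """Return the L1 and L2 distances between two distributions of equal length."""
    l1 = sum(abs(a - b) for a, b in zip(p, q))
    l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return l1, l2


if __name__ == "__main__":
    # Hypothetical two-node network A -> B with made-up probabilities.
    order = ["A", "B"]
    parents = {"A": (), "B": ("A",)}
    cond_prob = {"A": {(): 0.3}, "B": {(0,): 0.2, (1,): 0.9}}
    data = sample_synthetic(order, parents, cond_prob, n_rows=5, rng=random.Random(0))
    print(data)
```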

Summary

Introduction

Various data are continuously collected and stored in different information systems as information technology develops. The increase of global sensitivity with the attribute dimension can be avoided, and the curse of dimensionality can be addressed effectively. (2) To reduce the total time complexity of the algorithm and enable it to process real high-dimensional data, a construction algorithm, ABN, is proposed that uses a greedy algorithm, an adaptive algorithm, and the differential privacy exponential mechanism. (3) We propose a synthetic data generation algorithm, SDG, that uses the characteristics of binary data and the topological order of the Bayesian network. This algorithm can reduce the magnitude of the added noise and prevent excessive noise from masking the actual values.
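To make the idea of a noisy conditional distribution for a binary attribute concrete, the sketch below perturbs the two counts (value 0 and value 1) for one parent configuration with Laplace noise, clamps negative results to zero, and normalizes. This is the generic Laplace approach, shown under stated assumptions, and is not the SDG algorithm itself. The function names are hypothetical, and the default sensitivity of 1 assumes that adding or removing one record changes a single count by one.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_binary_conditional(count_zero, count_one, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy estimate of P(X = 1 | parent configuration).

    count_zero / count_one: numbers of records with X = 0 / X = 1 for this
    parent configuration. The sensitivity default is an assumption.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Clamp at zero so that the noisy counts can be normalized.
    noisy_zero = max(0.0, count_zero + laplace_noise(scale, rng))
    noisy_one = max(0.0, count_one + laplace_noise(scale, rng))
    total = noisy_zero + noisy_one
    if total == 0.0:
        # Both counts were wiped out by noise: fall back to a uniform guess.
        return 0.5
    return noisy_one / total


if __name__ == "__main__":
    print(noisy_binary_conditional(30, 70, epsilon=0.5, rng=random.Random(0)))
```

With small counts and a small ε, the noise can dominate the true counts, which is the masking effect the text describes.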

Related Work
Differential Privacy
The PrivABN Algorithm
Experiences
Conclusion
