Abstract
In a world where artificial intelligence and data science are becoming omnipresent, data sharing is increasingly locking horns with data-privacy concerns. Differential privacy has emerged as a rigorous framework for protecting individual privacy in a statistical database while releasing useful statistical information about the database. The standard way to implement differential privacy is to inject a sufficient amount of noise into the data. However, in addition to other limitations of differential privacy, this process of adding noise affects data accuracy and utility. Another approach to enabling privacy in data sharing is based on the concept of synthetic data. The goal of synthetic data is to create an as-realistic-as-possible dataset, one that not only maintains the nuances of the original data, but does so without risk of exposing sensitive information. The combination of differential privacy with synthetic data has been suggested as a best-of-both-worlds solution. In this work, we propose the first noise-free method to construct differentially private synthetic data; we do this through a mechanism called private sampling. Using the Boolean cube as a benchmark data model, we derive explicit bounds on the accuracy and privacy of the constructed synthetic data. The key mathematical tools are hypercontractivity, duality, and empirical processes. A core ingredient of our sampling mechanism is a rigorous marginal correction method, which has the remarkable property that importance reweighting can be utilized to exactly match the marginals of the sample to the marginals of the population.
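For orientation, the noise-injection baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration of the classical Laplace mechanism applied to the mean of a single Boolean attribute, not the private sampling method proposed in the paper. The function names `laplace_noise` and `private_mean`, the dataset, and the choice of epsilon are all assumptions made for the example; neighboring datasets are assumed to differ by replacing one record.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(bits, epsilon, rng):
    """Release the mean of a 0/1 column under epsilon-differential privacy.

    Assumes neighboring datasets differ by replacing one record, so the
    mean has sensitivity 1/n.
    """
    n = len(bits)
    sensitivity = 1.0 / n
    return sum(bits) / n + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: 1000 records of a single Boolean attribute.
rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
true_mean = sum(bits) / len(bits)
noisy_mean = private_mean(bits, epsilon=1.0, rng=rng)
```

The released value differs from the true mean by Laplace noise of scale 1/(n·epsilon), which is the accuracy cost of noise injection that the abstract refers to.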