Abstract
Machine learning models are known to memorize the unique properties of individual data points in a training set. This memorization capability can be exploited by several types of attacks to infer information about the training data, most notably, membership inference attacks. In this paper, we propose an approach based on information leakage for guaranteeing membership privacy. Specifically, we propose to use a conditional form of the notion of maximal leakage to quantify the information leaking about individual data entries in a dataset, i.e., the entrywise information leakage. We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework for privacy-preserving classification of sensitive data and prove that the entrywise information leakage of its aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of this leakage implies that increased consensus among teachers in labeling a query reduces its associated privacy cost. Finally, we derive upper bounds on the entrywise information leakage when the aggregation mechanism uses Laplace distributed noise.
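For background, the standard (unconditional) maximal leakage from a secret X to an observable Y over finite alphabets is commonly defined as below; the entrywise measure used in the paper is a pointwise conditional variant of this quantity, so the formula is included only as context rather than as the paper's exact definition.

$$\mathcal{L}(X \to Y) \;=\; \log \sum_{y \in \mathcal{Y}} \; \max_{x:\, P_X(x) > 0} P_{Y \mid X}(y \mid x)$$

Intuitively, the sum collects, for each output value, the most favorable conditional probability an adversary guessing about X could exploit, and the logarithm expresses the resulting multiplicative gain in bits.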
Highlights
In recent years, many useful machine learning applications have emerged that require training on sensitive data.
We use the pointwise conditional maximal leakage to measure the information leaking about individual data entries in the Private Aggregation of Teacher Ensembles (PATE) framework.
We have proposed an approach based on information leakage for quantifying membership privacy.
Summary
In recent years, many useful machine learning applications have emerged that require training on sensitive data. Differential privacy ensures that all datasets differing in only one entry (i.e., adjacent datasets) produce an output with similar probabilities. It has several useful properties, such as satisfying data-processing inequalities and composition theorems [7]. In the PATE framework, the privacy guarantees result solely from the aggregation mechanism and are agnostic to the specific machine learning techniques used by each teacher. This is because the modular structure of PATE enables us to invoke the data-processing inequality to decouple the information leaked through training from the information leaked through aggregation, and to guarantee that the overall leakage is no larger than either. The privacy guarantees established by PATE are characterized in [14], [15] in terms of differential privacy, and experimental results are reported there. However, these works do not analytically prove the synergy between privacy and accuracy observed in the framework, namely that stronger consensus among teachers improves accuracy while reducing the privacy cost. Since [14], [15] present a thorough experimental study, here we refrain from repeating the experiments and instead focus on giving a rigorous theoretical analysis of the framework.
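To make the aggregation step concrete, the following Python sketch implements a Laplace noisy-argmax over teacher votes, the kind of mechanism whose entrywise leakage the paper bounds. The function name, interface, and noise scale b are illustrative assumptions, not the paper's exact parameterization.

# Minimal sketch of a PATE-style Laplace noisy-argmax aggregation (illustrative).
import numpy as np

def aggregate_teacher_votes(votes, num_classes, b, rng=None):
    """Return a label for one query from the teachers' votes.

    votes       : iterable of per-teacher predicted labels (ints in [0, num_classes))
    num_classes : number of possible labels
    b           : scale of the Laplace noise added to each vote count (assumed tunable)
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.bincount(votes, minlength=num_classes).astype(float)  # vote histogram
    noisy_counts = counts + rng.laplace(loc=0.0, scale=b, size=num_classes)
    return int(np.argmax(noisy_counts))  # label released to the student model

# Example: strong consensus among 10 teachers on class 2. Under the paper's
# Schur-concavity result, such a concentrated vote histogram incurs a lower
# privacy cost than a near-uniform one.
print(aggregate_teacher_votes([2, 2, 2, 2, 2, 2, 2, 2, 1, 2], num_classes=3, b=1.0))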