Abstract

Recent growth in domain-specific applications of machine learning can be attributed to the availability of realistic public datasets. Real-world datasets often contain sensitive information about users, which makes them hard to share freely with other stakeholders and researchers due to regulatory and compliance requirements. Synthesising datasets from real data using generative techniques is gaining popularity. However, the privacy analysis of these synthetic datasets remains an open research problem. In this work, we fill this gap by investigating the privacy of generated datasets from both the attacker's and the auditor's points of view. We propose an instance-level Privacy Score (PS) for each synthetic sample, obtained by measuring a per-sample memorisation coefficient \(\boldsymbol{\alpha_{m}}\). Leveraging PS, we empirically show that the accuracy of membership inference attacks on synthetic data drops significantly. PS is a model-agnostic, post-training measure that not only gives data sharers guidance about the privacy properties of a given sample but also lets third-party data auditors run privacy checks without access to model internals. We test our method on two real-world datasets and show that PS-based filtering reduces attack accuracy.
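To make the filtering step concrete, below is a minimal sketch of PS-based release filtering, assuming each synthetic sample already has a Privacy Score in which higher values indicate greater memorisation risk. The function name, the threshold value, and the score convention are illustrative assumptions; the abstract does not specify how \(\boldsymbol{\alpha_{m}}\) or PS is computed.

```python
import numpy as np

def privacy_score_filter(samples, privacy_scores, threshold):
    """Keep only synthetic samples whose Privacy Score (PS) falls below
    a release threshold. Hypothetical illustration: the paper derives PS
    from a per-sample memorisation coefficient alpha_m, whose exact
    computation is not given in this abstract."""
    privacy_scores = np.asarray(privacy_scores)
    keep = privacy_scores < threshold  # low PS => low memorisation risk (assumed convention)
    return [s for s, k in zip(samples, keep) if k]

# Usage: filter a synthetic dataset before sharing it with third parties.
samples = [{"id": i} for i in range(5)]
scores = [0.10, 0.80, 0.30, 0.95, 0.20]  # illustrative PS values in [0, 1]
released = privacy_score_filter(samples, scores, threshold=0.5)
print(len(released))  # -> 3 low-risk samples retained for release
```

Because the filter operates only on per-sample scores, a third-party auditor could apply the same check without access to the generative model's internals, consistent with the post-training, model-agnostic framing above.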
