Abstract
As machine learning is applied ever more widely, more and more sensitive data are used to train machine learning models. Unfortunately, machine learning systems have been shown to be vulnerable to various privacy leakage attacks, such as membership inference attacks and model inversion attacks. These threats compromise the confidentiality of the training data; moreover, the leakage of sensitive information makes data owners less willing to share data with machine learning systems. More seriously, a lack of sufficient training data will in turn hinder the application of machine learning. Strategies for analyzing privacy risks are therefore needed to help data owners pre-assess candidate datasets and implement reasonable privacy controls. However, a systematic privacy risk assessment from the data owner's perspective is still absent. This paper investigates and analyzes machine learning privacy risks to understand the relationship between training data properties and privacy leakage. Based on this understanding, we introduce a privacy risk assessment scheme based on the clustering distance of the training data. Our clustering distance-based method can reflect the privacy risk level of individual data records. We then combine existing privacy analysis based on data features with our clustering distance-based method to investigate privacy risks systematically. Our experiments show that the clustering distance and other dataset properties are closely related to privacy leakage. Data owners can thus pre-assess the privacy risks of candidate datasets before uploading or sharing them with a machine learning model and, if needed, choose a different dataset to reduce privacy risks to an acceptable level.
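The abstract does not say how the clustering distance is computed, so the following is only an illustrative sketch and not the paper's method. It assumes that records are grouped by a given label, that each group is summarized by its centroid, and that a record lying farther from its own group's centroid is treated as a proxy for higher privacy risk. The function names `centroid` and `distance_scores` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def distance_scores(records, labels):
    """Score each record by its Euclidean distance to its own group's centroid.

    Assumption (not from the paper): groups are given by `labels`, and a
    larger distance is read as a proxy for higher privacy risk.
    """
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    centers = {lab: centroid(pts) for lab, pts in groups.items()}
    return [math.dist(rec, centers[lab]) for rec, lab in zip(records, labels)]


# Toy data: four records close together and one outlier in the same group.
records = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (9.0, 9.0)]
labels = ["a", "a", "a", "a", "a"]

scores = distance_scores(records, labels)
# Index of the record with the largest distance score.
riskiest = max(range(len(records)), key=scores.__getitem__)
```

Under these assumptions, a data owner could rank candidate records by `scores` and review the highest-scoring ones before sharing the dataset. The actual scheme may use a different clustering procedure and distance measure.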