Abstract

As machine learning is applied ever more widely, increasing amounts of sensitive data are used to train models. Unfortunately, machine learning systems have been shown to be vulnerable to a variety of privacy attacks, such as membership inference and model inversion. These threats compromise the confidentiality of the training data; moreover, the leakage of sensitive information discourages data owners from sharing data with machine learning systems, and a resulting shortage of training data in turn hinders the adoption of machine learning. Privacy risk analysis strategies are therefore needed to help data owners pre-assess candidate datasets and implement reasonable privacy controls. However, a systematic privacy risk assessment from the data owner's perspective is still absent. This paper investigates machine learning privacy risks to understand the relationship between training data properties and privacy leakage. Based on this analysis, we introduce a privacy risk assessment scheme built on the clustering distance of training data; the clustering distance reflects the privacy risk level of each individual data record. We then combine existing feature-based privacy analysis with our clustering distance-based method to investigate privacy risks systematically. Our experiments show that clustering distance and other dataset properties are closely related to privacy leakage. Data owners can thus pre-assess the privacy risks of a candidate dataset before uploading or sharing it with a machine learning system and, if needed, choose a different dataset to reduce privacy risks to an acceptable level.
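To make the clustering-distance idea concrete, here is a minimal sketch of how a data owner might score individual records, assuming "clustering distance" means each record's Euclidean distance to its nearest k-means centroid, so that records far from every cluster (outliers) are treated as higher-risk. The function name, the choice of k-means, and the 95th-percentile flagging threshold are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: per-record clustering distance as a privacy-risk proxy.
import numpy as np
from sklearn.cluster import KMeans

def clustering_distance_scores(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Per-record score: Euclidean distance to the nearest k-means centroid."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # transform() returns distances to all k centroids, shape (n, k);
    # the minimum over centroids is the record's clustering distance.
    return km.transform(X).min(axis=1)

# Usage: flag the 5% of records farthest from any cluster as high-risk
# before sharing the dataset with a machine learning pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))  # stand-in for a candidate dataset
scores = clustering_distance_scores(X)
high_risk = scores > np.quantile(scores, 0.95)
print(f"{high_risk.sum()} records flagged as potentially high privacy risk")
```

The intuition behind such a score is that isolated records are more easily memorized by a model and hence more exposed to membership inference, so a data owner could remove or perturb flagged records before training.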
