Abstract

Machine learning models, and in particular black-box malicious models vulnerable to attribute inference attacks, can leak a great deal of private information, so recent work has focused on evaluating such models to prevent unexpected attribute privacy leakage. While existing model privacy risk evaluations have had some success, these traditional solutions are brittle in practice: they require white-box access to obtain the model's feature-layer outputs, and their evaluation results are heavily influenced by the training dataset and the model structure, which makes them difficult to generalize. In this paper, we propose a novel unawareness detection mechanism that discovers black-box malicious models and quantifies the potential privacy leakage risk of machine learning models, overcoming both limitations. To mitigate the influence of the training dataset and the model structure, we propose a new method for quantifying the privacy risk induced by a specific loss function. We also propose a new evaluation model that uses the Matthews correlation coefficient (MCC) as its metric and the final output of the target model as its input. In addition, we derive a theoretical upper bound on model privacy risk, expressed as a formula that is positively correlated with the mutual information between the sensitive attributes and the target model's outputs. Compared with traditional detection methods, our evaluation model reduces the level of model access required and minimizes evaluation errors caused by data imbalance, and our privacy risk assessment method and theoretical upper bound apply to a broader range of datasets and target model structures. Experimental results show that the adversary's prediction capability depends on the distribution of the dataset and on how malicious the model is, consistent with the theoretical prediction, and that our detection method can find potential model privacy leaks in the public datasets UTKFace and FairFace.
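To make the evaluation metric concrete, the following is a minimal, hypothetical sketch of scoring an attribute-inference adversary with the Matthews correlation coefficient using only the target model's final outputs (black-box access). The data, the logistic-regression attacker, and the train/test split are illustrative assumptions, not the paper's implementation; MCC is a natural choice here because, unlike plain accuracy, it stays informative when the sensitive attribute is imbalanced.

```python
# Illustrative sketch, not the paper's implementation: score an
# attribute-inference adversary that sees only the target model's
# final (softmax) outputs with the Matthews correlation coefficient.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)

# Hypothetical data: final softmax outputs of a 10-class target model
# for 1000 records, plus each record's binary sensitive attribute.
target_outputs = rng.dirichlet(np.ones(10), size=1000)  # shape (1000, 10)
sensitive_attr = rng.integers(0, 2, size=1000)          # e.g. gender

# The adversary trains an inference model mapping outputs -> attribute.
attack_model = LogisticRegression(max_iter=1000)
attack_model.fit(target_outputs[:800], sensitive_attr[:800])

# Evaluate on held-out records; MCC lies in [-1, 1], where 0 means the
# attack is no better than chance (as expected on this random data).
pred = attack_model.predict(target_outputs[800:])
print(f"attribute-inference MCC: {matthews_corrcoef(sensitive_attr[800:], pred):.3f}")
```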
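The abstract states that the upper bound on privacy risk is positively correlated with the mutual information between the sensitive attributes and the target model outputs, but does not give the formula. One standard route to a bound of this shape, shown purely as an illustrative sketch and not as the paper's derivation, is Fano's inequality for a discrete sensitive attribute S inferred from the model output Y:

```latex
% Illustrative sketch only; the paper's exact bound is not stated in the abstract.
% Fano's inequality: for any adversary estimating S from Y with error P_e,
%   H(P_e) + P_e log(|S| - 1) >= H(S | Y) = H(S) - I(S; Y).
% Rearranging shows the adversary's best achievable accuracy, and hence the
% privacy risk, grows monotonically with the mutual information I(S; Y).
\[
  H(P_e) + P_e \log\bigl(\lvert\mathcal{S}\rvert - 1\bigr)
  \;\ge\; H(S \mid Y) \;=\; H(S) - I(S;\,Y)
\]
```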
