Abstract

Black-box machine learning models are used in a growing number of high-stakes domains, which creates an increasing need for Explainable AI (XAI). However, the use of XAI in machine learning introduces privacy risks that currently remain largely unnoticed. We therefore explore the possibility of an explanation linkage attack, which can occur when instance-based strategies are deployed to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations k-anonymous, rather than the whole dataset, is beneficial for the quality of the explanations.
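
To make the core idea concrete, the following is a minimal, hypothetical sketch, not the paper's actual algorithm: an instance-based counterfactual is an actual training record (which is what enables the linkage attack), so one can generalize its exact attribute values into intervals until the resulting region covers at least k training instances, and then measure pureness. Here pureness is read, as an assumption based on the abstract, as the fraction of covered instances that the model still assigns to the desired counterfactual class. The names `model`, `X`, `target_class`, and the 5% widening heuristic are illustrative, and purely numeric features with nonzero range are assumed.

```python
import pandas as pd

def k_anonymous_counterfactual(cf: pd.Series, X: pd.DataFrame,
                               model, target_class, k: int = 10):
    """Illustrative sketch: widen the counterfactual's attribute values into
    intervals until the region covers at least k training instances, so the
    explanation can no longer be linked back to a single individual.

    Assumptions (not from the paper): X holds numeric training features,
    model is a fitted classifier with a predict() method, and pureness is
    the share of covered instances the model assigns to target_class."""
    lower, upper = cf.astype(float).copy(), cf.astype(float).copy()
    # Widen every attribute by 5% of its observed range per round (heuristic).
    step = 0.05 * (X.max() - X.min())
    while True:
        covered = ((X >= lower) & (X <= upper)).all(axis=1)
        if covered.sum() >= k:
            break
        lower, upper = lower - step, upper + step
    matches = X[covered]
    # Pureness of 1.0 means every individual hidden behind the generalized
    # explanation is still a valid counterfactual for the desired class.
    pureness = (model.predict(matches) == target_class).mean()
    return lower, upper, pureness
```

Under this reading, the trade-off the abstract alludes to becomes visible: widening intervals raises k (stronger anonymity) but can pull in instances of the wrong class, lowering pureness, which is why generalizing only the explanation, rather than the whole dataset, preserves more of its quality.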
