Abstract
Counterfactual explanation (CFE) provides actionable counterexamples and enhances the interpretability of the decision boundaries of deep neural networks, and has therefore gained increasing interest in recent years. An ideal CFE should be both plausible and practical: a counterexample grounded in the real world that can alter the decision of a classifier. Motivated by this requirement, we propose a CFE framework for identifying related features (CIRF) to improve the plausibility of explanations. CIRF comprises two steps: i) searching for direction vectors that contain class information; ii) finding an optimal point using a projection point, which determines the magnitude of the manipulation along the direction. Our framework exploits related features and the properties of the latent space of a generative model, thereby highlighting the importance of related features. We derive points that have many related features and show a performance gain of more than 11% on the IM1 metric over points with fewer related features. We validate the versatility of CIRF through experiments on various domains and datasets and through the two interchangeable steps. CIRF exhibits remarkable performance in terms of plausibility across domains, including tabular and image datasets.
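To make the two-step procedure concrete, below is a minimal sketch of latent-space counterfactual search in the spirit described above. All components are assumptions for illustration: `encode`, `decode`, and `classify` are hypothetical stand-ins for a pre-trained generative model and classifier, and the choice of a class-mean-difference direction is one illustrative way to obtain a class-informative direction; the published CIRF method may realize both steps differently.

```python
import numpy as np

# Hypothetical stand-ins for a pre-trained generative model and classifier.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

def encode(x):      # encoder: input -> latent code
    return x @ W

def decode(z):      # decoder: latent code -> input
    return z @ np.linalg.inv(W)

def classify(x):    # binary classifier on inputs
    return int(x.sum() > 0)

# Step i) search for a class-informative direction vector in latent space.
# Illustrative choice: the normalized difference of class-mean latent codes.
def class_direction(X_source, X_target):
    d = encode(X_target).mean(axis=0) - encode(X_source).mean(axis=0)
    return d / np.linalg.norm(d)

# Step ii) use a projection point to set the manipulation magnitude:
# project the target-class latents onto the direction, then move the query's
# latent code toward that projection until the classifier's decision flips.
def counterfactual(x, X_target, direction, steps=50):
    z = encode(x)
    target_proj = (encode(X_target) @ direction).mean()  # projection point
    for alpha in np.linspace(0.0, 1.0, steps):
        z_cf = z + alpha * (target_proj - z @ direction) * direction
        x_cf = decode(z_cf)
        if classify(x_cf) != classify(x):
            return x_cf
    return None

# Usage: derive a counterfactual for one query point from the negative class.
X_neg = rng.normal(loc=-1.0, size=(100, 8))
X_pos = rng.normal(loc=+1.0, size=(100, 8))
d = class_direction(X_neg, X_pos)
x_cf = counterfactual(X_neg[0], X_pos, d)
```

The sketch walks only the component of the latent code along the class direction, leaving the orthogonal components intact, which is one common way to keep the generated counterfactual close to the data manifold.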