Gradient-based Counterfactual Generation for Sparse and Diverse Counterfactual Explanations

Chan Sik Han, Keon Myung Lee

doi:10.1145/3555776.3577737

Abstract

Counterfactual generation has attracted attention as a technique for generating samples, called counterfactual explanations, that provide guidance on how to modify an input instance so that its class label changes in real-world applications. Generating multiple counterfactual explanations gives people several options for changing their input instance according to their preferences or capabilities. To generate multiple counterfactual explanations, this paper proposes a gradient-based method that dynamically selects subsets of the attributes of the given instance to be tweaked, so that the counterfactual searches are diverse. It also proposes a loss-based update rule for one-hot encoded categorical attributes, which is used to produce feasible and effective counterfactual explanations for instances with both categorical and continuous features. We conducted comparative experiments on six public datasets to evaluate the performance of the proposed method. The experimental results showed that the proposed method generates valid and diverse counterfactual explanations with fewer attribute value modifications than existing methods.
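To make the general idea of a gradient-based counterfactual search over a restricted attribute subset concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a fixed logistic-regression classifier with made-up weights, continuous features only, and hand-picked attribute masks; the function name `masked_counterfactual` and all numeric values are hypothetical. The paper's dynamic subset selection and its loss-based update rule for one-hot encoded categorical attributes are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def masked_counterfactual(x, w, b, mask, target=1.0, lr=0.5, steps=500):
    """Search for a counterfactual of x by gradient descent on the input.

    Only the features where mask == 1 are updated, so the result differs
    from x in at most mask.sum() attributes (a simple form of sparsity).
    """
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # Stop as soon as the predicted class matches the target class.
        if (p >= 0.5) == (target >= 0.5):
            break
        # Gradient of binary cross-entropy w.r.t. the input for a
        # logistic model: (p - target) * w.
        grad = (p - target) * w
        x_cf -= lr * grad * mask
    return x_cf


# Hypothetical toy classifier and input instance (assumptions for illustration).
w = np.array([1.0, -2.0, 0.5])
b = -1.0
x = np.array([0.0, 1.0, 0.0])  # predicted class 0 under this model

# Different masks yield different counterfactuals for the same instance.
masks = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([1, 0, 1])]
counterfactuals = [masked_counterfactual(x, w, b, m) for m in masks]

for m, cf in zip(masks, counterfactuals):
    print(m, cf, sigmoid(w @ cf + b) >= 0.5)
```

In this sketch, diversity comes only from running the same search under different fixed masks; each run changes a different subset of attributes while leaving the rest of the instance untouched.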

Full Text