Explaining Black-box Classification Models with Arguments

Leila Amgoud

doi:10.1109/ictai52525.2021.00126

Abstract

Two approaches for explaining black-box classification models have been studied: a global approach, which aims to highlight when classes are predicted independently of instances, and a local approach, which seeks to justify individual predictions. Different types of local explanations have been studied in the recent literature; however, their links to global explanations remain unclear. The present paper proposes a unified setting for global and local explanations. It is based on dual concepts that provide global explanations: arguments in favour of predictions and arguments against predictions. The former justify why a class is suggested by a black-box classifier, and the latter state why a class is not. We investigate the properties of both types of arguments, and provide ways of generating arguments pro a class from arguments con the class, and vice versa. Finally, we define various notions of local explanation from the literature in terms of arguments pro/con, formally characterizing their relationships and differences, as well as their relations with global explanations.

Full Text