Abstract

AbstractVarious industries and fields have utilized machine learning models, particularly those that demand a significant degree of accountability and transparency. With the introduction of the General Data Protection Regulation (GDPR), it has become imperative for machine learning model predictions to be both plausible and verifiable. One approach to explaining these predictions involves assigning an importance score to each input element. Another category aims to quantify the importance of human-understandable concepts to explain global and local model behaviours. The way concepts are constructed in such concept-based explanation techniques lacks inherent interpretability. Additionally, the magnitude and diversity of the discovered concepts make it difficult for machine learning practitioners to comprehend and make sense of the concept space. To this end, we introduce ConceptGlassbox, a novel local explanation framework that seeks to learn high-level transparent concept definitions. Our approach leverages human knowledge and feedback to facilitate the acquisition of concepts with minimal human labelling effort. The ConceptGlassbox learns concepts consistent with the user’s understanding of a concept’s meaning. It then dissects the evidence for the prediction by identifying the key concepts the black-box model uses to arrive at its decision regarding the instance being explained. Additionally, ConceptGlassbox produces counterfactual explanations, proposing the smallest changes to the instance’s concept-based explanation that would result in a counterfactual decision as specified by the user. Our systematic experiments confirm that ConceptGlassbox successfully discovers relevant and comprehensible concepts that are important for neural network predictions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call