Identifying which instances in a learning problem are difficult for a model to predict is important both to avoid critical errors at deployment time and to plan how to learn an improved model (e.g., through training data cleaning or augmentation). Previous work has mainly been devoted to measuring instance hardness or to developing meta-learners (e.g., assessors) that predict a base model's performance from the instances' features, while neglecting interpretability. In this paper, we propose a method to explain the performance of learned models in a problem through the induction of meta-rules. Each meta-rule identifies a local region of instances, called a Local Performance Region (LPR), where the base model has predictable performance. The meta-rules are induced using a reduced number of attributes, so that each LPR can be more easily inspected (e.g., via an attribute plot). The proposed method combines assessors, data augmentation, and rule induction procedures. Initially, given a dataset of interest and a base model, we build an assessor model that predicts the base model's performance on new instances. The assessor is trained on the test results obtained when the base model is evaluated, thus generalizing the observed errors across instances in the dataset. In a case study, we built an assessor to predict the probability of incorrect classifications by a Random Forest (RF) base model, achieving a mean absolute error of 0.05 in a hold-out experiment. Once learned, the assessor is used to predict the model's errors for new instances in an augmented dataset covering a variety of feature values. Finally, meta-rules are learned to approximate the assessor's predictions in local regions of instances. Experiments show the usefulness of the proposal by finding 18 local regions of poor RF performance, a special case of LPRs called Local Hard Regions (LHRs). By explaining the (in)correctness of model predictions, LPRs constitute a novel application of explainable AI focused on explaining model performance, one that can be adapted to different ML contexts.
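To make the described pipeline concrete, the sketch below illustrates its four stages in Python with scikit-learn. It is a minimal sketch under stated assumptions, not the paper's actual implementation: the synthetic dataset, the RandomForestRegressor used as assessor, the uniform augmentation sampler, and the shallow decision tree standing in for the rule-induction step are all illustrative choices.

```python
# Illustrative sketch of the assessor + augmentation + rule-induction
# pipeline described in the abstract. All modeling choices here are
# assumptions for demonstration, not the paper's exact procedures.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Train the base model and record its per-instance test errors.
base = RandomForestClassifier(random_state=0).fit(X_train, y_train)
errors = (base.predict(X_test) != y_test).astype(float)

# 2) Train an assessor on the instances' features to predict the base
#    model's probability of error (a plain regressor, as an assumption).
assessor = RandomForestRegressor(random_state=0).fit(X_test, errors)

# 3) Augment: sample synthetic instances covering the feature space and
#    label them with the assessor's predicted error.
X_aug = rng.uniform(X.min(axis=0), X.max(axis=0), size=(5000, X.shape[1]))
err_aug = assessor.predict(X_aug)

# 4) Induce meta-rules approximating the assessor in local regions,
#    restricted to few attributes (here via a depth-limited tree).
meta = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_aug, err_aug)
print(export_text(meta, feature_names=[f"x{i}" for i in range(X.shape[1])]))
# Leaves predicting a high error rate correspond to Local Hard Regions.
```

Each root-to-leaf path of the printed tree reads as a meta-rule over at most three attributes, so the region it delimits can be inspected directly, e.g., in an attribute plot.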