The interpretability requirement is one of the major obstacles to deploying machine learning models in many practical fields. Methods of eXplainable Artificial Intelligence (XAI) address this issue; however, the growing number of solutions in the field creates a need to assess the quality of explanations and to compare them. In recent years, several attempts have been made to consolidate scattered XAI quality-assessment methods into a single benchmark. These attempts have typically suffered from a narrow focus on feature importance, a lack of customization, and the absence of an underlying evaluation framework. In this work, the eXplainable Artificial Intelligence Benchmark (XAIB) is proposed. Compared to existing benchmarks, XAIB is more universal and extensible, and it is grounded in a complete evaluation ontology in the form of the Co-12 Framework. Owing to its modular design, new datasets, models, explainers, and quality metrics are easy to add, as the sketch below illustrates. Furthermore, an additional abstraction layer built on the inversion-of-control principle makes these components easier to use. The benchmark will contribute to artificial intelligence research by providing a platform for evaluation experiments and, at the same time, to engineering practice by providing a way to compare explainers on custom datasets and machine learning models, bringing evaluation closer to practice.
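To make the modular, inversion-of-control design concrete, the following is a minimal, purely illustrative sketch; it does not reproduce the actual XAIB API, and all names (`Case`, `run_benchmark`, `gradient_explainer`, `mean_attribution`) are hypothetical. The idea it shows is that user-supplied datasets, models, explainers, and metrics implement small interchangeable interfaces, while the benchmark runner, rather than user code, owns the evaluation loop.

```python
# Hypothetical sketch (not the actual XAIB API): a minimal inversion-of-control
# layout in which user-supplied datasets, models, explainers, and metrics are
# plugged into a runner that drives the evaluation.
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Case:
    """One evaluation case: a dataset, a model, and an explainer to assess."""
    dataset: Sequence[float]
    model: Callable[[float], float]
    explainer: Callable[[Callable[[float], float], float], float]


def run_benchmark(cases: List[Case], metrics: Dict[str, Callable]) -> dict:
    """The runner owns control flow: it calls user components, not vice versa."""
    results = {}
    for i, case in enumerate(cases):
        # Produce one explanation per data point using the supplied explainer.
        explanations = [case.explainer(case.model, x) for x in case.dataset]
        # Score the explanations with every registered quality metric.
        results[f"case_{i}"] = {
            name: metric(explanations) for name, metric in metrics.items()
        }
    return results


if __name__ == "__main__":
    # Toy components stand in for real datasets, models, and explainers.
    data = [0.0, 1.0, 2.0, 3.0]
    model = lambda x: 2.0 * x + 1.0
    gradient_explainer = lambda f, x: (f(x + 1e-3) - f(x - 1e-3)) / 2e-3
    metrics = {"mean_attribution": lambda expls: sum(expls) / len(expls)}

    print(run_benchmark([Case(data, model, gradient_explainer)], metrics))
```

Under this kind of design, adding a new dataset, model, explainer, or metric only requires implementing the corresponding callable interface; no changes to the runner are needed.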