Abstract

Many studies have addressed explainable artificial intelligence (XAI), which explains the logic behind complex deep neural networks often regarded as black boxes. In parallel, researchers have tried to evaluate the explanation performance of various XAI methods. However, most previous evaluation methods are human-centric, i.e., subjective: they measure how closely an explanation matches what people base their decisions on, rather than which features actually drive the model's decision. Their XAI rankings also depend on the dataset, and they focus only on the output variation of a target class. In contrast, this paper proposes a heatmap-assisted accuracy score (HAAS) scheme that is robust across datasets and helps select machine-centric explanation algorithms, i.e., those that show what actually leads to the decision of a given classification network. The proposed method modifies the input image with the heatmap scores produced by a given explanation algorithm and then feeds the resulting heatmap-assisted (HA) images into the network to estimate the change in accuracy. The HAAS metric is the ratio of the network's accuracy on HA images to its accuracy on the original images. The proposed evaluation scheme is verified on the image classification models LeNet-5 for MNIST and VGG-16 for CIFAR-10, STL-10, and ILSVRC2012, covering a total of 11 XAI algorithms: saliency map, deconvolution, and 9 layer-wise relevance propagation (LRP) configurations. For LRP1 and LRP3, the largest HAAS values were 1.0088 and 1.0079 on MNIST, 1.1160 and 1.1254 on CIFAR-10, 1.0906 and 1.0918 on STL-10, and 1.3207 and 1.3469 on ILSVRC2012. While LRP1 applies ε-rules to the input, convolutional, and fully connected layers, LRP3 adopts a bounded rule for the input layer and the same ε-rules as LRP1 for the other layers. The consistency of the evaluation results of HAAS and AOPC across datasets is compared by means of the Kullback-Leibler divergence: HAAS shows a much lower average divergence (0.0251) than AOPC (0.3048), indicating that HAAS is the more robust, dataset-independent evaluation method. In addition, the validity of the proposed HAAS scheme is further examined through an inverted HA test, which builds inverted HA images from inverted heatmap scores and measures the accuracy degradation they cause when fed into the network. The XAI algorithms with the largest HAAS results suffer the largest accuracy degradation in the inverted HA test.
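For concreteness, the sketch below shows how a HAAS-style metric could be computed in PyTorch. The abstract only specifies that HA images are formed by modifying inputs with heatmap scores and that HAAS is an accuracy ratio; the element-wise modulation, score normalization, and the names haas_score and heatmaps are assumptions for illustration, not the authors' implementation.

```python
import torch

def haas_score(model, images, labels, heatmaps):
    """Minimal sketch of a heatmap-assisted accuracy score (HAAS).

    HAAS = accuracy on heatmap-assisted (HA) images
           / accuracy on the original images.

    `heatmaps` holds per-pixel relevance scores from an XAI method
    (e.g., an LRP configuration), assumed normalized to [0, 1]; how
    they modulate the input is an assumption in this sketch.
    """
    model.eval()
    with torch.no_grad():
        # Accuracy of the given network on the original images.
        acc_orig = (model(images).argmax(dim=1) == labels).float().mean()

        # HA images: emphasize pixels the explanation marks as relevant
        # (element-wise modulation assumed here).
        ha_images = images * (1.0 + heatmaps)
        acc_ha = (model(ha_images).argmax(dim=1) == labels).float().mean()

    # HAAS > 1 suggests the explanation highlights features that genuinely
    # support the network's decision. The inverted HA test would instead
    # modulate with (1 - heatmaps) and expect accuracy to drop for
    # explanations that scored well under HAAS.
    return (acc_ha / acc_orig).item()
```

Under this reading, an XAI algorithm whose heatmaps raise accuracy (HAAS above 1) is judged more machine-centric, consistent with the per-dataset values reported above.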
