Guided Zoom: Zooming into Network Evidence to Refine Fine-Grained Model Decisions.

Sarah Adel Bargal, Vitali Petsiuk, Jianming Zhang, Vittorio Murino, Andrea Zunino, Stan Sclaroff, Kate Saenko

doi:10.1109/tpami.2021.3054303

Abstract

In state-of-the-art deep single-label classification models, the top-k (k = 2, 3, 4, ...) accuracy is usually significantly higher than the top-1 accuracy. This is more evident in fine-grained datasets, where differences between classes are quite subtle. Exploiting the information provided in the top k predicted classes boosts the final prediction of a model. We propose Guided Zoom, a novel way in which explainability could be used to improve model performance. We do so by making sure the model has "the right reasons" for a prediction. The reason/evidence upon which a deep neural network makes a prediction is defined to be the grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom examines how reasonable the evidence used to make each of the top-k predictions is. Test time evidence is deemed reasonable if it is coherent with evidence used to make similar correct decisions at training time. This leads to better informed predictions. We explore a variety of grounding techniques and study their complementarity for computing evidence. We show that Guided Zoom results in an improvement of a model's classification accuracy and achieves state-of-the-art classification performance on four fine-grained classification datasets. Our code is available at https://github.com/andreazuna89/Guided-Zoom.
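
As a reading aid for the top-k versus top-1 gap that motivates the abstract, the following is a minimal sketch of how top-k accuracy can be computed. It is not taken from the authors' code: the function name `top_k_accuracy` and the score values are hypothetical, chosen only for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical class scores for 4 samples over 3 classes (made-up numbers).
scores = [
    [0.60, 0.30, 0.10],
    [0.40, 0.50, 0.10],
    [0.20, 0.30, 0.50],
    [0.45, 0.40, 0.15],
]
labels = [0, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(f"top-1 accuracy: {top1:.2f}, top-2 accuracy: {top2:.2f}")
```

In this toy data the correct class is often ranked second rather than first, so the top-2 accuracy exceeds the top-1 accuracy. This is the kind of gap that re-examining the top-k candidates aims to close.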

Full Text