Abstract

Motivated by human attention, computational attention mechanisms have been designed to help neural networks adjust their focus on specific parts of the input data. While attention mechanisms are often claimed to provide interpretability, little is known about the actual relationship between machine and human attention. In this work, we conduct the first quantitative assessment of human versus computational attention mechanisms for the text classification task. To achieve this, we design and conduct a large-scale crowd-sourcing study to collect human attention maps that encode the parts of a text that humans focus on when performing text classification. Based on this new resource, YELP-HAT, a human attention dataset for text classification collected on the publicly available Yelp dataset, we perform a quantitative comparative analysis of machine attention maps produced by deep learning models and human attention maps. Our analysis offers insights into the relationship between human and machine attention maps along three dimensions: overlap in word selections, distribution over lexical categories, and context-dependency of sentiment polarity. Our findings open promising future research opportunities ranging from supervised attention to the design of human-centric attention-based explanations.
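
As an illustration of the first comparison dimension, overlap in word selections, the following minimal Python sketch treats a human attention map as a binary token selection and compares it with the top-k machine-attended tokens. The function name, the Jaccard-style measure, and the default choice of k are illustrative assumptions, not necessarily the metric used in the paper.

    # Minimal sketch of an "overlap in word selections" comparison (assumptions noted above).
    import numpy as np

    def attention_overlap(human_map, machine_weights, k=None):
        """Jaccard-style overlap between human-selected tokens and top-k machine tokens.

        human_map       : binary array, 1 where a human highlighted the token
        machine_weights : real-valued attention weights from the model (same length)
        k               : number of machine tokens to keep; defaults to the human count
        """
        human_map = np.asarray(human_map, dtype=bool)
        machine_weights = np.asarray(machine_weights, dtype=float)
        if k is None:
            k = int(human_map.sum())              # match the size of the human selection
        top_k = np.zeros_like(human_map)
        top_k[np.argsort(machine_weights)[-k:]] = True
        intersection = np.logical_and(human_map, top_k).sum()
        union = np.logical_or(human_map, top_k).sum()
        return intersection / union if union else 0.0

    # Example: the human annotator highlighted tokens 1 and 3; the model mostly agrees.
    print(attention_overlap([0, 1, 0, 1, 0], [0.05, 0.40, 0.10, 0.35, 0.10]))  # -> 1.0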

Highlights

  • Attention-based models have become the architectures of choice for a vast number of NLP tasks including, but not limited to, language modeling (Daniluk et al., 2017), machine translation (Bahdanau et al., 2015), document classification (Yang et al., 2016), and question answering (Kundu and Ng, 2018; Sukhbaatar et al., 2015)

  • We find that bidirectional recurrent neural networks (RNNs) with additive attention demonstrate strong similarities to human attention on all three metrics (a minimal sketch of such a model follows this list)

  • We show that, when used with bidirectional architectures, attention can be interpreted as providing human-like explanations for model predictions
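
The following is a minimal PyTorch sketch of a bidirectional RNN classifier with additive (Bahdanau-style) attention, as referenced in the highlights above; the layer sizes, class name, and task setup are illustrative assumptions rather than the paper's exact configuration.

    # Bidirectional LSTM with additive attention for text classification (illustrative sketch).
    import torch
    import torch.nn as nn

    class BiRNNAdditiveAttention(nn.Module):
        def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            # Additive (Bahdanau-style) scoring: score_t = v^T tanh(W h_t)
            self.attn_w = nn.Linear(2 * hidden_dim, hidden_dim)
            self.attn_v = nn.Linear(hidden_dim, 1, bias=False)
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, token_ids):
            h, _ = self.rnn(self.embed(token_ids))            # (batch, seq, 2*hidden)
            scores = self.attn_v(torch.tanh(self.attn_w(h)))  # (batch, seq, 1)
            alpha = torch.softmax(scores, dim=1)              # machine attention map
            context = (alpha * h).sum(dim=1)                  # attention-weighted states
            return self.classifier(context), alpha.squeeze(-1)

    # Example: class logits for a dummy batch plus the per-token attention weights.
    model = BiRNNAdditiveAttention(vocab_size=10_000)
    logits, attention = model(torch.randint(0, 10_000, (4, 25)))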


Introduction

Attention-based models have become the architectures of choice for a vast number of NLP tasks including, but not limited to, language modeling (Daniluk et al., 2017), machine translation (Bahdanau et al., 2015), document classification (Yang et al., 2016), and question answering (Kundu and Ng, 2018; Sukhbaatar et al., 2015). Jain and Wallace (2019), Wiegreffe and Pinter (2019), and Serrano and Smith (2019) propose three distinct approaches for evaluating the explainability of attention. Jain and Wallace (2019) base their work on the premise that explainable attention scores should be unique for a given prediction as well as consistent with other feature-importance measures, which prompts their conclusion that attention is not explanation. In contrast, Wiegreffe and Pinter (2019) find that attention learns a meaningful relationship between input tokens and model predictions that cannot easily be hacked adversarially.
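
One ingredient of such evaluations can be sketched concretely: checking how well a model's attention weights agree with a gradient-based feature-importance measure. The snippet below, which reuses the BiRNNAdditiveAttention sketch shown earlier, correlates the two with Kendall's tau; it is an illustrative consistency check under those assumptions, not a reproduction of the cited evaluation protocols.

    # Agreement between attention weights and gradient-based importances (illustrative sketch).
    import torch
    from scipy.stats import kendalltau

    def attention_vs_gradient_agreement(model, token_ids):
        """Kendall's tau between attention weights and per-token gradient norms.

        Assumes a single example (batch size 1) and a model with the layers defined
        in the BiRNNAdditiveAttention sketch above.
        """
        model.eval()
        emb = model.embed(token_ids).detach().requires_grad_(True)
        # Re-run the rest of the forward pass on the differentiable embeddings.
        h, _ = model.rnn(emb)
        alpha = torch.softmax(model.attn_v(torch.tanh(model.attn_w(h))), dim=1)
        logits = model.classifier((alpha * h).sum(dim=1))
        logits[0, logits.argmax()].backward()                 # gradient of the predicted class
        grad_importance = emb.grad.norm(dim=-1).squeeze(0)    # one importance score per token
        attention = alpha.squeeze(-1).squeeze(0)
        tau, _ = kendalltau(attention.detach().numpy(), grad_importance.numpy())
        return tau

    # model = BiRNNAdditiveAttention(vocab_size=10_000)  # from the earlier sketch
    # print(attention_vs_gradient_agreement(model, torch.randint(0, 10_000, (1, 30))))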
