Abstract

Theory of mind, i.e., the ability to reason about the intents and beliefs of agents, is an important capability in artificial intelligence and central to resolving ambiguous references in natural language dialogue. In this work, we revisit the evaluation of theory of mind through question answering. We show that current evaluation methods are flawed and that existing benchmark tasks can be solved without theory of mind due to dataset biases. Based on prior work, we propose an improved evaluation protocol and dataset in which we explicitly control for data regularities via a careful examination of the answer space. We show that state-of-the-art methods which are successful on existing benchmarks fail to solve theory-of-mind tasks in our proposed approach.

Highlights

  • Humans interact and communicate with other people in a highly efficient way, as described for instance by Grice’s cooperative principle (Grice, 1975)

  • The first question tests the ability of the child to infer the correct mental state of Sally, i.e., that she holds the false belief that the marble is in the basket

  • Theory of mind is an important component of intelligent systems which interact with humans

Summary

Introduction

Humans interact and communicate with other people in a highly efficient way, as described for instance by Grice’s cooperative principle (Grice, 1975). The key insight of Grant et al. (2017) was to cast theory-of-mind tests as question answering tasks, in which a system is given a story and has to answer questions about the beliefs of agents in that story. This allows the bAbI benchmarking protocol (Weston et al., 2016) to be adapted to evaluate the theory-of-mind capabilities of modern neural network architectures: stories are automatically generated, so a suitably large number of examples can be provided for training. We show that state-of-the-art memory-augmented models – which are successful on existing benchmarks – fail to solve theory-of-mind tasks in our improved approach.
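The benchmarking protocol described above can be sketched as a template-based generator in the style of the classic Sally-Anne false-belief test. The snippet below is a minimal illustration, not the paper's actual dataset generator: all names, templates, and vocabularies are hypothetical, and it shows only a first-order false-belief question paired with a "reality" control question of the kind one might use to detect answer-space biases.

```python
import random

# Hypothetical vocabularies for illustration only.
LOCATIONS = ["basket", "box", "drawer"]
OBJECTS = ["marble", "ball", "apple"]
AGENTS = ("Sally", "Anne")


def generate_false_belief_example(rng=None):
    """Generate a Sally-Anne-style story with a first-order
    false-belief question and a reality-control question."""
    rng = rng or random.Random(0)
    obj = rng.choice(OBJECTS)
    loc1, loc2 = rng.sample(LOCATIONS, 2)
    a1, a2 = AGENTS
    story = [
        f"{a1} placed the {obj} in the {loc1}.",
        f"{a1} left the room.",               # a1 does not observe the move
        f"{a2} moved the {obj} to the {loc2}.",
        f"{a1} returned to the room.",
    ]
    # a1 holds a false belief: she last saw the object in loc1.
    belief_qa = (f"Where will {a1} look for the {obj}?", loc1)
    # Control question: answerable without theory of mind; guards
    # against models that exploit answer-space regularities.
    reality_qa = (f"Where is the {obj} really?", loc2)
    return story, belief_qa, reality_qa
```

A model that merely tracks the object's true location answers the reality question but fails the belief question; controlling both question types in the answer space is what rules out solving the task via dataset biases.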

Theory of Mind Benchmarks
Evaluating Theory of Mind Evaluation
Experimental Evaluation
Findings
Discussion
