WinoGrande

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

doi:10.1145/3474381

Abstract

Commonsense reasoning remains a major challenge in AI, yet recent progress on benchmarks may seem to suggest otherwise. In particular, recent neural language models have reported above 90% accuracy on the Winograd Schema Challenge (WSC), a commonsense benchmark originally designed to be unsolvable for statistical models that rely simply on word associations. This raises an important question: have these models truly acquired robust commonsense capabilities, or do they rely on spurious biases in the dataset that lead to an overestimation of the true capabilities of machine commonsense? To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction are (1) large-scale crowdsourcing, followed by (2) systematic bias reduction using a novel AFLITE algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Our experiments demonstrate that state-of-the-art models achieve considerably lower accuracy (59.4%-79.1%) on WinoGrande than humans (94%), confirming that the high performance on the original WSC was inflated by spurious biases in the dataset. Furthermore, we report new state-of-the-art results on five related benchmarks, with emphasis on their dual implications. On the one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, the high performance on all these benchmarks suggests the extent to which spurious biases are prevalent in all such datasets, which motivates further research on algorithmic bias reduction.
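The abstract names AFLITE without spelling out its steps, so the following is only a rough sketch of what filtering by machine-detectable embedding associations could look like, not the authors' implementation. It assumes each problem already has a precomputed embedding vector and a binary label, repeatedly fits a simple classifier on random subsets, and drops the instances that held-out classifiers get right most often. The function name `aflite_sketch`, every parameter and default value, and the nearest-centroid classifier (swapped in here to keep the example dependency-free) are assumptions made for illustration.

```python
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aflite_sketch(embeddings, labels, n_rounds=20, train_frac=0.75,
                  tau=0.75, k=2, min_size=4, seed=0):
    """Hypothetical adversarial-filtering loop (all defaults are assumptions).

    Returns the sorted indices of the instances that survive filtering.
    """
    rng = random.Random(seed)
    kept = list(range(len(embeddings)))
    while len(kept) > min_size:
        correct = dict.fromkeys(kept, 0)  # times predicted correctly when held out
        seen = dict.fromkeys(kept, 0)     # times held out at all
        for _ in range(n_rounds):
            shuffled = kept[:]
            rng.shuffle(shuffled)
            cut = max(2, int(len(shuffled) * train_frac))
            train, held_out = shuffled[:cut], shuffled[cut:]
            by_label = {}
            for i in train:
                by_label.setdefault(labels[i], []).append(embeddings[i])
            if len(by_label) < 2:
                continue  # cannot fit a classifier on a single class
            cents = {lab: centroid(vs) for lab, vs in by_label.items()}
            for i in held_out:
                pred = min(cents, key=lambda lab: sq_dist(embeddings[i], cents[lab]))
                seen[i] += 1
                if pred == labels[i]:
                    correct[i] += 1
        # Predictability score: fraction of held-out predictions that were correct.
        scores = {i: correct[i] / seen[i] for i in kept if seen[i] > 0}
        candidates = sorted((i for i in scores if scores[i] >= tau),
                            key=lambda i: scores[i], reverse=True)
        budget = min(k, len(kept) - min_size)  # never shrink below min_size
        to_remove = set(candidates[:budget])
        if not to_remove:
            break
        kept = [i for i in kept if i not in to_remove]
        if len(to_remove) < k:
            break
    return kept


# Toy data (made up): the first coordinate fully reveals the label.
toy_embeddings = [[-5.0, 0.0]] * 4 + [[5.0, 0.0]] * 4
toy_labels = [0] * 4 + [1] * 4
print(aflite_sketch(toy_embeddings, toy_labels))
```

On the toy data every held-out instance is predicted correctly, so the sketch starts discarding instances in its first pass. On real data the intent would be that only the problems whose embeddings give away the answer are removed, leaving the harder ones.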

Journal: Communications of the ACM
Publication Date: Aug 24, 2021
Citations: 44

Similar Papers

WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Keisuke Sakaguchi ... Ronan Le Bras
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020

Combining Knowledge Hunting and Neural Language Models to Solve the Winograd Schema Challenge
Ashok Prakash ... Arpit Sharma
01 Jan 2019

Towards Solving the Winograd Schema Challenge: Model-Free, Model-Based and a Spectrum in Between
Weinan He ...
01 Jan 2020

Distinguishing Sensitive and Insensitive Options for the Winograd Schema Challenge
Dong Li ... Ting Wang
01 Jan 2023
