Abstract

The recent success of neural language models (NLMs) on the Winograd Schema Challenge has called for further investigation of the commonsense reasoning ability of these models. Previous diagnostic datasets rely on crowd-sourcing, which fails to provide the coherent commonsense knowledge crucial for solving WSC problems. To better evaluate NLMs, we propose a logic-based framework that focuses on high-quality commonsense knowledge. Specifically, we identify and collect formal knowledge formulas verified by theorem provers and translate such formulas into natural language sentences. Based on these true knowledge sentences, adversarial false ones are generated. We propose a new dataset named WinoLogic built from these sentences. Given a problem in WinoLogic, NLMs need to decide whether the plausible knowledge sentences correctly solve the corresponding WSC problem in a zero-shot setting. We also ask human annotators to validate WinoLogic to ensure it is human-agreeable. Experiments show that NLMs still struggle to comprehend commonsense knowledge as humans do, indicating that their reasoning ability could have been overestimated.

Highlights

  • The quality of WINOLOGIC is guaranteed by both formal verification in first-order logic (FOL) and human validation

  • Since sophisticated automatic translation is not yet feasible, we manually formalize the WSC problems and the commonsense knowledge through knowledge engineering, relying on experts who are fluent in FOL

  • We manually provide a pair of true and false knowledge sentences based on the verified logical formulas for each WSC problem; for some problems, more than one knowledge formula is considered
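As a concrete illustration of the kind of formula these experts write, the lifting axiom quoted later in this summary can be rendered in situation-calculus-style FOL; the explicit quantifiers here are our addition for readability:

```latex
% An action lift(x, y) is possible in situation s
% if and only if x is strong in s.
\forall x\, \forall y\, \forall s\;
  \big(\, Poss(lift(x, y), s) \equiv Strong(x, s) \,\big)
```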


Summary

Knowledge Sentences with Variables

To better evaluate NLMs' ability to understand commonsense, we transform the logical knowledge formulas into natural language sentences. For each WSC problem, we pick the essential commonsense knowledge formulas and translate them into natural language. For example, the formula Poss(lift(x, y), s) ≡ Strong(x, s) translates to the knowledge sentence "When person X is about to lift person Y up, if X is not strong enough, it wouldn't be possible for X to lift Y". We adhere to rules in translation to preserve coherence, and derive each false sentence from a true one by one of two operations: Swapped or Replaced. We obtain a total of 562 knowledge sentences, half of which are true and the other half false. We denote this set of knowledge sentences as the variable set.
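The two perturbation operations can be sketched in a few lines of Python. This is an illustrative toy, not the authors' pipeline: the function names `swapped` and `replaced` and the antonym choice are our assumptions, mirroring the idea of exchanging the two role fillers or replacing a key predicate with its opposite.

```python
def swapped(sentence: str, a: str, b: str) -> str:
    """Exchange two role fillers (e.g. 'X' and 'Y') to invert the claim."""
    placeholder = "\u0000"  # temporary marker so the two swaps don't collide
    return sentence.replace(a, placeholder).replace(b, a).replace(placeholder, b)

def replaced(sentence: str, word: str, antonym: str) -> str:
    """Replace a key predicate with its antonym to falsify the claim."""
    return sentence.replace(word, antonym)

# True knowledge sentence from the lifting example above
true_sent = ("When person X is about to lift person Y up, if X is not strong "
             "enough, it wouldn't be possible for X to lift Y")

print(swapped(true_sent, "X", "Y"))           # roles of X and Y exchanged
print(replaced(true_sent, "strong", "weak"))  # predicate negated via antonym
```

Both outputs read as fluent sentences, which is what makes them useful adversarial distractors: a model cannot reject them on surface form alone.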
