Abstract

The recent success of neural language models (NLMs) on the Winograd Schema Challenge has called for further investigation of the commonsense reasoning ability of these models. Previous diagnostic datasets rely on crowd-sourcing, which fails to provide the coherent commonsense knowledge crucial for solving WSC problems. To better evaluate NLMs, we propose a logic-based framework that focuses on high-quality commonsense knowledge. Specifically, we identify and collect formal knowledge formulas verified by theorem provers and translate such formulas into natural language sentences. Based on these true knowledge sentences, adversarial false ones are generated. We propose a new dataset named WinoLogic built from these sentences. Given a problem in WinoLogic, NLMs need to decide whether the plausible knowledge sentences correctly solve the corresponding WSC problem in a zero-shot setting. We also ask human annotators to validate WinoLogic to ensure it is human-agreeable. Experiments show that NLMs still struggle to comprehend commonsense knowledge as humans do, indicating that their reasoning ability could have been overestimated.

Highlights

  • The quality of WINOLOGIC is guaranteed by both formal verification in first-order logic (FOL) and human validation

  • Since sophisticated automatic translation is not yet feasible, we manually formalize the WSC problems and the commonsense knowledge through knowledge engineering, relying on experts who are fluent in FOL

  • We manually provide a pair of true and false knowledge sentences based on the verified logical formulas for each WSC problem; for some problems, more than one knowledge formula is considered
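As a concrete illustration of the kind of formula these experts write, the lifting axiom quoted later in this summary can be rendered in situation-calculus-style FOL; the explicit quantifiers here are our addition for readability:

```latex
% An action lift(x, y) is possible in situation s
% if and only if x is strong in s.
\forall x\, \forall y\, \forall s\;
  \big(\, Poss(lift(x, y), s) \equiv Strong(x, s) \,\big)
```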


Summary

Knowledge Sentences with Variables

To better evaluate NLMs' ability to understand commonsense, we transform the logical knowledge formulas into natural language sentences. For each WSC problem, we pick the essential commonsense knowledge formulas and translate them into natural language. For example, the formula Poss(lift(x, y), s) ≡ Strong(x, s) translates to the knowledge sentence "When person X is about to lift person Y up, if X is not strong enough, it wouldn't be possible for X to lift Y". We adhere to rules in translation to preserve coherence, and derive each false sentence from a true one by one of two operations: Swapped or Replaced. We obtain a total of 562 knowledge sentences, half of which are true and the other half false. We denote this set of knowledge sentences as the variable set.
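The two perturbation operations can be sketched in a few lines of Python. This is an illustrative toy, not the authors' pipeline: the function names `swapped` and `replaced` and the antonym choice are our assumptions, mirroring the idea of exchanging the two role fillers or replacing a key predicate with its opposite.

```python
def swapped(sentence: str, a: str, b: str) -> str:
    """Exchange two role fillers (e.g. 'X' and 'Y') to invert the claim."""
    placeholder = "\u0000"  # temporary marker so the two swaps don't collide
    return sentence.replace(a, placeholder).replace(b, a).replace(placeholder, b)

def replaced(sentence: str, word: str, antonym: str) -> str:
    """Replace a key predicate with its antonym to falsify the claim."""
    return sentence.replace(word, antonym)

# True knowledge sentence from the lifting example above
true_sent = ("When person X is about to lift person Y up, if X is not strong "
             "enough, it wouldn't be possible for X to lift Y")

print(swapped(true_sent, "X", "Y"))           # roles of X and Y exchanged
print(replaced(true_sent, "strong", "weak"))  # predicate negated via antonym
```

Both outputs read as fluent sentences, which is what makes them useful adversarial distractors: a model cannot reject them on surface form alone.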
