Abstract

Does neural machine translation yield translations that are congenial with common sense? In this paper, we present a test suite to evaluate the commonsense reasoning capability of neural machine translation. The test suite consists of three test sets, covering lexical ambiguity and contextless/contextual syntactic ambiguity, all of which require commonsense knowledge to resolve. We manually create 1,200 triples, each of which contains a source sentence and two contrastive translations, involving 7 different types of common sense. Language models pretrained on large-scale corpora, such as BERT and GPT-2, achieve a commonsense reasoning accuracy lower than 72% on the target translations of this test suite. We conduct extensive experiments on the test suite to evaluate commonsense reasoning in neural machine translation and investigate factors that affect this capability. Our experiments and analyses demonstrate that neural machine translation performs poorly on commonsense reasoning for the three ambiguity types in terms of both reasoning accuracy (≤60.1%) and reasoning consistency (≤31%). We will release our test suite as a machine translation commonsense reasoning testbed to promote future work in this direction.
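The contrastive evaluation implied by the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's exact protocol: it scores the two contrastive target translations of a hypothetical test triple with a pretrained GPT-2 model from the Hugging Face transformers library and counts the decision as correct when the commonsense-consistent translation receives the lower average negative log-likelihood. Reasoning accuracy over the full test suite would then be the fraction of triples decided correctly. The example triple, model choice, and scoring criterion are assumptions made for illustration.

```python
# Minimal sketch of contrastive-translation scoring (not the paper's exact protocol).
# Assumptions: GPT-2 as the scoring LM, mean per-token NLL as the sentence score,
# and a hypothetical test triple; the released test suite defines its own format.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_nll(sentence: str) -> float:
    """Mean per-token negative log-likelihood of a sentence under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss is the mean cross-entropy over tokens
    return out.loss.item()

# Hypothetical triple: an ambiguous source sentence, the translation that resolves
# the ambiguity in line with common sense, and a contrastive translation that does not.
triple = {
    "source": "他喜欢打篮球。",
    "reference": "He likes to play basketball.",
    "contrastive": "He likes to hit basketball.",
}

ref_score = sentence_nll(triple["reference"])
con_score = sentence_nll(triple["contrastive"])

# Reasoning accuracy over a test set is the fraction of triples for which
# the model prefers (assigns lower NLL to) the reference translation.
print("prefers commonsense-consistent translation:", ref_score < con_score)
```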

Highlights

  • Sixty years ago, the pioneering machine translation researcher and linguist Bar-Hillel published his well-known argument on the non-feasibility of general-purpose fully automatic high-quality machine translation (FAHQT), due to the inevitable requirement of world knowledge to help machine translation infer correct translations for ambiguous words or linguistic structures (Bar-Hillel, 1960a).

  • Based on our experiments and analyses on evaluating commonsense reasoning in NMT, we find that: 1) commonsense reasoning related to lexical ambiguity and contextual syntactic ambiguity is more difficult than that related to contextless syntactic ambiguity; 2) the

  • We conjecture that the superiority of BERT models over GPT/GPT-2 models is due to the bidirectional context in BERT, which resonates with the findings of Zhou et al. (2020).


Summary

Introduction

Sixty years ago, the pioneering machine translation researcher and linguist Bar-Hillel published his well-known argument on the non-feasibility of general-purpose fully automatic high-quality machine translation (FAHQT), due to the inevitable requirement of world knowledge to help machine translation infer correct translations for ambiguous words or linguistic structures (Bar-Hillel, 1960a). Bar-Hillel doubted that a machine, even one equipped with extra-linguistic knowledge, would be able to reason with such knowledge spontaneously as human translators do (Bar-Hillel, 1960a; Macklovitch, 1995). Machine translation has advanced substantially since then; recent results even suggest that the quality of machine-generated translations is approaching that of professional human translators (Wu et al., 2016; Hassan et al., 2018). In natural language understanding, a wide variety of efforts have been made to examine the commonsense reasoning capability of neural models, to establish commonsense reasoning challenges, or to enhance neural models with commonsense reasoning (Zhang et al., 2018; Talmor et al., 2018; Huang et al., 2019; Sap et al., 2019b).
