Abstract

Do state-of-the-art natural language understanding models care about word order, one of the most important characteristics of a sequence? Not always! We found that 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain constant after input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to downstream tasks is almost unchanged even after the word's context is shuffled. BERT-based models are able to exploit superficial cues (e.g., the sentiment of keywords in sentiment analysis, or the word-wise similarity between sequence-pair inputs in natural language inference) to make correct decisions when tokens are arranged in random orders. Encouraging classifiers to capture word order information improves performance on most GLUE tasks, SQuAD 2.0, and out-of-sample data. Our work suggests that many GLUE tasks do not challenge machines to understand the meaning of a sentence.
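
To make the shuffling experiment concrete, the sketch below (not the authors' code) randomly shuffles the words of a sentence and checks whether an off-the-shelf sentiment classifier's prediction changes; the HuggingFace `pipeline` call, its default model, and the example sentence are illustrative assumptions.

```python
# Minimal sketch: shuffle the words of an input sentence and check whether
# a pretrained sentiment classifier's prediction stays the same.
# Assumes the HuggingFace `transformers` library; the default pipeline model
# is used purely for illustration.
import random
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English sentiment model

def shuffle_words(sentence: str, seed: int = 0) -> str:
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

original = "the movie was surprisingly good despite a slow start"
shuffled = shuffle_words(original)

pred_original = classifier(original)[0]["label"]
pred_shuffled = classifier(shuffled)[0]["label"]

print(original, "->", pred_original)
print(shuffled, "->", pred_shuffled)
# If the two labels match, word order did not affect this prediction.
```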

Highlights

  • Machine learning (ML) models recently achieved excellent performance on state-of-the-art benchmarks for evaluating natural language understanding (NLU)

  • We chose to answer this question for SST-2 and QNLI because they have the lowest Word-Order Sensitivity (WOS) scores across all six GLUE tasks tested (Table 2) and they are representative of single-sentence and sequence-pair tasks, respectively; see the sketch of a word-order-sensitivity measurement after this list

  • After the second fine-tuning on downstream tasks, we observed that all models were substantially more sensitive to word order than the baseline models
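
One plausible way to turn the shuffling experiment into a sensitivity score is sketched below: among examples the model originally classifies correctly, measure how often the prediction flips once the words are shuffled. This is a simplified stand-in rather than the paper's exact WOS formula; the `predict` callable and the (text, label) example format are assumptions.

```python
# Hypothetical word-order-sensitivity measurement: among examples the model
# classifies correctly, how often does the prediction change after the words
# are randomly shuffled? `predict` stands in for any text classifier
# (e.g. a fine-tuned BERT) that maps a string to a class index.
import random
from typing import Callable, List, Tuple

def word_order_sensitivity(
    predict: Callable[[str], int],
    examples: List[Tuple[str, int]],
    seed: int = 0,
) -> float:
    rng = random.Random(seed)
    flipped, correct = 0, 0
    for text, label in examples:
        if predict(text) != label:
            continue  # only consider originally-correct predictions
        correct += 1
        words = text.split()
        rng.shuffle(words)
        if predict(" ".join(words)) != label:
            flipped += 1
    return flipped / correct if correct else 0.0
```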


Summary

Introduction

Machine learning (ML) models recently achieved excellent performance on state-of-the-art benchmarks for evaluating natural language understanding (NLU). In July 2019, RoBERTa (Liu et al., 2019) was the first to surpass the human baseline on GLUE (Wang et al., 2019). Since then, 13 more methods have outperformed humans on the GLUE leaderboard. At least 8 of these 14 solutions are based on BERT (Devlin et al., 2019), a transformer architecture that learns representations via a bidirectional encoder. Given their superhuman GLUE scores, how do BERT-based models solve NLU tasks? We shed light on this question by examining model sensitivity to the order of words. Word order is one of the key characteristics of a sequence.
