Abstract

Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model — but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).

Highlights

  • Natural language processing (NLP) systems based on deep neural networks are sensitive to the amount and type of training data that they receive

  • Transfer to a different textual genre may be poor (Petrov and McDonald, 2012). This is the classic problem of domain adaptation, which arises in many areas of NLP

  • We proceed by comparing parsing systems that are based on Recurrent Neural Network Grammars (RNNG; Dyer et al., 2016; Wilcox et al., 2019) and trained according to fourteen different regimes, defined by:
      – complexity metric: surprisal of hypotheses in the beam (Hale, 2001; Roark et al., 2009); see the sketch after this list
      – amount of training data: 39,832; {100, 250, 500, 750}K; 1M; and 1,437,575 sentences
      – genre: newspaper text (Graff et al., 2005) and lexically-similar literature (Gutenberg)
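The beam-based surprisal metric can be illustrated with a short sketch. This is a minimal illustration rather than the paper's implementation: the function name, the probability values, and the use of base-2 logarithms are assumptions. The underlying idea, following Hale (2001) and Roark et al. (2009), is that a word's surprisal reflects how much probability mass in the incremental parser's beam is eliminated once that word is consumed.

```python
import math

def beam_surprisal(prefix_probs_before, prefix_probs_after):
    """Approximate word surprisal from a beam of incremental parser hypotheses.

    prefix_probs_before: probabilities of hypotheses covering words w_1 .. w_{t-1}
    prefix_probs_after:  probabilities of hypotheses covering words w_1 .. w_t
    """
    p_before = sum(prefix_probs_before)
    p_after = sum(prefix_probs_after)
    # Surprisal in bits: negative log of the probability mass that survives
    # in the beam after the word, relative to the mass before it.
    return -math.log2(p_after / p_before)

# Hypothetical beam probabilities: the word rules out most analyses,
# so it carries high surprisal.
print(beam_surprisal([0.02, 0.01, 0.005], [0.001, 0.0002]))  # ~4.87 bits
```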


Summary

Introduction

Natural language processing (NLP) systems based on deep neural networks are sensitive to the amount and type of training data that they receive. Transfer to a different textual genre may be poor (Petrov and McDonald, 2012). This is the classic problem of domain adaptation, which arises in many areas of NLP. This paper revisits domain adaptation in the context of human-like parsing. With this human-like aspect in mind, we consider models that use linguistically-plausible trees (see Frank, 2011, for a review) and operate incrementally from left to right (e.g. Steedman, 2000). We quantify the fit to human language performance using freely-available electrophysiological data (EEG) that was elicited by a pre-existing literary text (Brennan and Hale, 2019).
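One simple way such a fit could be quantified, purely as an illustration: regress a per-word EEG measure on the parser-derived surprisal values. The numbers below are hypothetical and the paper's actual analysis may differ; the sketch only shows the general shape of linking model-derived predictors to electrophysiological data.

```python
import numpy as np

# Hypothetical per-word values: parser-derived surprisal and the EEG
# amplitude averaged over an epoch aligned to each word's onset.
surprisal = np.array([2.1, 4.9, 1.3, 6.2, 3.0])
eeg_amplitude = np.array([0.8, 2.0, 0.4, 2.7, 1.1])

# Ordinary least squares: intercept and slope relating surprisal to EEG.
X = np.column_stack([np.ones_like(surprisal), surprisal])
coef = np.linalg.lstsq(X, eeg_amplitude, rcond=None)[0]
print("intercept, slope:", coef)
```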
