Abstract

Expectation-based theories of sentence processing posit that processing difficulty is determined by predictability in context. While predictability quantified via surprisal has gained empirical support, this representation-agnostic measure leaves open the question of how to best approximate the human comprehender’s latent probability model. This work presents an incremental left-corner parser that incorporates information about both propositional content and syntactic categories into a single probability model. This parser can be trained to make parsing decisions conditioning on only one source of information, thus allowing a clean ablation of the relative contribution of propositional content and syntactic category information. Regression analyses show that surprisal estimates calculated from the full parser make a significant contribution to predicting self-paced reading times over those from the parser without syntactic category information, as well as a significant contribution to predicting eye-gaze durations over those from the parser without propositional content information. Taken together, these results suggest a role for propositional content and syntactic category information in incremental sentence processing.
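The regression analyses described above amount to comparing nested regression models of reading times, with and without the full parser's surprisal as a predictor. The sketch below illustrates the idea with a likelihood-ratio test on simulated data; the variable names, the simulated data, and the plain OLS setup are illustrative assumptions, not the paper's actual analysis, which would typically involve richer regression models and controls.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Illustrative per-word data (all names and values are hypothetical):
# surprisal estimates from the ablated and full parsers, and reading times.
rng = np.random.default_rng(0)
n = 1000
surp_ablated = rng.gamma(2.0, 2.0, n)              # surprisal from ablated parser
surp_full = surp_ablated + rng.normal(0, 0.5, n)   # surprisal from full parser
rt = 300 + 10 * surp_full + rng.normal(0, 20, n)   # self-paced reading times (ms)

# Baseline regression: reading time ~ ablated-parser surprisal only.
base = sm.OLS(rt, sm.add_constant(surp_ablated)).fit()

# Full regression: additionally includes the full parser's surprisal.
X_full = sm.add_constant(np.column_stack([surp_ablated, surp_full]))
full = sm.OLS(rt, X_full).fit()

# Likelihood-ratio test: does full-parser surprisal significantly
# improve fit over the ablated predictor alone?
lr = 2 * (full.llf - base.llf)
p = chi2.sf(lr, df=1)
print(f"LR statistic = {lr:.2f}, p = {p:.3g}")
```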

Highlights

  • Psycholinguistic experiments have demonstrated that the propositional content of utterances tends to be retained in memory, whereas the exact surface form and syntactic structure are forgotten (Bransford and Franks, 1971; Jarvella, 1971)

  • Unlike these models, our approach seeks to incorporate propositional content by augmenting a generative and incremental parser to build an ongoing representation of predicate context vectors, based on a categorial grammar formalism that captures both local and non-local predicates

  • Making a single lexical attachment decision and a single grammatical attachment decision for each input word

  • Surprisal is defined as $S(w_t) \overset{\text{def}}{=} -\log \sum_{q_t} P(w_t\, q_t \mid w_{1..t-1})$, and these conditional probabilities can in turn be defined recursively using a transition model (see the sketch after these highlights)

  • Incorporating propositional content and syntactic category information into the processing model significantly improves fit to self-paced reading times and eye-gaze durations over corresponding ablated models, suggesting their role in online sentence processing
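Concretely, surprisal of this form can be computed by maintaining a probability distribution over the parser's hidden states and marginalizing over them at each word. The sketch below illustrates this with a generic forward pass; the toy state space and the placeholder functions trans_prob and lex_prob are assumptions standing in for the parser's learned transition and lexical models.

```python
import math

# Hypothetical stand-ins for the parser's learned distributions:
# trans_prob(q_prev, q) = P(q_t | q_{t-1}); lex_prob(w, q) = P(w_t | q_t).
STATES = ["q0", "q1", "q2"]

def trans_prob(q_prev: str, q: str) -> float:
    return 1.0 / len(STATES)  # uniform placeholder

def lex_prob(w: str, q: str) -> float:
    return 0.1  # constant placeholder

def surprisals(words):
    """Compute S(w_t) = -log sum_{q_t} P(w_t q_t | w_{1..t-1}) by
    maintaining a belief over hidden parser states (a forward pass)."""
    # P(q_0): uniform prior over initial states.
    belief = {q: 1.0 / len(STATES) for q in STATES}
    out = []
    for w in words:
        # Joint P(w_t q_t | w_{1..t-1}) via the transition and lexical models.
        joint = {
            q: lex_prob(w, q) * sum(belief[qp] * trans_prob(qp, q) for qp in STATES)
            for q in STATES
        }
        p_word = sum(joint.values())           # marginalize over q_t
        out.append(-math.log(p_word))          # surprisal in nats
        belief = {q: joint[q] / p_word for q in STATES}  # condition on w_t
    return out

print(surprisals(["the", "dog", "barked"]))
```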


Summary

Isolating Content and Category Contributions

In order to examine the contribution of propositional content to the content-sensitive processing model, the model is modified so that it can be trained to make lexical and grammatical decisions without conditioning on the predicate context vectors. Likewise, to examine the contribution of syntactic category information, the model is modified so that it can be trained to make decisions without conditioning on the syntactic category labels. The left-corner parser of van Schijndel et al. (2013) was trained on the same generalized categorial grammar reannotation of sections 02 to 21 of the WSJ corpus, using four iterations of the split-merge-smooth algorithm (Petrov et al., 2006). Both parsers used beam search decoding with a beam width of 5,000 to return the most likely sequence of parsing decisions. These two ablated models will respectively be referred to as the content- and category-ablated models in the following experiments.
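The beam search decoding described above can be sketched as follows. The Hypothesis structure and the candidate_decisions placeholder are hypothetical stand-ins for the parser's actual lexical and grammatical attachment models, while the beam width of 5,000 follows the description above.

```python
import heapq
from typing import NamedTuple

BEAM_WIDTH = 5000  # beam width used by both parsers, per the summary above

class Hypothesis(NamedTuple):
    logprob: float    # cumulative log probability of parsing decisions
    decisions: tuple  # sequence of (lexical, grammatical) decisions so far

def candidate_decisions(hyp, word, ablate_content=False, ablate_category=False):
    """Hypothetical placeholder: enumerate (decision, logprob) pairs for the
    next word. An ablated model would simply not condition its scores on the
    predicate context vectors (content) or category labels (category)."""
    yield ((word, "ATTACH"), -1.0)
    yield ((word, "NO-ATTACH"), -2.0)

def beam_parse(words, **ablation_flags):
    beam = [Hypothesis(0.0, ())]
    for word in words:
        successors = (
            Hypothesis(h.logprob + lp, h.decisions + (d,))
            for h in beam
            for d, lp in candidate_decisions(h, word, **ablation_flags)
        )
        # Keep only the BEAM_WIDTH most probable partial derivations.
        beam = heapq.nlargest(BEAM_WIDTH, successors, key=lambda h: h.logprob)
    return max(beam, key=lambda h: h.logprob)  # most likely decision sequence

print(beam_parse(["the", "dog", "barked"]))
```

After each word, only the 5,000 most probable partial derivations survive, so decoding cost stays bounded while the parser still returns the most likely complete sequence of parsing decisions on the beam.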

Experiment 1
In-domain Linguistic Accuracy
Cross-Domain Linguistic Accuracy
Experiment 2
Response Data
Likelihood Ratio Testing
Results
Experiment 4
Procedures