Abstract
This paper describes the submission of the team KonTra to the CMCL 2021 Shared Task on eye-tracking prediction. Our system combines the embeddings extracted from a fine-tuned BERT model with surface, linguistic and behavioral features, resulting in an average mean absolute error of 4.22 across all 5 eye-tracking measures. We show that word length and features representing the expectedness of a word are consistently the strongest predictors across all 5 eye-tracking measures.
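The headline number is a mean absolute error (MAE) averaged over the eye-tracking measures. The sketch below shows one way such a score could be computed, assuming the per-measure MAEs are averaged with equal weight. The function names, the `other_measure` key, and the toy values are illustrative assumptions, not the shared task's scoring code or data.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between gold and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("gold and predicted sequences must have equal length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_mae(gold, predicted):
    """Return per-measure MAE and the unweighted mean over all measures.

    `gold` and `predicted` map a measure name to a list of per-word values.
    Equal weighting of the measures is an assumption of this sketch.
    """
    per_measure = {
        name: mean_absolute_error(gold[name], predicted[name]) for name in gold
    }
    return per_measure, mean(per_measure.values())


# Toy values, invented for illustration only.
gold = {"nFix": [10.0, 20.0], "other_measure": [3.0, 5.0]}
predicted = {"nFix": [12.0, 17.0], "other_measure": [3.5, 4.5]}
per_measure, overall = average_mae(gold, predicted)
```

With the real task there would be five entries in each dictionary, one per eye-tracking measure.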
Highlights
The corpora ZuCo 1.0 and ZuCo 2.0 by Hollenstein et al. (2018, 2019) contain eye-tracking data collected in a series of reading tasks on English materials.
Among the SLB features, we show that word length and linguistic features representing word expectedness consistently show the highest weight in predicting all 5 measures.
BERT Features: Given the success of current language models for various NLP tasks, we investigate their expressivity for human-centered tasks such as eye-tracking: each word is mapped to two types of contextualized embeddings.
Summary
The corpora ZuCo 1.0 and ZuCo 2.0 by Hollenstein et al. (2018, 2019) contain eye-tracking data collected in a series of reading tasks on English materials. We show that training solely on SLB features provides better results than training solely on word [...]

[...] sentence (similarity_{w_m, w_{1...m-1}}). To compute these similarity measures, we use the BERT (base) [...]

BERT Features: Given the success of current language models for various NLP tasks, we investigate their expressivity for human-centered tasks such as eye-tracking: each word is mapped to two types of contextualized embeddings. The BERT base model is fine-tuned separately 5 times, once for each of the eye-tracking measures to be predicted. Based on these fine-tuned models, we extract the embedding of each word as a fixed feature vector to be used for further experimentation.

Measure: Feature Name
nFix: word length (0.81), frequency score (0.05), word length-sentence length ratio (0.01), similarity_{w_m, w_{m-1}} (0.01), surprisal score (0.01), similarity_{w_m, w_{1...m-1}} (0.01)
[...]: word length (0.84), similarity_{w_m, w_{m-1}} (0.04), frequency score (0.03), similarity_{w_m, w_{1...m-1}} (0.02)
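The two similarity features compare a word's embedding with that of the previous word and with the preceding context. The text does not survive intact on how the context is represented, so the sketch below assumes cosine similarity and a mean of the preceding word embeddings. Both choices, the function names, and the tiny 2-dimensional vectors standing in for BERT embeddings are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def similarity_features(embeddings):
    """For each word m, return (sim to word m-1, sim to mean of words 1..m-1).

    The first word has no preceding context, so both values are None.
    Averaging the context embeddings is an assumption of this sketch.
    """
    features = []
    for m, current in enumerate(embeddings):
        if m == 0:
            features.append((None, None))
            continue
        prev_sim = cosine_similarity(current, embeddings[m - 1])
        context_sim = cosine_similarity(current, mean_vector(embeddings[:m]))
        features.append((prev_sim, context_sim))
    return features


# Toy 2-dimensional vectors standing in for contextualized word embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_features = similarity_features(toy_embeddings)
```

In practice the vectors would come from the BERT (base) model, and the resulting values would be appended to the other SLB features for each word.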