Archer at SemEval-2021 Task 1: Contextualising Lexical Complexity

Irene Russo

doi:10.18653/v1/2021.semeval-1.90

Abstract

Evaluating the complexity of a target word in a sentential context is the aim of the Lexical Complexity Prediction task at SemEval-2021. This paper presents the system created to assess the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors. Beyond encoding out-of-context information about the lemma, we implemented features based on pre-trained language models to model the target word’s in-context complexity.
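As a rough illustration of the regression setup described in the abstract, the sketch below fits a random forest regressor on a handful of per-word features. It assumes scikit-learn is installed. The three feature columns (word length, log frequency, age of acquisition) and every number in the toy data are invented for illustration; they are not the paper's feature set or results.

```python
from sklearn.ensemble import RandomForestRegressor

# Toy rows: [word length, log frequency, age of acquisition].
# All values are invented for illustration only.
X_train = [
    [3, 6.1, 3.2],
    [5, 5.0, 4.8],
    [9, 2.3, 9.5],
    [11, 1.4, 11.0],
    [4, 5.7, 3.9],
    [8, 2.9, 8.7],
]
# Invented complexity scores on a 0-1 scale, one per training row.
y_train = [0.05, 0.12, 0.48, 0.61, 0.08, 0.41]

# A fixed random_state keeps the sketch reproducible.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Two unseen toy words: one long and rare, one short and frequent.
predictions = model.predict([[10, 1.8, 10.2], [4, 5.9, 3.5]])
print(predictions)
```

An XGBoost regressor could be swapped in the same way, since `xgboost.XGBRegressor` follows the same `fit`/`predict` interface.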

Highlights

We introduce the system used to assess the lexical complexity of single English words at the SemEval-2021 Lexical Complexity Prediction task (Shardlow et al., 2021)
The system combines linguistic and psycholinguistic variables, using a random forest regressor and an XGBoost regressor
We experiment with different language models in a masked word prediction framework, taking into account the ten most probable words occurring in that context
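One way to turn masked word prediction into features is sketched below. It assumes a masked language model has already returned candidate words with probabilities for the masked target position. The function name `masked_context_features`, the three features it returns, and the candidate list are hypothetical; the source only states that the ten most probable words in context are taken into account.

```python
from typing import Dict, List, Tuple


def masked_context_features(
    target: str,
    candidates: List[Tuple[str, float]],
    k: int = 10,
) -> Dict[str, float]:
    """Describe how expected the target word is among the top-k predictions."""
    # Keep only the k most probable candidates, highest probability first.
    top_k = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]
    words = [word.lower() for word, _ in top_k]
    target = target.lower()

    in_top_k = target in words
    # Rank is 1-based; 0 means the target was not among the top k.
    rank = words.index(target) + 1 if in_top_k else 0
    probability = top_k[rank - 1][1] if in_top_k else 0.0
    return {
        "in_top_k": float(in_top_k),
        "rank": float(rank),
        "probability": probability,
    }


# Invented predictions for "The doctor prescribed a new [MASK]."
toy_candidates = [("drug", 0.40), ("medication", 0.30), ("treatment", 0.10)]

expected_word = masked_context_features("medication", toy_candidates)
unexpected_word = masked_context_features("antihistamine", toy_candidates)
```

The intuition behind this design is that a target the model ranks highly is predictable in its context, while a target missing from the top ten is less expected and may be harder for a reader.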

Summary

Related works

A wide range of approaches has been used for lexical complexity prediction in past evaluation campaigns. If we frame lexical complexity as a measure strongly dependent on words’ psycholinguistic properties, we should recognize that past computational efforts for predicting word norms did not take into account the role of context (Russo, 2020; Charbonnier and Wartena, 2019). Static word embeddings such as word2vec have been used to predict values of psycholinguistic norms usually assessed in experimental settings (Ljubešić et al., 2018; Rothe and Schütze, 2016).

In LCP2021 lexical complexity is a continuous property, and the task consists of predicting the complexity score for each target word in context. The system addresses Sub-task 1, predicting the complexity score of single words. Sentences are extracted from three domains: the Bible, the English part of the European Parliament proceedings, and a biomedical corpus composed of scientific papers.

The age of acquisition of words is another variable strongly correlated with the complexity of the target words (r=0.55).
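The reported r is presumably a Pearson correlation coefficient; the source does not name the statistic, so this is an assumption. A minimal sketch of how such a coefficient can be computed between a per-word variable and the complexity scores follows. The helper name `pearson_r` is hypothetical, and the inputs in the usage lines are toy values, not the paper's data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy check: a perfectly linear relation, increasing and decreasing.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```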

Experiments

Findings

Conclusions


Publication Date: Jan 1, 2021
Citations: 2
License type: cc-by
