Abstract

Evaluating the complexity of a target word in a sentential context is the aim of the Lexical Complexity Prediction task at SemEval-2021. This paper presents the system created to assess the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors. Beyond encoding out-of-context information about the lemma, we implemented features based on pre-trained language models to model the target word’s in-context complexity.

Highlights

  • We combine linguistic and psycholinguistic variables, using a random forest regressor and an XGBoost regressor

  • We experiment with different language models in a masked word prediction framework, taking into account the ten most probable words occurring in that context

  • We introduce the system used to assess the lexical complexity of single English words at the SemEval-2021 Lexical Complexity Prediction task (Shardlow et al., 2021)
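
The masked word prediction idea mentioned in the highlights can be illustrated with a minimal sketch. It assumes that a language model has already assigned a probability to each candidate word at the masked position. The function name `top_k_features`, the toy distribution, and the three features it returns are hypothetical and are not the features reported by the authors.

```python
from typing import Dict, Optional


def top_k_features(
    probs: Dict[str, float], target: str, k: int = 10
) -> Dict[str, Optional[float]]:
    """Derive simple in-context features for `target` from a masked-LM distribution.

    `probs` maps candidate words to the probability a language model assigns
    them at the masked position (hypothetical input; no real model is called).
    """
    # Rank candidates from most to least probable and keep the first k.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words = [word for word, _ in ranked]

    in_top_k = target in words
    return {
        # 1.0 if the target is among the k most probable words, else 0.0.
        "in_top_k": 1.0 if in_top_k else 0.0,
        # 1-based rank of the target inside the top k, or None if absent.
        "rank": float(words.index(target) + 1) if in_top_k else None,
        # Probability of the target itself (0.0 if the model did not list it).
        "target_prob": probs.get(target, 0.0),
    }


# Made-up distribution for the context "The doctor prescribed a new [MASK]."
toy_probs = {
    "drug": 0.41,
    "medication": 0.30,
    "treatment": 0.12,
    "regimen": 0.02,
    "anticoagulant": 0.001,
}
features = top_k_features(toy_probs, "medication", k=3)
rare_features = top_k_features(toy_probs, "anticoagulant", k=3)
```

In this toy setting, a word that the model ranks high in its context would plausibly be less surprising, and a word that falls outside the top k would be a candidate for higher in-context complexity. Whether the authors used these exact signals is not stated in this summary.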


Summary

Related works

A wide range of approaches has been used for lexical complexity prediction in past evaluation campaigns. If we frame lexical complexity as a measure strongly dependent on words’ psycholinguistic properties, we should recognize that past computational efforts to predict word norms did not take the role of context into account (Russo, 2020; Charbonnier and Wartena, 2019). Static word embeddings such as word2vec have been used to predict the values of psycholinguistic norms that are usually assessed in experimental settings (Ljubešić et al., 2018; Rothe and Schütze, 2016).

In LCP2021, lexical complexity is a continuous property, and the task consists of predicting the complexity score of each target word in context. Sub-task 1 concerns predicting the complexity score of single words. Sentences are extracted from three domains: the Bible, the English part of the European Parliament proceedings, and a biomedical corpus composed of scientific papers. The age of acquisition of words is another variable strongly correlated with the complexity of the target words (r=0.55).
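
As a rough illustration of how a correlation such as the one reported for age of acquisition could be computed, the sketch below implements a Pearson coefficient in plain Python. It assumes that the reported r is a Pearson correlation, which this summary does not state. The function name `pearson_r` and the sample values are made up for the example and are not taken from the task data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: age of acquisition (years) and complexity scores in [0, 1].
age_of_acquisition = [3.0, 4.5, 6.0, 9.0, 11.5]
complexity = [0.10, 0.20, 0.15, 0.40, 0.35]
r = pearson_r(age_of_acquisition, complexity)
```

A positive `r` on real data would indicate that words acquired later in life tend to receive higher complexity scores, which is the direction of the relationship described above.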

Experiments
Findings
Conclusions
