Abstract

Lexical Simplification is the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning. The task has wide applicability, both as an assistive technology for readers with cognitive impairments or disabilities, such as Dyslexia and Aphasia, and as a pre-processing tool for other Natural Language Processing tasks, such as machine translation and summarisation. The problem is commonly framed as a pipeline of four steps: the identification of complex words, the generation of substitution candidates, the selection of those candidates that fit the context, and the ranking of the selected substitutes according to their simplicity. In this survey we review the literature for each step of this typical Lexical Simplification pipeline, benchmark existing approaches for these steps on publicly available datasets, and provide pointers to datasets and resources available for the task.

Highlights

  • In the context of Natural Language Processing (NLP), the task of Lexical Simplification (LS) aims to perform Text Simplification (TS) by focusing on lexical information.

  • Based on the results presented, it is quite clear that supervised approaches using highly tuned modern machine learning techniques tend to be more effective than threshold- and lexicon-based alternatives when labeled datasets are available (see the sketch after this list).

  • It is worth noting that, since the candidates being filtered come from a super-set containing candidates from all Substitution Generation (SG) strategies evaluated in Section 4.3, it is quite challenging for the selectors to discard every spurious candidate, which leads to rather low F-scores across all approaches.
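To illustrate the contrast drawn in the second highlight, below is a minimal sketch (not from the survey) of a frequency-threshold CWI baseline next to a small supervised classifier. The frequency table, feature set, threshold value, and toy labels are illustrative assumptions, not resources from the paper.

```python
# Sketch contrasting threshold-based CWI with a supervised classifier.
# The frequency table, features, and threshold are illustrative assumptions.
import math
from sklearn.linear_model import LogisticRegression

# Toy corpus frequencies (per million tokens); real systems use large corpora.
FREQ = {"help": 5200.0, "assist": 310.0, "use": 8900.0, "utilise": 12.0}

def threshold_cwi(word, threshold=500.0):
    """Threshold-based CWI: flag a word as complex if its frequency is low."""
    return FREQ.get(word.lower(), 0.0) < threshold

def features(word):
    """Simple features: log corpus frequency and word length."""
    return [math.log(FREQ.get(word.lower(), 0.0) + 1.0), len(word)]

# Tiny labeled dataset (1 = complex, 0 = simple); real datasets are annotated
# by target audiences such as non-native speakers.
train = ["help", "assist", "use", "utilise"]
labels = [0, 1, 0, 1]
clf = LogisticRegression().fit([features(w) for w in train], labels)

for w in ("utilise", "use"):
    print(w, threshold_cwi(w), bool(clf.predict([features(w)])[0]))
```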


Summary

Introduction

In the context of Natural Language Processing (NLP), the task of Lexical Simplification (LS) aims to perform Text Simplification (TS) by focusing on lexical information. It can be formally described as the task of replacing words in a given sentence in order to make it simpler, without applying any modifications to its syntactic structure. The studies by Hirsh and Nation (1992) and Nation (2001) show that English learners need to be familiar with 95% of a text's vocabulary in order to achieve basic comprehension, and with 98% of a text's vocabulary for leisure reading. They observe that those who are familiar with the vocabulary of a text can often understand the entirety of its meaning even if the grammatical constructs used are confusing to them. LS is commonly framed as a pipeline of the following steps:

1. Complex Word Identification: deciding which words of a given sentence may not be understood by a given target audience and must be simplified.
2. Substitution Generation: producing candidate replacements for each complex word.
3. Substitution Selection: selecting the candidates that fit the context of the complex word.
4. Substitution Ranking: ranking the selected substitutes according to their simplicity.
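To make this framing concrete, the following minimal sketch (not from the survey) wires the four steps together. The synonym table, frequency list, and context check are toy placeholders standing in for the real lexicons, embeddings, and rankers reviewed later.

```python
# Minimal end-to-end sketch of the four-step LS pipeline described above.
# All resources here (synonym table, frequency list, context check) are toy
# placeholders, not the survey's actual methods.
SYNONYMS = {"utilise": ["use", "employ", "apply"]}
FREQ = {"use": 8900.0, "apply": 1200.0, "employ": 450.0, "utilise": 12.0}

def identify_complex(words, threshold=100.0):
    """1. Complex Word Identification: low-frequency words are 'complex'."""
    return {w for w in words if FREQ.get(w, 0.0) < threshold}

def generate(word):
    """2. Substitution Generation: look up candidate replacements."""
    return SYNONYMS.get(word, [])

def select(candidates, words):
    """3. Substitution Selection: placeholder context check that keeps
    candidates not already present in the sentence."""
    return [c for c in candidates if c not in words]

def rank(candidates):
    """4. Substitution Ranking: more frequent is assumed simpler."""
    return sorted(candidates, key=lambda c: -FREQ.get(c, 0.0))

def simplify(sentence):
    words = sentence.split()
    complex_words = identify_complex(words)
    out = []
    for w in words:
        ranked = rank(select(generate(w), words)) if w in complex_words else []
        out.append(ranked[0] if ranked else w)
    return " ".join(out)

print(simplify("we utilise the tool"))  # -> "we use the tool"
```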

The remainder of the survey is organised as follows:

  • Complex Word Identification
      ◦ Threshold-Based
      ◦ Lexicon-Based
      ◦ Implicit Complex Word Identification
      ◦ Machine Learning-Assisted
      ◦ Benchmarking: Datasets and Metrics; Results; Discussion
  • Substitution Generation
      ◦ Linguistic Database Querying
      ◦ Automatic Substitution Generation
  • Substitution Selection
      ◦ Explicit Sense Labelling
      ◦ Implicit Sense Labelling
      ◦ Part-of-Speech Tag Filtering
      ◦ Semantic Similarity Filtering
  • Substitution Ranking
      ◦ Frequency-Based
      ◦ Simplicity Measures
  • Full LS Pipeline Evaluation
      ◦ Dataset and Metrics
  • Datasets and Resources
  • Discussion and Conclusions
