Abstract

Lexical Simplification is the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning. The task has wide applicability, both as an assistive technology for readers with cognitive impairments or disabilities, such as Dyslexia and Aphasia, and as a pre-processing tool for other Natural Language Processing tasks, such as machine translation and summarisation. The problem is commonly framed as a pipeline of four steps: the identification of complex words, the generation of substitution candidates, the selection of those candidates that fit the context, and the ranking of the selected substitutes according to their simplicity. In this survey we review the literature for each step of this typical Lexical Simplification pipeline, benchmark existing approaches for these steps on publicly available datasets, and provide pointers to datasets and resources available for the task.

Highlights

  • In the context of Natural Language Processing (NLP), the task of Lexical Simplification (LS) aims to perform Text Simplification (TS) by focusing on lexical information.

  • Based on the results presented, it is quite clear that supervised approaches using highly tuned modern machine learning techniques tend to be more effective than threshold- and lexicon-based alternatives when labeled datasets are available (see the sketch after this list).

  • It is worth noting that, since the candidates being filtered come from a super-set containing candidates from all Substitution Generation (SG) strategies evaluated in Section 4.3, it is quite challenging for the selectors to discard every spurious candidate, which leads to rather low F-scores across all approaches.
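To illustrate the contrast drawn in the second highlight, below is a minimal sketch (not from the survey) of a frequency-threshold CWI baseline next to a small supervised classifier. The frequency table, feature set, threshold value, and toy labels are illustrative assumptions, not resources from the paper.

```python
# Sketch contrasting threshold-based CWI with a supervised classifier.
# The frequency table, features, and threshold are illustrative assumptions.
import math
from sklearn.linear_model import LogisticRegression

# Toy corpus frequencies (per million tokens); real systems use large corpora.
FREQ = {"help": 5200.0, "assist": 310.0, "use": 8900.0, "utilise": 12.0}

def threshold_cwi(word, threshold=500.0):
    """Threshold-based CWI: flag a word as complex if its frequency is low."""
    return FREQ.get(word.lower(), 0.0) < threshold

def features(word):
    """Simple features: log corpus frequency and word length."""
    return [math.log(FREQ.get(word.lower(), 0.0) + 1.0), len(word)]

# Tiny labeled dataset (1 = complex, 0 = simple); real datasets are annotated
# by target audiences such as non-native speakers.
train = ["help", "assist", "use", "utilise"]
labels = [0, 1, 0, 1]
clf = LogisticRegression().fit([features(w) for w in train], labels)

for w in ("utilise", "use"):
    print(w, threshold_cwi(w), bool(clf.predict([features(w)])[0]))
```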


Summary

Introduction

In the context of Natural Language Processing (NLP), the task of Lexical Simplification (LS) aims to perform Text Simplification (TS) by focusing on lexical information. It can be formally described as the task of replacing words in a given sentence in order to make it simpler, without applying any modifications to its syntactic structure. The studies by Hirsh and Nation (1992) and Nation (2001) show that English learners need to be familiar with 95% of a text's vocabulary in order to achieve basic comprehension, and with 98% of a text's vocabulary for leisure reading. They observe that those who are familiar with the vocabulary of a text can often understand the entirety of its meaning even if the grammatical constructs used are confusing to them. LS is commonly framed as a pipeline of the following steps:

1. Complex Word Identification: deciding which words of a given sentence may not be understood by a given target audience and must be simplified.
2. Substitution Generation: producing candidate replacements for each complex word.
3. Substitution Selection: selecting the candidates that fit the context of the complex word.
4. Substitution Ranking: ranking the selected substitutes according to their simplicity.
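To make this framing concrete, the following minimal sketch (not from the survey) wires the four steps together. The synonym table, frequency list, and context check are toy placeholders standing in for the real lexicons, embeddings, and rankers reviewed later.

```python
# Minimal end-to-end sketch of the four-step LS pipeline described above.
# All resources here (synonym table, frequency list, context check) are toy
# placeholders, not the survey's actual methods.
SYNONYMS = {"utilise": ["use", "employ", "apply"]}
FREQ = {"use": 8900.0, "apply": 1200.0, "employ": 450.0, "utilise": 12.0}

def identify_complex(words, threshold=100.0):
    """1. Complex Word Identification: low-frequency words are 'complex'."""
    return {w for w in words if FREQ.get(w, 0.0) < threshold}

def generate(word):
    """2. Substitution Generation: look up candidate replacements."""
    return SYNONYMS.get(word, [])

def select(candidates, words):
    """3. Substitution Selection: placeholder context check that keeps
    candidates not already present in the sentence."""
    return [c for c in candidates if c not in words]

def rank(candidates):
    """4. Substitution Ranking: more frequent is assumed simpler."""
    return sorted(candidates, key=lambda c: -FREQ.get(c, 0.0))

def simplify(sentence):
    words = sentence.split()
    complex_words = identify_complex(words)
    out = []
    for w in words:
        ranked = rank(select(generate(w), words)) if w in complex_words else []
        out.append(ranked[0] if ranked else w)
    return " ".join(out)

print(simplify("we utilise the tool"))  # -> "we use the tool"
```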

The remainder of the survey is organised as follows:

  • Complex Word Identification
      ◦ Threshold-Based
      ◦ Lexicon-Based
      ◦ Implicit Complex Word Identification
      ◦ Machine Learning-Assisted
      ◦ Benchmarking: Datasets and Metrics; Results; Discussion
  • Substitution Generation
      ◦ Linguistic Database Querying
      ◦ Automatic Substitution Generation
  • Substitution Selection
      ◦ Explicit Sense Labelling
      ◦ Implicit Sense Labelling
      ◦ Part-of-Speech Tag Filtering
      ◦ Semantic Similarity Filtering
  • Substitution Ranking
      ◦ Frequency-Based
      ◦ Simplicity Measures
  • Full LS Pipeline Evaluation
      ◦ Dataset and Metrics
  • Datasets and Resources
  • Discussion and Conclusions
