Lexical simplification benchmarks for English, Portuguese, and Spanish.

Sanja Štajner, Daniel Ferrés, Horacio Saggion, Matthew Shardlow, Marcos Zampieri, Kai North

doi:10.3389/frai.2022.991242

Abstract

Even in highly developed countries, as many as 15–30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle to progress in the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. In this study, we present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese, and provide details about the data selection and annotation procedures to enable the compilation of comparable datasets in other languages and domains. As the first multilingual lexical simplification dataset in which instances in all three languages were selected and annotated using comparable procedures, it is the first to offer a direct comparison of lexical simplification systems across the three languages. To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs. non-neural) to all three languages and evaluate their performance on our new dataset. For a fairer comparison, we use several evaluation measures that capture varied aspects of the systems' efficacy, and discuss their strengths and weaknesses. We find that a state-of-the-art neural lexical simplification system outperforms a state-of-the-art non-neural lexical simplification system in all three languages, according to all evaluation measures. More importantly, we find that state-of-the-art neural lexical simplification systems perform significantly better for English than for Spanish and Portuguese, which raises the question of whether such an architecture can be used for successful lexical simplification in other languages, especially low-resource ones.

Journal: Frontiers in Artificial Intelligence	Publication Date: Sep 22, 2022
Citations: 3	License type: cc-by
