Towards Text Simplification in Spanish: A Brief Overview of Deep Learning Approaches for Text Simplification
This paper reviews deep learning methods for Spanish automatic text simplification, highlighting challenges such as small datasets and technical vocabulary, and discusses datasets, performance metrics, and strategies to improve system accuracy, addressing the limited existing research in this language.
Text simplification refers to the transformation of a specific source text into a target text aiming to increase understanding and readability for one or more specific audiences. This task demands considerable human effort and specialized knowledge, which makes the use of automated or semi-automated computational approaches appealing. The rise of deep learning as a unifying paradigm across seemingly different fields such as image analysis, sound processing and natural language processing has considerably influenced current state-of-the-art approaches to automatic text simplification. Therefore, in this work, we focus on deep learning-based state-of-the-art methods for automatic text simplification in the Spanish language. To this end, we first disentangle the different tasks that can be addressed in order to yield a simplified text in general. We then review the latest deep learning-based approaches, along with the main datasets and performance metrics used in the field. We also describe approaches to deal with small datasets and technical words. Finally, we describe some lessons for building accurate automatic text simplification systems in Spanish, a language for which there is a noticeable shortage of work on text simplification.
- Research Article
106
- 10.1145/2738046
- May 11, 2015
- ACM Transactions on Accessible Computing
The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Our results, even if our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, are comparable to the state of the art in English Automatic Text Simplification.
- Conference Article
3
- 10.1109/iaeac50856.2021.9390937
- Mar 12, 2021
In this paper, a Chinese automatic text simplification (ATS) method based on unsupervised learning is introduced. Automatic text simplification is a research field of natural language processing. For Chinese texts, relying on a hand-made simplified corpus or dictionary is not practical because of the large number of texts. Chinese is a diverse language, and numerous factors need to be taken into consideration. An automatic simplification method for Chinese text and a readability formula based on linear regression are proposed in this paper. With our method, a set of Chinese sentences is given as input and more comprehensible sentences are obtained through syntactic simplification and lexical simplification. In an automatic evaluation on the hand-made simplified corpus, the readability score of our system's output increased by 3.68 compared with that of the original text, and the SARI score reached 36.02.
- Video Transcripts
- 10.48448/19gd-3934
- Nov 16, 2021
- Underline Science Inc.
Automatic Text Simplification (ATS) has played a significant role in the Natural Language Processing (NLP) field. ATS is a sequence-to-sequence problem aiming to create a new version of the original text removing complex and domain-specific words. It can improve communication and understanding of documents from specific domains, as well as support second language learning. This paper presents an empirical study on the use of state-of-the-art ATS methods to simplify texts in Portuguese. It is important to remark that the literature reports the challenge in analyzing Portuguese texts due to the lack of resources compared to other languages (i.e., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results in Portuguese texts, obtaining 40.89 BLEU score using multiple parallel corpora and raising the overall readability score by more than 5 points.
- Conference Article
2
- 10.5167/uzh-192839
- May 16, 2020
In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification for German, the first of its kind for this language. The corpus is compiled from web sources and consists of parallel as well as monolingual-only (simplified German) data amounting to approximately 6,200 documents (nearly 211,000 sentences). As a unique feature, the corpus contains information on text structure (e.g., paragraphs, lines), typography (e.g., font type, font style), and images (content, position, and dimensions). While the importance of considering such information in machine learning tasks involving simplified language, such as readability assessment, has repeatedly been stressed in the literature, we provide empirical evidence for its benefit. We also demonstrate the added value of leveraging monolingual-only data for automatic text simplification via machine translation through applying back-translation, a data augmentation technique.
- Research Article
- 10.1007/s10579-025-09879-4
- Mar 3, 2026
- Language Resources and Evaluation
Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This paper fills this gap by introducing large-scale sentence-aligned simplification resources for Spanish, developed from the Newsela and ClearSim corpora. We propose detailed guidelines for manual alignment, evaluate a wide range of automatic sentence alignment algorithms, and present the first systematic exploration of LLM-based monolingual sentence alignment in Spanish. Our analysis incorporates comprehensive quantitative and qualitative evaluation, supported by statistical significance testing, and reveals clear differences in the structural simplification patterns across corpora. In addition, we train and release baseline ATS models using the new aligned datasets, demonstrating their practical utility for downstream simplification. All alignment code, trained models, and evaluation scripts will be publicly released to ensure transparency and reproducibility. Together, these contributions substantially advance the resources and methodology for Spanish-language ATS.
- Research Article
19
- 10.1007/s10579-014-9265-4
- Mar 1, 2014
- Language Resources and Evaluation
In this paper we present the development of a text simplification system for Spanish. Text simplification is the adaptation of a text for the special needs of certain groups of readers, such as language learners, people with cognitive difficulties, and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing text is labour-intensive and costly. Automatic simplification is a field which attracts growing attention in Natural Language Processing, but, to the best of our knowledge, there are no existing simplification tools for Spanish. We present a corpus study which aims to identify the operations a text simplification system needs to carry out in order to produce an output similar to what human editors produce when they simplify news texts. We also present a first prototype for automatic simplification, which shows that the most important simplification operations can be successfully treated.
- Research Article
3
- 10.32473/flairs.v35i.130608
- May 4, 2022
- The International FLAIRS Conference Proceedings
Natural language processing encompasses several tasks, one of which is automatic text simplification. Telling whether one text is simpler than another involves not only knowledge about the language being analyzed, but also cultural knowledge of the target audience to which the text is directed. Most of the current metrics used to measure text simplification are based on parallel corpora prepared by humans, which makes it difficult to apply these metrics to automatic text simplification in real time. In this paper, we present ISiM (Independent Simplification Metric), a metric that dispenses with a parallel corpus, is simple, fast, and independent of language and human annotation, and is capable of quantifying the simplicity/complexity of a sentence, thus contributing to improving automatic text simplification. The results of the tests performed indicate that the proposed metric has the potential to be used to evaluate automatic methods of simplification.
- Conference Article
8
- 10.26615/978-954-452-056-4_131
- Oct 22, 2019
In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) to improve its translation into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system for lexically and syntactically simplifying source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems, the phrase-based MT (PBMT) and the neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in the success of the three scenarios depending on the MT approach used (PBMT or NMT) with regard to improving translation fluency and post-editing effort.
- Research Article
4
- 10.1162/tacl_a_00653
- Apr 16, 2024
- Transactions of the Association for Computational Linguistics
Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension tasks among the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as “unanswerable” based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.
- Research Article
5
- 10.1109/access.2022.3174846
- Jan 1, 2022
- IEEE Access
With the advent of new technologies, simplifying text automatically has become very popular and highly important among natural language researchers during the last decade. The predominant research in the area of Automatic Sentence Simplification (ASS) is inclined towards either lexical or syntactical simplification of sentences. From the literature survey, it is observed that existing research in lexical simplification makes use of a word substitution technique. This causes word sense ambiguity in cases where the word synonyms are not appropriate for a sentence in the given context. In contrast, syntactical simplification, though accurate and applicable to Natural Language Processing (NLP) tasks, requires tremendous effort to construct rules for a given domain. The research proposes a framework called Pattern-based Automatic Syntactic Simplification (PASS), which identifies sentences and applies rules based on grammatical patterns to simplify the sentences, thereby making it more generic for NLP tasks. PASS is evaluated by human experts, who rate the usefulness of the framework based on the fluency, adequacy and simplicity of the sentences. Furthermore, the framework is automatically evaluated on an available online corpus using the automatic metrics SARI, BLEU, and FKGL. The proposed approach generates promising results in the field of ASS and could be used as a preliminary module for NLP tasks as well as other natural language-related applications such as summarization, anaphora resolution, question-answering, and many more.
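Of the automatic metrics named in the abstract above, FKGL (Flesch-Kincaid Grade Level) is the only one that needs no reference simplification, so it can be illustrated in a few lines. The following is a minimal sketch, not the evaluation code used for PASS: the function names `count_syllables` and `fkgl` are hypothetical, the syllable counter is a rough vowel-group heuristic assumed here for illustration, and the FKGL coefficients were calibrated for English, so the score is not directly meaningful for other languages.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate (assumption): count runs of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive segmentation: split sentences on ., ! and ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    original = "The committee subsequently deliberated extensively."
    simplified = "The group talked for a long time."
    # A lower grade level suggests text that is easier to read.
    print(round(fkgl(original), 2), round(fkgl(simplified), 2))
```

A lower FKGL after simplification indicates shorter sentences and shorter words, but it says nothing about grammaticality or meaning preservation, which is why it is normally reported alongside metrics such as SARI and BLEU and human judgments.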
- Book Chapter
3
- 10.3233/faia230975
- Dec 7, 2023
- Frontiers in artificial intelligence and applications
Texts produced by the Brazilian judiciary have a complex and technical vocabulary, with elaborate use of the Portuguese language and many legal terms that are difficult to understand, creating a barrier in communication between the judiciary and the population. Automatic Text Simplification (ATS), an activity within the Natural Language Processing (NLP) area, can be applied to improve the readability of these types of text using specialized algorithms and to make simplification scalable, in view of the great demand in the courts. In this context, this article presents an evaluation of four state-of-the-art text simplification methods, assessed according to readability metrics, to improve the quality of existing texts in judicial summaries, using a dataset containing 100 summaries from the Federal Regional Court of the 5th Region (TRF5) and another 100 from the Federal Supreme Court (STF). The methods MUSS(EN), MUSS(PT), Transformers and NMT + Attention were tested, and the simplified texts exceeded the FRE readability index of the original texts, making them more readable.
- Research Article
- 10.17576/gema-2021-2103-03
- Aug 30, 2021
- GEMA Online® Journal of Language Studies
Narrowly specialized information is addressed to a limited circle of professionals though it provokes interest among people without specialized education. This gives rise to a need for the popularization of scientific information. This process is carried out through simplified texts as a kind of secondary texts that are directly aimed at the addressee. Age, language proficiency and background knowledge are the main features which are usually taken into consideration by the author of the secondary text who makes changes in the text composition, as well as in its pragmatics, semantics and syntax. This article analyses traditional approaches to text simplification, computer simplification and summarization. The authors compare human-authored simplification of literary texts with the newest trends in computer simplification to promote further development of machine simplification tools. It has been found that the samples of simplified scientific texts seem to be more natural than the samples of simplified literary texts since technical background knowledge can be processed with machine tools. The authors have come to the conclusion that literary and technical texts should imply different approaches for adaptation and simplification. In addition, personal readers’ experience plays a great part in finding the implications in literary texts. In this respect it might be reasonable to create separate engines for simplifying and adapting texts from diverse spheres of knowledge.
Keywords: Text Simplification; Natural Language Processing (NLP); Pragmatic Adaptation; Professional Communication; Literary Texts
- Conference Article
6
- 10.1145/3663548.3675645
- Oct 27, 2024
Research has observed benefits from providing lexical and syntactic approaches to Automatic Text Simplification (ATS) to Deaf and Hard-of-hearing (DHH) readers. However, little research has explored DHH readers’ design preferences and interactions with these approaches. This work first explores the design space of ATS systems with DHH readers, identifying potential design configurations for evaluation. Open-ended discussion of participants’ design preferences reveals values informing those preferences, including maintaining reading fluency and efficiency, and control over the tool. Using popular design choices from our formative study, we evaluated a prototype that provides various simplification types to explore DHH readers’ interactions with the system. We observed potential conflicts between participants’ values and design preferences, such as the prototype’s impact on participants’ reading speed and participants’ perceived need to reread simplifications suggested by the tool. However, participants found the tool useful, showing a nuanced preference towards word-level lexical simplifications using pop-ups. Our findings highlight the importance of the tool’s design on users’ reading experiences, and provide implications for the design and evaluation of ATS prototypes with target readers.
- Book Chapter
1
- 10.1007/978-981-16-4807-6_23
- Jan 1, 2022
Text simplification is the process of simplifying natural language so that it becomes easier to read and understand. Text simplification is a part of natural language processing, which is a vast field of research. Here we propose several machine learning models to achieve our goal. The text simplification process can be of any type; it may involve syntactic simplification or semantic simplification. In our paper we use a historical dataset, the Declaration of Independence. The dataset contains 15,000 words, which we use to train and test our models. The process of text simplification is similar to the process of text summarization, but not the same. In text summarization we mainly focus on summarizing the text, which may involve reducing its length. In text simplification, by contrast, we focus on how to simplify the text to make it easier to read and understand. Sometimes the result of text simplification can be longer than the original. In this paper we use several machine learning methods, namely a Naive Bayes classifier, an LSTM network and an LSTM encoder-decoder, to develop our model and to measure the performance of these methods. We observed that the LSTM encoder-decoder network outperformed the others, achieving the highest accuracy at 87%.
Keywords: Text simplification; NLP; Naïve Bayes classifier; LSTM network; LSTM encoder decoder
- Research Article
209
- 10.1075/itl.165.2.06sid
- Dec 31, 2014
- ITL - International Journal of Applied Linguistics
Text simplification, defined narrowly, is the process of reducing the linguistic complexity of a text, while still retaining the original information and meaning. More broadly, text simplification encompasses other operations; for example, conceptual simplification to simplify content as well as form, elaborative modification, where redundancy and explicitness are used to emphasise key points, and text summarisation to omit peripheral or inappropriate information. There is substantial evidence that manual text simplification is an effective intervention for many readers, but automatic simplification has only recently become an established research field. There have been several recent papers on the topic, however, which bring to the table a multitude of methodologies, each with their strengths and weaknesses. The goal of this paper is to summarise the large interdisciplinary body of work on text simplification and highlight the most promising research directions to move the field forward.