A survey of research on text simplification

Abstract

Text simplification, defined narrowly, is the process of reducing the linguistic complexity of a text, while still retaining the original information and meaning. More broadly, text simplification encompasses other operations; for example, conceptual simplification to simplify content as well as form, elaborative modification, where redundancy and explicitness are used to emphasise key points, and text summarisation to omit peripheral or inappropriate information. There is substantial evidence that manual text simplification is an effective intervention for many readers, but automatic simplification has only recently become an established research field. There have been several recent papers on the topic, however, which bring to the table a multitude of methodologies, each with their strengths and weaknesses. The goal of this paper is to summarise the large interdisciplinary body of work on text simplification and highlight the most promising research directions to move the field forward.

Similar Papers
  • Research Article
  • Citations: 106
  • 10.1145/2738046
Making It Simplext
  • May 11, 2015
  • ACM Transactions on Accessible Computing
  • Horacio Saggion + 5 more

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.

  • Research Article
  • Citations: 6
  • 10.1145/3523265.3523268
The use of automatic text simplification to provide reading assistance to deaf and hard-of-hearing individuals in computing fields
  • Jan 1, 2022
  • ACM SIGACCESS Accessibility and Computing
  • Oliver Alonzo

Automatic Text Simplification (ATS) aims to rewrite text in a way that reduces its linguistic complexity while preserving its original meaning. While some prior research has explored using ATS to provide reading assistance to different user groups, relatively little work has investigated its use for Deaf and Hard-of-hearing (DHH) adults or readers in a particular domain. In this project, we investigate the use of ATS-based reading assistance tools for DHH individuals in the computing and information technology (IT) fields, motivated by prior work suggesting that computing professions often require reading about new technologies in order to stay current in the profession. Employing a variety of research methods, we investigate questions including the needs and interests of DHH individuals in the computing and IT fields for ATS-based reading assistance tools and their preferences for different interface parameters of these tools. We also investigate how to evaluate these technologies with this particular user group and how they may benefit from using these tools. This summary presents the motivation for this work, positions it in the context of the related literature, and outlines the proposed solution, our current progress and the project's contributions.

  • Conference Article
  • Citations: 2
  • 10.5167/uzh-192839
A Corpus for Automatic Readability Assessment and Text Simplification of German
  • May 16, 2020
  • Alessia Battisti + 4 more

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification for German, the first of its kind for this language. The corpus is compiled from web sources and consists of parallel as well as monolingual-only (simplified German) data amounting to approximately 6,200 documents (nearly 211,000 sentences). As a unique feature, the corpus contains information on text structure (e.g., paragraphs, lines), typography (e.g., font type, font style), and images (content, position, and dimensions). While the importance of considering such information in machine learning tasks involving simplified language, such as readability assessment, has repeatedly been stressed in the literature, we provide empirical evidence for its benefit. We also demonstrate the added value of leveraging monolingual-only data for automatic text simplification via machine translation through applying back-translation, a data augmentation technique.

  • Conference Article
  • Citations: 8
  • 10.26615/978-954-452-056-4_131
Automated Text Simplification as a Preprocessing Step for Machine Translation into an Under-resourced Language
  • Oct 22, 2019
  • Sanja Štajner + 1 more

In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) to improve its translation into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system for lexically and syntactically simplifying source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems, the phrase-based MT (PBMT) and the neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in the success of the three scenarios, depending on the MT approach used (PBMT or NMT), with regard to improving translation fluency and post-editing effort.

  • Conference Article
  • Citations: 1
  • 10.1109/bip56202.2022.10032482
Towards Text Simplification in Spanish: A Brief Overview of Deep Learning Approaches for Text Simplification
  • Nov 15, 2022
  • Mario Romero + 5 more

Text simplification refers to the transformation of a specific source text into a target text aiming to increase understanding and readability for one or more specific audiences. This task demands considerable human effort and specialized knowledge, which makes the use of automated or semi-automated computational approaches appealing. The rise of deep learning as a unifying paradigm across seemingly different fields such as image analysis, sound processing and natural language processing has considerably influenced the current state-of-the-art approaches for automatic text simplification. Therefore, in this work, we focus on the study of deep learning-based state-of-the-art methods for automatic text simplification in the Spanish language. To this end, we first disentangle the different tasks which can be addressed in order to yield a simplified text in general. We then review the latest deep learning-based approaches, along with the main datasets and performance metrics used in the field. We also describe approaches to deal with small datasets and technical words. Finally, we describe some lessons for building accurate automatic text simplification systems in Spanish, as in this language there is a noticeable shortage of work on text simplification.

  • Research Article
  • Citations: 4
  • 10.1162/tacl_a_00653
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension
  • Apr 16, 2024
  • Transactions of the Association for Computational Linguistics
  • Sweta Agrawal + 1 more

Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension tasks among the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as “unanswerable” based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.

  • Conference Article
  • Citations: 1
  • 10.1109/iscaie54458.2022.9794534
Towards Personalized and Simplified Expository Texts: Pre-trained Classification and Neural Networks Co-Modeling
  • May 21, 2022
  • Safura Adeela Sukiman + 1 more

The goal of automatic text simplification is to reorganize complex text structures into simpler, more comprehensible texts while retaining their original meaning. An automatic text simplification model coupled with a personalization element is an indispensable tool for assisting students with learning disabilities who struggle to comprehend expository texts found in school textbooks. In recent years, neural networks have been widely embraced in simplified text generation, with most earlier researchers focusing on the Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Transformer models. In general, however, the majority of their efforts resulted in simple, generic texts, and their models lacked cognitive-based personalization elements. In this paper, we present the concept of generating personalized and simplified expository texts by combining pre-trained classification and neural network models. The pre-trained classification model aims to predict complex text structures and phrases that are challenging for students with learning disabilities to comprehend, while the neural network model is then used to generate simplified expository texts based on the predicted text complexity. The advantage of these joint models is the ability to generate simplified expository texts adapted to the cognitive level of students with learning disabilities. This opens up opportunities for continuously personalized learning, reduces their struggles, and increases their motivation to stay competitive with their peers.

  • Conference Article
  • Citations: 6
  • 10.1145/3663548.3675645
Design and Evaluation of an Automatic Text Simplification Prototype with Deaf and Hard-of-hearing Readers
  • Oct 27, 2024
  • Oliver Alonzo + 5 more

Research has observed benefits from providing lexical and syntactic approaches to Automatic Text Simplification (ATS) to Deaf and Hard-of-hearing (DHH) readers. However, little research has explored DHH readers’ design preferences and interactions with these approaches. This work first explores the design space of ATS systems with DHH readers, identifying potential design configurations for evaluation. Open-ended discussion of participants’ design preferences reveals values informing those preferences, including maintaining reading fluency and efficiency, and control over the tool. Using popular design choices from our formative study, we evaluated a prototype that provides various simplification types to explore DHH readers’ interactions with the system. We observed potential conflicts between participants’ values and design preferences, such as the prototype’s impact on participants’ reading speed and participants’ perceived need to reread simplifications suggested by the tool. However, participants found the tool useful, showing a nuanced preference towards word-level lexical simplifications using pop-ups. Our findings highlight the importance of the tool’s design on users’ reading experiences, and provide implications for the design and evaluation of ATS prototypes with target readers.

  • Conference Article
  • Citations: 1
  • 10.5121/csit.2022.121518
GRASS: A Syntactic Text Simplification System based on Semantic Representations
  • Sep 17, 2022
  • Rita Hijazi + 2 more

Automatic Text Simplification (ATS) is the process of reducing a text's linguistic complexity to improve its understandability and readability while maintaining its original information, content, and meaning. Several text transformation operations can be performed, such as splitting a sentence into several shorter sentences, substitution of complex elements, and reorganization. It has been shown that implementing these operations essentially at a syntactic level causes several problems that could be solved by using semantic representations. In this paper, we present GRASS (GRAph-based Semantic representation for syntactic Simplification), a rule-based automatic syntactic simplification system that uses semantic representations. The system allows the syntactic transformation of complex constructions, such as subordination clauses, appositive clauses, coordination clauses, and passive forms, into simpler sentences. It is based on a graph-based meaning representation of the text expressed in DMRS (Dependency Minimal Recursion Semantics) notation and it uses rewriting rules. The experimental results obtained on a reference corpus and according to specific metrics outperform the results obtained by other state-of-the-art systems on the same reference corpus.

  • Research Article
  • Citations: 47
  • 10.1016/j.ipm.2020.102351
HTSS: A novel hybrid text summarisation and simplification architecture
  • Jul 13, 2020
  • Information Processing & Management
  • Farooq Zaman + 4 more

  • Video Transcripts
  • 10.48448/19gd-3934
Portuguese Neural Text Simplification using Machine Translation
  • Nov 16, 2021
  • Underline Science Inc.
  • Rafael Mello + 5 more

Automatic Text Simplification (ATS) has played a significant role in the Natural Language Processing (NLP) field. ATS is a sequence-to-sequence problem aiming to create a new version of the original text by removing complex and domain-specific words. It can improve communication and understanding of documents from specific domains, as well as support second language learning. This paper presents an empirical study on the use of state-of-the-art ATS methods to simplify texts in Portuguese. The literature reports that analyzing Portuguese texts is challenging due to the lack of resources compared to other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.

  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-031-35320-8_5
A Review of Parallel Corpora for Automatic Text Simplification. Key Challenges Moving Forward
  • Jan 1, 2023
  • Tania Josephine Martin + 2 more

This review of parallel corpora for automatic text simplification (ATS) involves an analysis of forty-nine papers wherein the corpora are presented, focusing on corpora in the Indo-European languages of Western Europe. We improve on recent corpora reviews by reporting on the target audience of the ATS, the language and domain of the source text, and other metadata for each corpus, such as alignment level, annotation strategy, and the transformation applied to the simplified text. The key findings of the review are: 1) the lack of resources that address ATS aimed at domains which are important for social inclusion, such as health and public administration; 2) the lack of resources aimed at audiences with mild cognitive impairment; 3) the scarcity of experiments where the target audience was directly involved in the development of the corpus; 4) more than half the proposals do not include any extra annotation, thereby lacking detail on how the simplification was done, or the linguistic phenomenon tackled by the simplification; 5) other types of annotation, such as the type and frequency of the transformation applied could identify the most frequent simplification strategies; and, 6) future strategies to advance the field of ATS could leverage automatic procedures to make the annotation process more agile and efficient.

  • Research Article
  • Citations: 7
  • 10.1093/applin/amac057
The Effect of Automatic Text Simplification on L2 Readers’ Text Comprehension
  • Oct 19, 2022
  • Applied Linguistics
  • Dennis Murphy Odo

Texts used in L2 classrooms have traditionally been simplified manually, but recent technological advances allow us to investigate whether automatic text simplification (ATS) software can help L2 learners comprehend texts in second and foreign languages. Participants were divided into low and high L2 reading proficiency groups and assigned to read either the authentic or automatically simplified version of a text and completed a free recall task and MC comprehension test. The results did not show any significant correlations among the variables of topic knowledge, topic interest, and MC comprehension, but there were correlations among L2 reading comprehension, MC comprehension, and free recall results. Results also showed that the automatically simplified text facilitated the comprehension of the more proficient readers but not the less proficient readers according to their performance on the free recall assessment. Implications are that L2 teachers cannot blindly use whatever text they want with ATS, and ATS software designers may need to reconsider the current conservative approach to simplification that many ATS tools use.

  • Research Article
  • 10.1007/s10579-025-09879-4
A comparative study of sentence alignment methods for Spanish text simplification
  • Mar 3, 2026
  • Language Resources and Evaluation
  • Christina Niklaus + 3 more

Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This paper fills this gap by introducing large-scale sentence-aligned simplification resources for Spanish, developed from the Newsela and ClearSim corpora. We propose detailed guidelines for manual alignment, evaluate a wide range of automatic sentence alignment algorithms, and present the first systematic exploration of LLM-based monolingual sentence alignment in Spanish. Our analysis incorporates comprehensive quantitative and qualitative evaluation, supported by statistical significance testing, and reveals clear differences in the structural simplification patterns across corpora. In addition, we train and release baseline ATS models using the new aligned datasets, demonstrating their practical utility for downstream simplification. All alignment code, trained models, and evaluation scripts will be publicly released to ensure transparency and reproducibility. Together, these contributions substantially advance the resources and methodology for Spanish-language ATS.

  • Book Chapter
  • Citations: 1
  • 10.1007/978-981-16-4807-6_23
Automatic Text Simplification Using LSTM Encoder Decoder Model
  • Jan 1, 2022
  • Om Prakash Jena + 4 more

Text simplification is the process of simplifying natural language so that it becomes easier to read and understand. It is part of natural language processing, which is a vast field of research. Here we propose several machine learning models to achieve this goal. Text simplification can take different forms: it may involve syntactic simplification or semantic simplification. In this paper we use a historical dataset, the Declaration of Independence, which contains 15,000 words, to train and test our models. Text simplification is similar to text summarization, but the two are not the same. In text summarization the main focus is on summarizing the text, which may involve reducing its length, whereas in text simplification the focus is on making the text easier to read and understand. Sometimes the result of text simplification can be longer, even as it reduces the explanation needed for the difficult text. We use several machine learning methods, namely a Naive Bayes classifier, an LSTM network, and an LSTM encoder-decoder, to develop our model and to measure the performance of these methods. We observed that the LSTM encoder-decoder network outperformed the others, achieving the highest accuracy at 87%.

Keywords: Text simplification, NLP, Naïve Bayes classifier, LSTM network, LSTM encoder decoder
