
Portuguese Neural Text Simplification using Machine Translation

Abstract

Automatic Text Simplification (ATS) has played a significant role in the Natural Language Processing (NLP) field. ATS is a sequence-to-sequence problem that aims to produce a new version of the original text with complex and domain-specific words removed. It can improve communication and the understanding of documents from specific domains, as well as support second-language learning. This paper presents an empirical study on the use of state-of-the-art ATS methods to simplify texts in Portuguese. The literature reports that analyzing Portuguese texts is challenging because fewer resources are available than for other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.
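
The BLEU figure reported in the abstract is an n-gram overlap score between system output and reference simplifications. As a rough illustration only, the sketch below computes corpus-level BLEU with one reference per sentence and pre-split tokens. The names `ngram_counts` and `corpus_bleu` and the example sentences are hypothetical. The abstract does not say which BLEU implementation or tokenization the authors used, and this toy version applies no smoothing.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing.

    hypotheses, references: parallel lists of token lists.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any empty order gives a zero score
    # Geometric mean of the modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize outputs shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


# Hypothetical system output and reference, already tokenized.
hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(round(corpus_bleu([hyp], [ref]), 2))
```

For reported results, an established scorer is the safer choice, since scores depend on tokenization and smoothing details that this sketch ignores.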

Similar Papers
  • Conference Article
  • Cite Count: 8
  • 10.26615/978-954-452-056-4_131
Automated Text Simplification as a Preprocessing Step for Machine Translation into an Under-resourced Language
  • Oct 22, 2019
  • Sanja Štajner + 1 more

In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) to improve its translation into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system to simplify source sentences lexically and syntactically; the simplified sentences are then translated with two state-of-the-art English-to-Serbian MT systems, phrase-based MT (PBMT) and neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in the success of the three scenarios depending on the MT approach used (PBMT or NMT) with regard to improving translation fluency and post-editing effort.

  • Book Chapter
  • Cite Count: 3
  • 10.3233/faia230975
Automatic Simplification of Legal Texts in Portuguese Using Machine Learning
  • Dec 7, 2023
  • Frontiers in artificial intelligence and applications
  • Alexandre Alves + 3 more

Texts produced by the Brazilian judiciary have a complex and technical vocabulary, with elaborate use of the Portuguese language and many legal terms that are difficult to understand, creating a communication barrier between the judiciary and the population. Automatic Text Simplification (ATS), a task in the Natural Language Processing (NLP) area, can be applied to improve the readability of these texts using specialized algorithms and to scale their simplification, given the great demand in the courts. In this context, this article presents an evaluation of four state-of-the-art text simplification methods, assessed with readability metrics, to improve the quality of existing judicial summaries. The dataset contains 100 summaries from the Federal Regional Court of the 5th Region (TRF5) and another 100 from the Federal Supreme Court (STF). The methods MUSS(EN), MUSS(PT), Transformers and NMT + Attention were tested, and the simplified texts exceeded the FRE readability index of the original texts, making them more readable.

  • Research Article
  • 10.5445/ir/1000104498
Multilingual Neural Translation
  • Feb 14, 2020
  • Repository KITopen (Karlsruhe Institute of Technology)
  • Thanh-Le Ha


  • Research Article
  • Cite Count: 13
  • 10.1145/3665244
Neural Machine Translation for Low-Resource Languages from a Chinese-centric Perspective: A Survey
  • Jun 21, 2024
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Jinyi Zhang + 7 more

Machine translation–the automatic transformation of one natural language (source language) into another (target language) through computational means–occupies a central role in computational linguistics and stands as a cornerstone of research within the field of Natural Language Processing (NLP). In recent years, the prominence of Neural Machine Translation (NMT) has grown exponentially, offering an advanced framework for machine translation research. It is noted for its superior translation performance, especially when tackling the challenges posed by low-resource language pairs that suffer from a limited corpus of data resources. This article offers an exhaustive exploration of the historical trajectory and advancements in NMT, accompanied by an analysis of the underlying foundational concepts. It subsequently provides a concise demarcation of the unique characteristics associated with low-resource languages and presents a succinct review of pertinent translation models and their applications, specifically within the context of languages with low-resources. Moreover, this article delves deeply into machine translation techniques, highlighting approaches tailored for Chinese-centric low-resource languages. Ultimately, it anticipates upcoming research directions in the realm of low-resource language translation.

  • Conference Article
  • Cite Count: 3
  • 10.1109/iaeac50856.2021.9390937
Chinese Automatic Text Simplification Based on Unsupervised Learning
  • Mar 12, 2021
  • Yang Sen + 1 more

In this paper, a Chinese automatic text simplification (ATS) method based on unsupervised learning is introduced. Automatic text simplification is a research field of natural language processing. For Chinese texts, relying on a hand-made simplified corpus or dictionary is not feasible because of the large number of texts. Chinese is a diverse language, and numerous factors need to be taken into consideration. An automatic simplification method for Chinese text and a readability formula based on linear regression are proposed in this paper. With our method, a set of Chinese sentences is given as input and more comprehensible sentences are obtained through syntactic and lexical simplification. In an automatic evaluation on the hand-made simplified corpus, the readability score of our system increased by 3.68 compared with that of the original text, and the SARI score reached 36.02.

  • Conference Article
  • Cite Count: 1
  • 10.1109/bip56202.2022.10032482
Towards Text Simplification in Spanish: A Brief Overview of Deep Learning Approaches for Text Simplification
  • Nov 15, 2022
  • Mario Romero + 5 more

Text simplification refers to the transformation of a specific source text into a target text, aiming to increase understanding and readability for one or more specific audiences. This task demands considerable human effort and specialized knowledge, which makes automated or semi-automated computational approaches appealing. The rise of deep learning as a unifying paradigm across seemingly different fields such as image analysis, sound processing and natural language processing has considerably influenced current state-of-the-art approaches to automatic text simplification. Therefore, in this work, we focus on the study of deep learning-based state-of-the-art methods for automatic text simplification in the Spanish language. To this end, we first disentangle the different tasks that can be addressed in order to yield a simplified text in general. We then review the latest deep learning-based approaches, along with the main datasets and performance metrics used in the field. We also describe approaches to deal with small datasets and technical words. Finally, we describe some lessons for building accurate automatic text simplification systems in Spanish, as there is a noticeable shortage of work on text simplification in this language.

  • Conference Article
  • Cite Count: 11
  • 10.1109/mercon.2018.8421939
Transliteration and Byte Pair Encoding to Improve Tamil to Sinhala Neural Machine Translation
  • May 1, 2018
  • Pasindu Tennage + 4 more

Neural Machine Translation (NMT) is the current state-of-the-art machine translation technique. However, applicability of NMT for language pairs that have high morphological variations is still debatable. Lack of language resources, especially a sufficiently large parallel corpus causes additional issues, which leads to very poor translation performance, when NMT is applied to languages with high morphological variations. In this paper, we present three techniques to improve domain-specific NMT performance of the under-resourced language pair Sinhala and Tamil that have high morphological variations. Out of these three techniques, transliteration is a novel approach to improve domain-specific NMT performance for language pairs such as Sinhala and Tamil that share a common grammatical structure and have moderate lexical similarity. We built the first transliteration system for Sinhala to English and Tamil to English, which provided an accuracy of 99.6%, when tested with the parallel corpus we used for NMT training. The other technique we employed is Byte Pair Encoding (BPE), which is a technique that has been used to achieve open vocabulary translation with a fixed vocabulary of subword symbols. Our experiments show that while the translation based on independent BPE models and pure transliteration perform moderately, integrating transliteration to build a joint BPE model for the aforementioned language pair increases the translation quality by 1.68 BLEU score.
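
Byte Pair Encoding, as mentioned in the abstract above, builds a fixed subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. The following is a minimal sketch of the merge-learning step only, not the authors' pipeline. The function name `learn_bpe`, the `</w>` end-of-word marker and the toy word frequencies are assumptions for illustration. A joint model in the abstract's sense would presumably learn merges from the combined (transliterated) source and target vocabularies, which here would simply mean passing in the summed word frequencies of both sides.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: frequency} mapping.

    Returns (merges, vocab): the ordered list of merged symbol pairs and
    the final segmentation of each word as a tuple of symbols.
    """
    # Start from characters, with an end-of-word marker on every word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        # Rewrite each word, fusing every occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy word frequencies (hypothetical data).
toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy, 3)
print(merges)
```

Applying the learned merges in order to unseen words is what yields open-vocabulary segmentation: rare words fall back to smaller subword units instead of an unknown-word token.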

  • Research Article
  • Cite Count: 106
  • 10.1145/2738046
Making It Simplext
  • May 11, 2015
  • ACM Transactions on Accessible Computing
  • Horacio Saggion + 5 more

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Our results, even if our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, are comparable to the state of the art in English Automatic Text Simplification.

  • Research Article
  • Cite Count: 19
  • 10.1007/s10579-014-9265-4
Text simplification resources for Spanish
  • Mar 1, 2014
  • Language Resources and Evaluation
  • Stefan Bott + 1 more

In this paper we present the development of a text simplification system for Spanish. Text simplification is the adaptation of a text for the special needs of certain groups of readers, such as language learners, people with cognitive difficulties, and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing text is labour-intensive and costly. Automatic simplification is a field which attracts growing attention in Natural Language Processing, but, to the best of our knowledge, there are no existing simplification tools for Spanish. We present a corpus study which aims to identify the operations a text simplification system needs to carry out in order to produce an output similar to what human editors produce when they simplify news texts. We also present a first prototype for automatic simplification, which shows that the most important simplification operations can be successfully treated.

  • Research Article
  • Cite Count: 201
  • 10.14569/specialissue.2014.040109
A Survey of Automated Text Simplification
  • Jan 1, 2014
  • International Journal of Advanced Computer Science and Applications
  • Matthew Shardlow

Text simplification modifies syntax and lexicon to improve the understandability of language for an end user. This survey identifies and classifies simplification research within the period 1998-2013. Simplification can be used for many applications, including: Second language learners, preprocessing in pipelines and assistive technology. There are many approaches to the simplification task, including: lexical, syntactic, statistical machine translation and hybrid techniques. This survey also explores the current challenges which this field faces. Text simplification is a non-trivial task which is rapidly growing into its own field. This survey gives an overview of contemporary research whilst taking into account the history that has brought text simplification to its current state.

  • Research Article
  • 10.25073/2588-1086/vnucsce.231
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
  • May 30, 2020
  • VNU Journal of Science: Computer Science and Communication Engineering
  • Nghia-Luan Pham + 1 more

In this paper, we propose a new method for domain adaptation in Statistical Machine Translation (SMT) for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline system. We propose two steps to improve the quality of the SMT system: (i) classify phrases on the target side of the translation phrase-table using a probability classifier model, and (ii) adapt the translation phrase-table by recomputing the direct translation probability of phrases.

Our experiments are conducted in the English-to-Vietnamese translation direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers, and the experimental results showed that our method significantly outperformed the baseline system. Our system improved the quality of machine translation in the legal domain by up to 0.9 BLEU points over the baseline system,…
Keywords: Machine Translation, Statistical Machine Translation, Domain Adaptation

  • Research Article
  • Cite Count: 15
  • 10.3389/fcomm.2022.706718
Automatic Text Simplification for German
  • Feb 23, 2022
  • Frontiers in Communication
  • Sarah Ebling + 6 more

The article at hand aggregates the work of our group in automatic processing of simplified German. We present four parallel (standard/simplified German) corpora compiled and curated by our group. We report on the creation of a gold standard of sentence alignments from the four sources for evaluating automatic alignment methods on this gold standard. We show that one of the alignment methods performs best on the majority of the data sources. We used two of our corpora as a basis for the first sentence-based neural machine translation (NMT) approach toward automatic simplification of German. In follow-up work, we extended our model to render it capable of explicitly operating on multiple levels of simplified German. We show that using source-side language level labels improves performance with regard to two evaluation metrics commonly applied to measuring the quality of automatic text simplification.

  • Research Article
  • 10.17576/gema-2021-2103-03
Approaches to Text Simplification: Can Computer Technologies Outdo a Human Mind?
  • Aug 30, 2021
  • GEMA Online® Journal of Language Studies
  • Svetlana Vladimirovna Pervukhina + 3 more

Narrowly specialized information is addressed to a limited circle of professionals though it provokes interest among people without specialized education. This gives rise to a need for the popularization of scientific information. This process is carried out through simplified texts as a kind of secondary texts that are directly aimed at the addressee. Age, language proficiency and background knowledge are the main features which are usually taken into consideration by the author of the secondary text who makes changes in the text composition, as well as in its pragmatics, semantics and syntax. This article analyses traditional approaches to text simplification, computer simplification and summarization. The authors compare human-authored simplification of literary texts with the newest trends in computer simplification to promote further development of machine simplification tools. It has been found that the samples of simplified scientific texts seem to be more natural than the samples of simplified literary texts since technical background knowledge can be processed with machine tools. The authors have come to the conclusion that literary and technical texts should imply different approaches for adaptation and simplification. In addition, personal readers’ experience plays a great part in finding the implications in literary texts. In this respect it might be reasonable to create separate engines for simplifying and adapting texts from diverse spheres of knowledge.
Keywords: Text Simplification; Natural Language Processing (NLP); Pragmatic Adaptation; Professional Communication; Literary Texts

  • Conference Article
  • 10.5753/webmedia.2025.15154
Computational Approaches for Simplifying Educational Texts: A Proposal Using spaCy
  • Nov 10, 2025
  • Vitor Amadeu Souza

This work presents an investigation into the application of Natural Language Processing (NLP) techniques for the automatic simplification of educational texts in Brazilian Portuguese. The study uses the spaCy library with the pt_core_news_sm model to perform syntactic analysis, named entity recognition, and textual readability assessment. The proposed methodology implements simplification rules based on syntactic dependency analysis, preserving essential elements such as the subject and main predicate while removing complex subordinate constructions. The results show that named entity analysis was effective in identifying people (PER), locations (LOC), organizations (ORG), and miscellaneous elements (MISC) in the analyzed texts. The original texts presented Flesch Reading Ease scores ranging from 25.23 to 54.57, indicating different levels of complexity. This research contributes to the advancement of automatic text simplification techniques in Portuguese and offers insights for the development of more accessible educational tools.
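
The Flesch Reading Ease scores quoted in the abstract above come from a formula over average sentence length and average syllables per word, where higher values mean easier text. The sketch below uses the constants of Flesch's original English formula and takes raw counts as input. The function name `flesch_reading_ease` and the example counts are hypothetical, and the abstract does not state which constants or syllable-counting method the study used for Portuguese, so the `base` parameter is left adjustable as an assumption.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables, base=206.835):
    """Flesch Reading Ease from raw counts (original English constants).

    base is exposed because language-specific adaptations may shift it;
    which value the cited study used is not stated in the abstract.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_len = n_words / n_sentences       # words per sentence
    avg_syllables = n_syllables / n_words          # syllables per word
    return base - 1.015 * avg_sentence_len - 84.6 * avg_syllables


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
original = flesch_reading_ease(100, 5, 150)
# Same content after a hypothetical simplification: shorter sentences, shorter words.
simplified = flesch_reading_ease(100, 8, 140)
print(simplified > original)
```

Because both terms are subtracted, shortening sentences or replacing long words each raises the score, which is why simplification rules that drop subordinate constructions tend to improve it.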

  • Research Article
  • 10.63673/lotus.1659574
Natural Language Processing, Deep Learning, and Text Processing: A Translational Perspective
  • Jun 24, 2025
  • International Journal of Language and Translation Studies
  • Derya Oğuz

This study offers a multifaceted examination of Natural Language Processing (NLP) and deep learning models, which are key components of artificial intelligence. In this context, text processing workflows operating on artificial neural networks are discussed with a particular focus on translation. The advancement of artificial neural networks and the widespread adoption of deep learning algorithms have led to significant developments in the field of NLP. Neural machine translation, which has emerged as a transformative development in artificial intelligence, has sparked debates within the translation community. Although it is still under discussion whether neural machine translation can produce translations at a human level of reliability, the speed and convenience it offers undoubtedly contribute to digitalization and automation in the era of Industry 4.0. NLP employs various methods to enable computers to process human language. Among these, deep learning algorithms based on artificial neural networks dominate current applications and neural machine translation systems. Accordingly, diverse text processing workflows are carried out within NLP and deep learning mechanisms. These workflows may vary depending on the architectural characteristics of the language model being used. The study examines recent language model architectures and their properties, with a particular focus on the text processing procedures inherent to these models. These procedures are examined in relation to the language processing mechanisms of the human brain, within the framework of a descriptive analytical approach. The study concludes that NLP and deep learning technologies will play a significant role in the future of the translation profession and emphasizes the necessity for translators to follow technological advancements in response to this transformation.
