Automatic Text Simplification for German

Abstract

The article at hand aggregates the work of our group in automatic processing of simplified German. We present four parallel (standard/simplified German) corpora compiled and curated by our group. We report on the creation of a gold standard of sentence alignments from the four sources and use it to evaluate automatic alignment methods. We show that one of the alignment methods performs best on the majority of the data sources. We used two of our corpora as a basis for the first sentence-based neural machine translation (NMT) approach toward automatic simplification of German. In follow-up work, we extended our model to render it capable of explicitly operating on multiple levels of simplified German. We show that using source-side language level labels improves performance with regard to two evaluation metrics commonly applied to measuring the quality of automatic text simplification.
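
The idea of source-side language level labels can be illustrated with a minimal sketch. It assumes the common practice of prepending a special token to each source sentence to tell the model which level of simplified German to produce. The abstract does not specify the label format or the set of levels, so the token strings, the level names (`A1`, `A2`, `B1`), and the function names `tag_source` and `tag_corpus` below are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical label tokens: the level inventory and token format are
# assumptions made for illustration, not taken from the article.
LEVEL_TOKENS = {"A1": "<A1>", "A2": "<A2>", "B1": "<B1>"}


def tag_source(sentence: str, target_level: str) -> str:
    """Prepend a token naming the desired simplification level."""
    if target_level not in LEVEL_TOKENS:
        raise ValueError(f"unknown level: {target_level!r}")
    return f"{LEVEL_TOKENS[target_level]} {sentence.strip()}"


def tag_corpus(
    triples: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Turn (source, target, level) triples into labelled training pairs.

    Only the source side is modified; the simplified target is left as-is.
    """
    return [(tag_source(src, level), tgt) for src, tgt, level in triples]


if __name__ == "__main__":
    corpus = [("Ein komplexer Satz.", "Ein Satz.", "A1")]
    for src, tgt in tag_corpus(corpus):
        print(src, "->", tgt)
```

With labels of this kind, a single model could in principle be trained on data from several simplification levels, and the desired level would be selected at inference time by choosing the token.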

Similar Papers
  • Conference Article
  • Cite Count Icon 8
  • 10.26615/978-954-452-056-4_131
Automated Text Simplification as a Preprocessing Step for Machine Translation into an Under-resourced Language
  • Oct 22, 2019
  • Sanja Štajner + 1 more

In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) to improve its translation into an under-resourced language. We use the state-of-the-art automatic text simplification (ATS) system for lexically and syntactically simplifying source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems, the phrase-based MT (PBMT) and the neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in the success of the three scenarios depending on the MT approach used (PBMT or NMT) with regard to improving translation fluency and post-editing effort.

  • Research Article
  • Cite Count Icon 12
  • 10.1007/s10590-021-09266-0
An in-depth analysis of the individual impact of controlled language rules on machine translation output: a mixed-methods approach
  • Jun 1, 2021
  • Machine Translation
  • Shaimaa Marzouk

Examining the general impact of Controlled Language (CL) rules in the context of Machine Translation (MT) has been an area of research for many years. The present study focuses on the following question: how do CL rules impact MT output individually? By analysing a German corpus-based test suite of technical texts that have been translated into English by different MT systems, this study endeavours to answer this question at different levels: the general impact of CL rules (rule- and system-independent), their impact at rule level (system-independent) as well as at rule and system level. The results of five MT systems are analysed and contrasted: a rule-based system, a statistical system, two differently constructed hybrid systems, and a neural system. For this, a mixed-methods triangulation approach that includes error annotation, human evaluation, and automatic evaluation was applied. The data was analysed both qualitatively and quantitatively in terms of CL influence on the following parameters: number and type of MT errors, style and content quality, and scores of two automatic evaluation metrics. In line with many studies, the results show a general positive impact of the applied CL rules on the MT output. However, at rule level, only four rules proved to have positive effects on the aforementioned parameters; three rules had negative effects on the parameters; and two rules did not show any significant impact. At rule and system level, the rules affected the MT systems differently, as expected. Rules that had a positive impact on earlier MT approaches did not show the same impact on the neural MT approach. Furthermore, neural MT delivered distinctly better results than earlier MT approaches, namely the highest error-free, style and content quality rates both before and after applying the rules, which indicates that neural MT offers a promising solution that no longer requires CL rules for improving the MT output.

  • Conference Article
  • Cite Count Icon 3
  • 10.24963/ijcai.2018/789
From Feature to Paradigm: Deep Learning in Machine Translation (Extended Abstract)
  • Jul 1, 2018
  • Marta R Costa-Jussà

In recent years, deep learning algorithms have revolutionized several areas, including speech, image, and natural language processing. The specific field of Machine Translation (MT) has not remained unaffected. Integration of deep learning in MT varies from re-modeling existing features into standard statistical systems to the development of a new architecture. Among the different neural networks, research works use feed-forward neural networks, recurrent neural networks and the encoder-decoder schema. These architectures are able to tackle challenges such as low resources or morphological variation. This extended abstract focuses on describing the foundational works on the neural MT approach, mentioning its strengths and weaknesses, and including an analysis of the corresponding challenges and future work. The full manuscript [Costa-jussà, 2018] describes, in addition, how these neural networks have been integrated to enhance different aspects and models from statistical MT, including language modeling, word alignment, translation, reordering, and rescoring; it also describes the new neural MT approach together with recent approaches using subwords, characters, and multilingual training, among others.

  • Book Chapter
  • Cite Count Icon 4
  • 10.1007/978-3-030-47426-3_51
Case-Sensitive Neural Machine Translation
  • Jan 1, 2020
  • Advances in Knowledge Discovery and Data Mining
  • Xuewen Shi + 3 more

Although it is important lexical information for Latin languages, word case is often ignored in machine translation. According to observations, the translation performance drops significantly when we introduce case-sensitive evaluation metrics. In this paper, we introduce two types of case-sensitive neural machine translation (NMT) approaches to alleviate the above problems: i) adding case tokens into the decoding sequence, and ii) adding case prediction to conventional NMT. Our proposed approaches incorporate case information into the NMT decoder by jointly learning target word generation and word case prediction. We compare our approaches with multiple kinds of baselines, including NMT with naive case-restoration methods, and analyze the impacts of various setups on our approaches. Experimental results on three typical translation tasks (Zh-En, En-Fr, En-De) show that our proposed methods lead to improvements of up to 2.5, 1.0 and 0.5 in case-sensitive BLEU scores, respectively. Further analyses also illustrate the inherent reasons why our approaches lead to different improvements on different translation tasks.

  • Research Article
  • 10.5445/ir/1000104498
Multilingual Neural Translation
  • Feb 14, 2020
  • Repository KITopen (Karlsruhe Institute of Technology)
  • Thanh-Le Ha


  • Research Article
  • Cite Count Icon 33
  • 10.1007/s00521-021-05895-x
Towards achieving a delicate blending between rule-based translator and neural machine translator
  • Mar 29, 2021
  • Neural Computing and Applications
  • Md Adnanul Islam + 2 more

Popular translators such as Google, Bing, etc., perform quite well when translating among popular languages such as English, French, etc.; however, they make elementary mistakes when translating low-resource languages such as Bengali, Arabic, etc. Google uses the Neural Machine Translation (NMT) approach to build its multilingual translation system. Prior to NMT, Google used the Statistical Machine Translation (SMT) approach. However, these approaches depend solely on the availability of a large parallel corpus for the language pairs being translated. As a result, a good number of widely spoken languages, such as Bengali, remain little explored in the research arena of artificial intelligence. Hence, the goal of this study is to explore improvized translation from Bengali to English. To do so, we study both the rule-based translator and the corpus-based machine translators (NMT and SMT) in isolation, and in combination with different approaches of blending between them. More specifically, first, we adopt popular corpus-based machine translators (NMT and SMT) and a rule-based machine translator for Bengali to English translation. Next, we integrate the rule-based translator with each of the corpus-based machine translators separately using different approaches. In addition, we perform rigorous experimentation over different datasets to report the best performance score for Bengali to English translation to date, comparing the different approaches in terms of translation performance. Finally, we discuss how our different blending approaches can be re-used for other low-resource languages.

  • Dissertation
  • Cite Count Icon 2
  • 10.23889/suthesis.57439
Comparative Evaluation of Translation Memory (TM) and Machine Translation (MT) Systems in Translation between Arabic and English
  • Jul 22, 2021
  • Khaled Mamer Ben Milad

In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic<>English translation tools lie in Arabic’s characteristic free word order, richness of word inflection – including orthographic ambiguity – and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates multiple MT systems that use the mainstream neural machine translation (NMT) approach to translation quality. Due to a lack of training data resources and its rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic English sentences to collect the MT systems’ output.
A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses to the quality of MT system’s output. The experiments’ input of both systems was extracted from Arabic<>English corpora, which was examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.

  • Conference Article
  • Cite Count Icon 20
  • 10.18653/v1/p18-1116
Forest-Based Neural Machine Translation
  • Jan 1, 2018
  • Chunpeng Ma + 4 more

Tree-based neural machine translation (NMT) approaches, although they have achieved impressive performance, suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors. For statistical machine translation (SMT), forest-based methods have been proven to be effective for solving this problem, while for NMT this kind of approach has not been attempted. This paper proposes a forest-based NMT method that translates a linearized packed forest under a simple sequence-to-sequence framework (i.e., a forest-to-sequence NMT model). The BLEU score of the proposed method is higher than that of the sequence-to-sequence NMT, tree-based NMT, and forest-based SMT systems.

  • Video Transcripts
  • 10.48448/19gd-3934
Portuguese Neural Text Simplification using Machine Translation
  • Nov 16, 2021
  • Underline Science Inc.
  • Rafael Mello + 5 more

Automatic Text Simplification (ATS) has played a significant role in the Natural Language Processing (NLP) field. ATS is a sequence-to-sequence problem aiming to create a new version of the original text by removing complex and domain-specific words. It can improve communication and understanding of documents from specific domains, as well as support second language learning. This paper presents an empirical study on the use of state-of-the-art ATS methods to simplify texts in Portuguese. It is important to remark that the literature reports the challenge in analyzing Portuguese texts due to the lack of resources compared to other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.

  • Research Article
  • Cite Count Icon 8
  • 10.31893/multiscience.2025146
Hybrid NMT model and comparison with existing machine translation approaches
  • Oct 11, 2024
  • Multidisciplinary Science Journal
  • Ritesh Kumar Dwivedi + 2 more

Neural machine translation has transformed automated translation, surpassing traditional methods with its significant accuracy improvements. However, despite its successes, NMT still encounters several challenges, such as handling low-resource languages, maintaining contextual coherence, and addressing ambiguities in translation. This research presents a novel hybrid NMT model to overcome these limitations. It combines the strengths of traditional translation methods with modern deep learning approaches. We conduct a comprehensive comparative analysis of our hybrid model against existing machine translation approaches, including rule-based machine translation (RBMT), SMT, and state-of-the-art NMT systems. The BLEU evaluation metric is used to assess performance across the English-Hindi, English-Marathi, and English-Bengali language pairs and across domains. Our results demonstrate that the hybrid NMT model achieves superior accuracy and fluency in translation tasks, particularly for low-resource languages and complex sentence structures. This research highlights the potential of combining different machine translation approaches, and the findings suggest that integrating these methods can significantly improve translation quality. The findings offer valuable insights for future research and development of more robust and versatile translation systems. Our results demonstrate that the hybrid model offers significant improvements in translation accuracy, making it a promising approach for multilingual machine translation tasks. NMT surpasses both RBMT and SMT with a BLEU score of 35.6, highlighting its effectiveness in managing context and semantics. Qualitative assessments suggest that the hybrid model effectively minimizes common translation errors, making it a robust solution for multilingual machine translation tasks.
Hybrid Neural Machine Translation (NMT) models are increasingly being applied in real-world applications where the combination of rule-based, statistical, and neural approaches offers distinct advantages.

  • Research Article
  • Cite Count Icon 4
  • 10.33889/ijmems.2024.9.5.056
Improved Urdu-English Neural Machine Translation with a fully Convolutional Neural Network Encoder
  • Oct 1, 2024
  • International Journal of Mathematical, Engineering and Management Sciences
  • Huma Israr + 2 more

Neural machine translation (NMT) approaches driven by artificial intelligence (AI) have gained more and more attention in recent years, mainly due to their simplicity yet state-of-the-art performance. Although NMT models with attention mechanisms rely heavily on the availability of substantial parallel corpora, they have demonstrated efficacy even for languages with limited linguistic resources. The convolutional neural network (CNN) is frequently employed in tasks involving visual and speech recognition. Implementing CNN for MT is still challenging compared to the predominant approaches. Recent research has shown that the CNN-based NMT model cannot capture long-term dependencies present in the source sentence. The CNN-based model can only capture the word dependencies within the width of its filters. This unnatural character often causes worse performance for CNN-based NMT than for RNN-based NMT models. This study introduces a simple method to improve neural translation of a low-resource language pair, specifically Urdu-English (UR-EN). In this paper, we use a Fully Convolutional Neural Network (FConv-NN) based NMT architecture to create a powerful MT encoder for UR-EN translation that can capture the long dependency of words in a sentence. Although the model is quite simple, it yields strong empirical results. Experimental results show that the FConv-NN model consistently outperforms the traditional CNN-based model with filters. On the Urdu-English Dataset, the FConv-NN model produces translations with a gain of 18.42 BLEU points. Moreover, the quantitative and comparative analysis shows that in a low-resource setting, FConv-NN-based NMT outperforms conventional CNN-based NMT models.

  • Conference Article
  • Cite Count Icon 10
  • 10.1109/ialp.2017.8300618
Sentence simplification with core vocabulary
  • Dec 1, 2017
  • Takumi Maruyama + 1 more

We attempt automatic text simplification with vocabulary restriction on the output side using a machine translation approach based on a simplified corpus that we built. This is the first machine translation approach in Japanese because no Japanese simplification corpus has been created to date. This corpus focuses only on paraphrases of sentence units and phrase units. It is the first time that this type of simplification has been used with such a corpus. This approach makes it possible to simplify better than existing systems do. We also compared models that changed the quantity and quality of the training data and development data. The result shows that data having a medium S-BLEU score between the original sentence and a simple sentence is most effective for automatic text simplification by a machine translation approach.

  • Conference Article
  • Cite Count Icon 2
  • 10.1109/compcomm.2018.8780734
Optimizing Attention Mechanism for Neural Machine Transltion
  • Dec 1, 2018
  • Zhonghao Li + 1 more

Nowadays, Neural Machine Translation (NMT) has become the mainstream approach for machine translation, and has improved considerably since the attention mechanism was introduced. However, the usage of attention is insufficient, and some defects such as the rare word problem still remain in the end-to-end NMT approach, which is limited by the training targets and prior knowledge. Traditionally, out-of-vocabulary (OOV) words are simply represented with a signal ‘UNK’ or cut into sub-words. In this paper, we consider some optimizations on the alignment feature of the attention mechanism, which is appropriate not only to guide every output to find the best match in the input sentence, but also to help the overall quality of the translation. We build English-Chinese and Chinese-English NMT systems based on our algorithm, and experiment on the casia2015 corpus from WMT17. The results show that the translations of our model improve considerably in each case.

  • Research Article
  • 10.15408/bat.v31i1.44469
Evaluating Machine Translation of Cultural Terms: Readability Comparison Between Google and Yandex
  • Mar 31, 2025
  • Buletin Al-Turas
  • Diana Mentari

Purpose: This study aimed to analyze the readability of Google Translate (GT) and Yandex Translate (YT) translation results on dialogue texts containing cultural terms from the book Antologi Cerita Anak Indonesia (ACAI). This study evaluated the effectiveness of the Neural Machine Translation (NMT) approach in GT and the Hybrid Machine Translation (HMT) approach in YT in conveying text meanings clearly and comprehensibly to readers.

Method: This research employed a cloze test involving 28 participants aged 18-24 years, along with a questionnaire to assess user preferences regarding GT and YT translation results. Text readability was analyzed using the Flesch-Kincaid Grade Level and Gunning Fog Index to measure the linguistic complexity of the translations.

Results/Findings: The study results show that GT's readability reaches 81.1%, while YT's readability is 74.5%, both categorized as the independent level according to Rankin & Culhane's (1969) criteria. Additionally, 80% of the 20 questionnaire respondents stated that GT's translations were clearer than those of YT. Analysis using the Flesch-Kincaid Grade Level and Gunning Fog Index shows that the readability level of GT and YT translations is classified as advanced, suitable for readers with a minimum education level equivalent to a bachelor's degree.

Conclusion: This study showed that GT has a higher readability level than YT, which might be because of its use of NMT, producing more natural sentence structures. Meanwhile, YT, which also relied on SMT, translates based on statistical patterns, making its translations more rigid. Although both systems could produce comprehensible translations, they still struggled with accurately translating cultural terms without additional context. Therefore, human involvement remained essential to improving accuracy and contextual appropriateness in machine translation.

  • Conference Article
  • Cite Count Icon 9
  • 10.1109/iccct53315.2021.9711807
Machine Learning Approach to English-Afaan Oromo Text-Text Translation: Using Attention based Neural Machine Translation
  • Dec 16, 2021
  • Ebisa A Gemechu + 1 more

In this paper, we present a Neural Machine Translation (NMT) approach for English-Afaan Oromo text translation. NMT is a machine translation technique that applies an artificial neural network to predict the probability of a sequence of words. It is enhanced by Recurrent Neural Networks (RNN), also called the encoder-decoder networks. There have been some noticeable challenges with English-Afaan Oromo machine translation using the conventional rule-based and Statistical Machine Translation (SMT) systems. Text alignment was the main challenge since there are differences in structure between the two languages. Afaan Oromo is a morphologically rich language, which makes it difficult to develop rules for the syntactic and semantic elements. Our proposal is to develop a machine translation model for English-Afaan Oromo text, using the attention-based NMT technique. This technique overcomes most of the text alignment and rule tagging limitations. The train/test split technique is used to train and test our model. The Adam algorithm is adopted for training optimization. We used the BLEU score and a human rating on a Likert scale for evaluation. Experiments show that our model performs well in translation, with an average BLEU score of 41.62 on the test sets. This result outperforms the previous baseline systems evaluated on English-Afaan Oromo.
