Automatic Text Simplification for German

Sarah Ebling, Dominik Pfütze, Annette Rios, Alessia Battisti, Nicolas Spring, Andreas Säuberli, Marek Kostrzewa

doi:10.3389/fcomm.2022.706718

Open Access

https://doi.org/10.3389/fcomm.2022.706718

Journal: Frontiers in Communication	Publication Date: Feb 23, 2022
Citations: 4	License type: CC BY 4.0

Affiliation: University of Zurich

Abstract

The article at hand aggregates the work of our group in automatic processing of simplified German. We present four parallel (standard/simplified German) corpora compiled and curated by our group. We report on the creation of a gold standard of sentence alignments from the four sources and use it to evaluate automatic alignment methods. We show that one of the alignment methods performs best on the majority of the data sources. We used two of our corpora as a basis for the first sentence-based neural machine translation (NMT) approach toward automatic simplification of German. In follow-up work, we extended our model to render it capable of explicitly operating on multiple levels of simplified German. We show that using source-side language level labels improves performance with regard to two evaluation metrics commonly applied to measuring the quality of automatic text simplification.
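The idea of source-side language level labels can be illustrated with a minimal sketch: a token naming the desired level of simplified German is prepended to each standard-German source sentence before it is passed to the NMT system. The label names (`A1`, `A2`, `B1`), the `<...>` token format, and the function names below are assumptions made for illustration; the abstract does not specify how the authors encode the labels.

```python
from typing import Iterable, List, Tuple

# Hypothetical label inventory; the level names are an assumption for illustration.
LEVELS = {"A1", "A2", "B1"}


def add_level_label(source: str, target_level: str) -> str:
    """Prepend a target-level token to a standard-German source sentence."""
    if target_level not in LEVELS:
        raise ValueError(f"unknown level: {target_level}")
    return f"<{target_level}> {source.strip()}"


def label_corpus(
    triples: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Turn (source, target, level) triples into (labelled source, target) pairs."""
    return [(add_level_label(src, lvl), tgt) for src, tgt, lvl in triples]


if __name__ == "__main__":
    corpus = [
        ("Die Regierung verabschiedete das Gesetz.", "Die Regierung macht ein neues Gesetz.", "A2"),
    ]
    for labelled_source, target in label_corpus(corpus):
        print(labelled_source, "->", target)
```

With a scheme like this, a single model can be trained on data from several simplification levels, and the label selects the level at inference time.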

Highlights

Simplified language is a variety of standard language characterized by reduced lexical and syntactic complexity, the addition of explanations for difficult concepts, and clearly structured layout.
This article has presented the work of our group in automatic processing of simplified German.
We have given an overview of four parallel corpora compiled and curated by our group: the Web, Austria Presse Agentur (APA), Wikipedia, and capito corpora.

Summary

Introduction

Simplified language is a variety of standard language characterized by reduced lexical and syntactic complexity, the addition of explanations for difficult concepts, and clearly structured layout. In a rule-based approach to automatic text simplification, the operations carried out typically include replacing complex lexical and syntactic units with simpler ones (Chandrasekar et al., 1996; Siddharthan, 2002; Gasperin et al., 2010; Bott et al., 2012; Drndarević and Saggion, 2012). A statistical approach (Specia, 2010; Zhu et al., 2010) generally conceptualizes the simplification task as one of converting a standard-language text into a simplified-language text using machine translation techniques at the sentence level. The success of such approaches is contingent on the availability of high-quality sentence alignments.
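Since alignment quality matters this much, it helps to see how an automatic aligner can be scored against a gold standard of sentence alignments. The sketch below assumes alignments are represented as sets of (source sentence index, target sentence index) pairs and uses precision, recall, and F1 as measures; both the representation and the choice of measures are assumptions for illustration, not necessarily what the authors report.

```python
from typing import Dict, Set, Tuple

# An alignment links one source sentence index to one target sentence index.
Alignment = Tuple[int, int]


def score_alignments(
    gold: Set[Alignment], predicted: Set[Alignment]
) -> Dict[str, float]:
    """Compare predicted sentence alignments with a gold standard."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = {(0, 0), (1, 1), (2, 3)}
    predicted = {(0, 0), (1, 2), (2, 3)}
    print(score_alignments(gold, predicted))
```

Running the same scoring function over the output of several alignment methods, one data source at a time, gives a per-source comparison of the kind the abstract describes.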
