Abstract
The article at hand aggregates the work of our group in automatic processing of simplified German. We present four parallel (standard/simplified German) corpora compiled and curated by our group. We report on the creation of a gold standard of sentence alignments from the four sources for evaluating automatic alignment methods on this gold standard. We show that one of the alignment methods performs best on the majority of the data sources. We used two of our corpora as a basis for the first sentence-based neural machine translation (NMT) approach toward automatic simplification of German. In follow-up work, we extended our model to render it capable of explicitly operating on multiple levels of simplified German. We show that using source-side language level labels improves performance with regard to two evaluation metrics commonly applied to measuring the quality of automatic text simplification.
Highlights
Simplified language1 is a variety of standard language characterized by reduced lexical and syntactic complexity, the addition of explanations for difficult concepts, and clearly structured layout
This article has presented the work of our group in automatic processing of simplified German
We have given an overview of four parallel corpora compiled and curated by our group: the Web, Austria Presse Agentur (APA), Wikipedia, and capito corpora
Summary
Simplified language is a variety of standard language characterized by reduced lexical and syntactic complexity, the addition of explanations for difficult concepts, and clearly structured layout. As part of a rule-based approach, the operations carried out typically include replacing complex lexical and syntactic units with simpler ones (Chandrasekar et al, 1996; Siddharthan, 2002; Gasperin et al, 2010; Bott et al, 2012; Drndarevicand Saggion, 2012). A statistical approach (Specia, 2010; Zhu et al, 2010) generally conceptualizes the simplification task as one of converting a standard-language into a simplified-language text using machine translation techniques on a sentence level. The success of such approaches is contingent on the availability of high-quality sentence alignments
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.