Abstract

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

Highlights

  • The goal of text simplification is to rewrite an input text so that the output is more readable

  • Xu et al (2015) laid out a series of problems that are present in current text simplification research, and argued that we should deviate from the previous state-of-the-art benchmarking setup

  • We propose two new light-weight metrics instead: FKBLEU, which explicitly measures readability, and SARI, which implicitly measures it by comparing the output against both the input and the references
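
To make the idea of scoring an output against both the input and the references concrete, the following Python sketch may help. It is not the paper's definition of SARI, which this summary does not give. It is a simplified, set-based, unigram-only illustration, and the function name `toy_edit_score`, the helper names, and the zero-denominator convention are all assumptions made for the example. The sketch rewards words that were added and appear in a reference, words that were kept and appear in a reference, and words that were deleted and appear in no reference.

```python
def _ratio(numerator, denominator):
    # Convention chosen for this sketch: an empty denominator scores 0.
    return numerator / denominator if denominator else 0.0


def _f1(precision, recall):
    return _ratio(2 * precision * recall, precision + recall)


def toy_edit_score(source, output, references):
    """Toy score comparing an output with its input and references.

    Illustrative only: works on sets of lowercase unigrams and is NOT
    the SARI formula from the paper.
    """
    src = set(source.lower().split())
    out = set(output.lower().split())
    ref = set()
    for reference in references:
        ref |= set(reference.lower().split())

    # Addition: words in the output that were not in the input.
    added = out - src
    good_added = added & ref
    add_f1 = _f1(_ratio(len(good_added), len(added)),
                 _ratio(len(good_added), len(ref - src)))

    # Keeping: input words that survive in the output.
    kept = out & src
    good_kept = kept & ref
    keep_f1 = _f1(_ratio(len(good_kept), len(kept)),
                  _ratio(len(good_kept), len(src & ref)))

    # Deletion: input words dropped from the output (precision only).
    deleted = src - out
    del_precision = _ratio(len(deleted - ref), len(deleted))

    return (add_f1 + keep_f1 + del_precision) / 3


source = "the cat perched on the mat"
references = ["the cat sat on the mat"]
rewritten = toy_edit_score(source, "the cat sat on the mat", references)
unchanged = toy_edit_score(source, source, references)
```

In this toy setup, copying the input unchanged scores lower than the rewrite that matches the reference, because the copy earns no credit for additions or deletions. A metric that compares only against references cannot make that distinction as directly.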


Summary

Introduction

The goal of text simplification is to rewrite an input text so that the output is more readable. While sentence splitting (Siddharthan, 2006; Petersen and Ostendorf, 2007; Narayan and Gardent, 2014; Angrosh et al, 2014) and deletion (Knight and Marcu, 2002; Clarke and Lapata, 2006; Filippova and Strube, 2008; Filippova et al, 2015; Rush et al, 2015; and others) have been intensively studied, there has been considerably less research on developing new paraphrasing models for text simplification; most previous work has used off-the-shelf statistical machine translation (SMT) technology and achieved reasonable results (Coster and Kauchak, 2011a,b; Wubben et al, 2012; Stajner et al, 2015). Our work is primarily focused on lexical simplification (rewriting words or phrases with simpler versions), and to a lesser extent on syntactic rewrite rules that simplify the input. It largely ignores the important subtasks of sentence splitting and deletion. Our focus on lexical simplification does not affect the generality of the presented work, since deletion or sentence splitting could be applied as pre- or post-processing steps.
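
As a rough illustration of what "rewriting words or phrases with simpler versions" means, here is a minimal Python sketch. The rule table `RULES` and the function `simplify_lexically` are hypothetical and exist only for this example. The system described in the paper is an SMT model that scores and ranks candidate paraphrases, not the greedy substitution shown here.

```python
import re

# Hypothetical paraphrase rules (complex phrase -> simpler phrase).
# Illustrative only; not taken from the paper or any real resource.
RULES = {
    "in the event that": "if",
    "utilize": "use",
    "commence": "start",
}


def simplify_lexically(sentence, rules=RULES):
    """Greedily replace each known phrase with its simpler version."""
    # Try longer phrases first so multi-word rules win over sub-phrases.
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        sentence = re.sub(pattern, rules[phrase], sentence,
                          flags=re.IGNORECASE)
    return sentence


simplified = simplify_lexically(
    "We utilize the tool in the event that tests commence."
)
```

Because the function only substitutes phrases, sentence splitting or deletion could run before or after it as separate steps, which matches the pre- or post-processing arrangement described above.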

Background
Adapting Machine Translation for Simplification
Incorporating Large-Scale Paraphrase Rules
Simplification-specific Features for Paraphrase Rules
Creating Multiple References
Tuning Parameters
Experiments and Analyses
Qualitative Analysis
Quantitative Evaluation of Simplification Systems
Correlation of Automatic Metrics with Human Judgments
Conclusions and Future Work