Optimizing Statistical Machine Translation for Text Simplification

Wei Xu, Chris Callison-Burch, Quanze Chen, Courtney Napoles, Ellie Pavlick

doi:10.1162/tacl_a_00107

Abstract

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

Highlights

The goal of text simplification is to rewrite an input text so that the output is more readable.
Xu et al. (2015) laid out a series of problems present in current text simplification research and argued that we should deviate from the previous state-of-the-art benchmarking setup.
We propose two new light-weight metrics instead: FKBLEU, which explicitly measures readability, and SARI, which implicitly measures it by comparing the output against the input and the references.
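To make "explicitly measures readability" concrete, the following is a minimal sketch of the standard Flesch-Kincaid grade level formula, on the assumption that the "FK" in FKBLEU refers to this index. The function names and the vowel-group syllable counter are illustrative assumptions, not the authors' code, and the sketch does not show how FKBLEU combines readability with BLEU.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels.

    This is a heuristic assumption for illustration; real readability
    tools typically use pronunciation dictionaries or finer rules.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation (illustrative only).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    simple = "The cat sat."
    harder = "The feline positioned itself."
    # A lower grade level indicates text that is easier to read.
    print(flesch_kincaid_grade(simple), flesch_kincaid_grade(harder))
```

Under this formula, shorter sentences and words with fewer syllables lower the grade level, which is the sense in which a readability index rewards simpler output.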

Summary

Introduction

The goal of text simplification is to rewrite an input text so that the output is more readable. While sentence splitting (Siddharthan, 2006; Petersen and Ostendorf, 2007; Narayan and Gardent, 2014; Angrosh et al., 2014) and deletion (Knight and Marcu, 2002; Clarke and Lapata, 2006; Filippova and Strube, 2008; Filippova et al., 2015; Rush et al., 2015; and others) have been intensively studied, there has been considerably less research on developing new paraphrasing models for text simplification; most previous work has used off-the-shelf statistical machine translation (SMT) technology and achieved reasonable results (Coster and Kauchak, 2011a,b; Wubben et al., 2012; Stajner et al., 2015). Our work is primarily focused on lexical simplification (rewriting words or phrases with simpler versions), and to a lesser extent on syntactic rewrite rules that simplify the input. It largely ignores the important subtasks of sentence splitting and deletion. Our focus on lexical simplification does not affect the generality of the presented work, since deletion or sentence splitting could be applied as pre- or post-processing steps.

Background

Adapting Machine Translation for Simplification

Incorporating Large-Scale Paraphrase Rules

Simplification-specific Features for Paraphrase Rules

Creating Multiple References

Tuning Parameters

Experiments and Analyses

Qualitative Analysis

Quantitative Evaluation of Simplification Systems

Correlation of Automatic Metrics with Human Judgments

Conclusions and Future Work

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2016
Citations: 344
License type: cc-by