Abstract

Machine translation (MT) draws from several different disciplines, making it a complex subject to teach. There are excellent pedagogical texts, but problems in MT and current algorithms for solving them are best learned by doing. As a centerpiece of our MT course, we devised a series of open-ended challenges for students in which the goal was to improve performance on carefully constrained instances of four key MT tasks: alignment, decoding, evaluation, and reranking. Students brought a diverse set of techniques to the problems, including some novel solutions which performed remarkably well. A surprising and exciting outcome was that student solutions or their combinations fared competitively on some tasks, demonstrating that even newcomers to the field can help improve the state-of-the-art on hard NLP problems while simultaneously learning a great deal. The problems, baseline code, and results are freely available.

Highlights

  • A decade ago, students interested in natural language processing arrived at universities having been exposed to the idea of machine translation (MT) primarily through science fiction.

  • We provided three simple Python programs: evaluate implements a simple ranking of the systems based on position-independent word error rate (PER; Tillmann et al., 1997), which computes a bag-of-words overlap between the system translations and the reference.

  • The best submission, obtaining a correlation of 83.5, relied on the idea that the reference and machine translation should be good paraphrases of each other (Owczarzak et al., 2006; Kauchak and Barzilay, 2006). It employed a simple paraphrase system trained on the alignment challenge data, using the pivot technique of Bannard and Callison-Burch (2005), and computing the optimal alignment between machine translation and reference under a simple model in which words could align if they were paraphrases.
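The bag-of-words overlap behind PER can be sketched in a few lines. This is one common formulation of position-independent word error rate, not necessarily the exact scoring used by the course's evaluate program: word order is ignored, multisets of words are intersected, and the unmatched remainder is normalized by reference length.

```python
from collections import Counter

def per(hypothesis, reference):
    """Position-independent word error rate (one common formulation).

    Compares the hypothesis and reference as bags of words, so word
    order does not matter; lower is better, 0.0 means identical bags.
    """
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    # Multiset intersection: how many hypothesis tokens find a match
    # among the reference tokens, regardless of position.
    matches = sum((hyp & ref).values())
    # Tokens in the longer side that fail to match count as errors,
    # normalized by the reference length.
    errors = max(sum(hyp.values()), sum(ref.values())) - matches
    return errors / sum(ref.values())
```

Because PER ignores position, a reordered but otherwise correct translation scores 0.0, which is exactly why it gives only a rough ranking of systems.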
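The paraphrase-based metric in the last highlight can be illustrated with a small sketch. The names below (paraphrase_match, the paraphrase-pair set) are hypothetical, and the greedy one-to-one matching here is a simplification of the optimal alignment the actual submission computed; the point is only the core model, in which a hypothesis word may align to a reference word if the two are identical or known paraphrases.

```python
def paraphrase_match(hyp_words, ref_words, paraphrases):
    """Fraction of reference words aligned to the hypothesis, where a
    word pair may align if identical or listed as paraphrases.

    `paraphrases` is a set of (word, word) pairs, e.g. extracted via
    the pivot technique from bilingual alignments.  Greedy one-to-one
    matching; an optimal bipartite matching could score slightly higher.
    """
    available = list(ref_words)
    matched = 0
    for h in hyp_words:
        for r in available:
            if h == r or (h, r) in paraphrases or (r, h) in paraphrases:
                available.remove(r)  # each reference word aligns at most once
                matched += 1
                break
    return matched / max(len(ref_words), 1)
```

Under this model a hypothesis that substitutes "automobile" for "car" is not penalized, provided the pair appears in the paraphrase table, which is what lets the metric reward meaning-preserving variation that PER would count as an error.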


Summary

Introduction

A decade ago, students interested in natural language processing arrived at universities having been exposed to the idea of machine translation (MT) primarily through science fiction. Today, incoming students have been exposed to services like Google Translate since they were in secondary school or earlier. It makes sense to teach statistical MT, either on its own or as a unit in a class on natural language processing (NLP), machine learning (ML), or artificial intelligence (AI). A course that promises to show students how Google Translate works and teach them how to build something like it is especially appealing, and several universities and summer schools offer such classes. There are excellent introductory texts—depending on the level of detail required, instructors can choose from a comprehensive MT textbook (Koehn, 2010), a chapter of a popular NLP textbook (Jurafsky and Martin, 2009), a tutorial survey (Lopez, 2008), or an intuitive tutorial on the IBM Models (Knight, 1999b), among many others.

