Parallel Phrase Scoring for Extra-large Corpora

Mohammed Mediani,Alex Waibel,Jan Niehues

doi:10.2478/v10108-012-0011-z

Open Access

Journal: Prague Bulletin of Mathematical Linguistics
Publication Date: Jan 1, 2012
Citations: 8

Abstract

This paper presents a C++ implementation of the phrase scoring step in phrase-based systems. It exploits the available computing resources more efficiently and trains very large systems in reasonable time without sacrificing the system's performance in terms of BLEU score. Three parallelization tools are made freely available: the first exploits shared-memory parallelism and multiple disks for parallel I/O, while the other two run in a distributed environment. We demonstrate the efficiency and consistency of our tools in the framework of the Fr-En systems we developed for the WMT and IWSLT evaluation campaigns, where we generated the phrase table in one third to one seventh of the time taken by Moses on the same tasks.

Full Text