Abstract

Many of the phrase pairs extracted in phrase-based machine translation systems are of low quality or irrelevant. Their presence in the phrase table not only enlarges it but can also reduce translation quality. Many methods have been proposed to prune these noisy phrase pairs using statistics derived from the phrase table. In this paper we propose a new pruning method that, unlike other similar pruning approaches, uses the content of each side of a phrase pair to estimate the pair's relevance and quality. To model the content of phrases, we use topic models. Testing this new pruning method on a Farsi-English system, we were able to prune more than 50% of the phrase table without a significant loss in BLEU score, and in some cases with an improvement.
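The abstract does not say how the topic content of the two sides is compared. The sketch below is a minimal illustration of the general idea under stated assumptions, not the authors' method. It assumes each source and target phrase has already been mapped to a probability distribution over a shared set of K topics, for example by a bilingual topic model. It uses the Bhattacharyya coefficient as a stand-in similarity measure and a hypothetical pruning threshold. The names `bhattacharyya`, `prune_phrase_table`, and `threshold`, and the toy topic distributions, are all illustrative.

```python
import math
from typing import Dict, List, Tuple

# Assumption: a phrase's "content" is a probability distribution over K topics
# that are shared by the source and target languages.
TopicDist = List[float]


def bhattacharyya(p: TopicDist, q: TopicDist) -> float:
    """Overlap of two topic distributions: 1.0 if identical, 0.0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def prune_phrase_table(
    pairs: List[Tuple[str, str]],
    src_topics: Dict[str, TopicDist],
    tgt_topics: Dict[str, TopicDist],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Keep only the pairs whose two sides agree in topic content (hypothetical rule)."""
    kept = []
    for src, tgt in pairs:
        score = bhattacharyya(src_topics[src], tgt_topics[tgt])
        if score >= threshold:
            kept.append((src, tgt))
    return kept


# Toy example over K = 3 topics (made-up numbers, for illustration only).
src_topics = {"ketab": [0.8, 0.1, 0.1]}
tgt_topics = {"book": [0.7, 0.2, 0.1], "river": [0.1, 0.1, 0.8]}
pairs = [("ketab", "book"), ("ketab", "river")]

print(prune_phrase_table(pairs, src_topics, tgt_topics, threshold=0.9))
# → [('ketab', 'book')]
```

Here "ketab"/"book" scores about 0.99 and is kept, while "ketab"/"river" scores about 0.67 and is pruned. Any divergence between distributions, such as Hellinger or Jensen-Shannon, could replace the Bhattacharyya coefficient. In a real system this topic-based score would probably be combined with the usual phrase-table statistics rather than used alone.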
