Abstract
Using a novel rule labeling method, this article proposes a hierarchical model for statistical machine translation. The proposed model labels translation rules by matching the boundaries of target side phrases with the shallow syntactic labels including POS tags and chunk labels on the target side of the training corpus. The boundary labels are concatenated if there is no label for the whole target span. Labeling with the classes of boundary words on the target side phrases has been previously proposed as a phrase-boundary model which can be considered as the base form of our model. In the extended model, the labeler uses a POS tag if there is no chunk label in one boundary. Using chunks as phrase labels, the proposed model generalizes the rules to decrease the model sparseness. The sparseness is a more important issue in the language pairs with a lot of differences in the word order because they have less number of aligned phrase pairs for extraction of rules. The extended phrase-boundary model is also applicable for low-resource languages having no syntactic parser. Some experiments are performed with the proposed model, the base phrase-boundary model, and variants of Syntax Augmented Machine Translation (SAMT) in translation from Persian and German to English as source and target languages with different word orders. According to the results, the proposed model improves the translation performance in the quality and decoding time aspects. Using BLEU as our metric, the proposed model has achieved a statistically significant improvement of about 0.5 point over the base phrase-boundary model.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: ACM Transactions on Asian and Low-Resource Language Information Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.