Abstract

Parallel corpora for low-resource Arabic dialects and English are few and small in scale. As a result, most neural machine translation models, including Google Translate, rely mainly on parallel corpora of standard Arabic and English when training to translate dialectal Arabic. The assumption is that a model well trained to translate to and from standard Arabic can also translate dialectal Arabic effectively, given the similarities between the two. This study demonstrates the impact of not using large-scale, dialect-specific parallel corpora by quantitatively and qualitatively analyzing how Google Translate translates Egyptian Arabic adjuncts. Compared to a human reference translation, Google Translate achieved a low BLEU score of 14.69. Qualitative analysis showed that reliance on standard Arabic parallel corpora caused negative transfer. This appeared as the literal translation of idiomatic adjuncts, the misinterpretation of dialectal adjuncts as main-clause constituents, the translation of dialectal adjuncts as if they were orthographically similar standard Arabic words, and the use of common standard Arabic lexical meanings to translate dialect-specific adjuncts. These findings are relevant to researchers working on dialectal Arabic neural machine translation and have implications for investment in large-scale, dialect-specific corpora that can better capture the peculiarities of Arabic dialects and reduce negative transfer from standard Arabic.
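For readers less familiar with the metric, the following is a minimal sketch of corpus-level BLEU in its standard formulation. It combines clipped n-gram precisions for n = 1 to 4 through a geometric mean and multiplies the result by a brevity penalty. The sketch assumes pre-tokenized input, one reference per hypothesis, uniform weights, and no smoothing. The abstract does not state which implementation, tokenization, or settings produced the reported score of 14.69, so the function name `corpus_bleu` and these choices are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Illustrative corpus-level BLEU on a 0-100 scale.

    Assumptions: pre-tokenized input (lists of strings), exactly one
    reference per hypothesis, uniform n-gram weights, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the hypotheses, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = "the cat sat on the".split()
ref = "the cat sat on the mat".split()
print(corpus_bleu([hyp], [ref]))  # all n-grams match, so only the brevity penalty applies
```

Published evaluations usually rely on a standard implementation such as SacreBLEU, so that tokenization and scores are comparable across studies. A hand-rolled version like this one is only meant to show what the number measures.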
