Abstract

This work demonstrates how semi-supervised learning and human-in-the-loop crowdsourcing can help address the neural machine translation (NMT) challenges common in low-resource languages. We focus on the Mande language Bambara, which has approximately 16 million primary and secondary speakers in Western Africa. Bambara is primarily a spoken rather than a written language, and it has few digital resources because of its history in regions where colonial French became the language of government and industry. Bambara is therefore a "low-resource language": it lacks the existing language resources (parallel digital text and labeled data) necessary for NMT. To address this gap, we describe a novel crowdsourcing approach to support semi-supervised NMT. We designed a crowdsourcing platform that asks the annotator to supply information when the NMT model has decision confusion. We tested the platform on the evaluation of translations of Malian broadcast news and Wikipedia pages in Bambara. Our initial results show wide variation in translation quality, and future work includes a more rigorous evaluation of translator skills when onboarding new annotators.

