Abstract

In this report we present a neural approach to machine translation for the Val Badia variant of Ladin. To achieve good results, neural models require a large number of example translations on which they can be trained. The limited availability of such parallel data for Ladin makes it necessary to synthesise it from monolingual texts. As a basis for this so-called back-translation, we mainly use texts from the Ladin newspaper “La Usc di Ladins”. We translate these texts into Italian with a rule-based system implemented in Apertium, and then post-process and improve the translations, mainly at the grammatical level, using the DeepL API. The resulting corpus serves as the basis for the experiments we perform when training models for this language pair: we train Transformer models from scratch and fine-tune pretrained models. With both methods we achieve results that outperform the statistical and rule-based approaches to machine translation investigated for this language pair so far. The models have been made available through a web application. Furthermore, we have launched a platform for the continuous revision of the corpus, allowing the model to be improved further through human post-editing.
