Abstract
The goal of this work is to develop a text and speech translation system from Spanish to Basque. The two languages differ markedly in both morphology and syntax, which poses attractive challenges for machine translation. Since both languages share official status in the Basque Country, the underlying motivation is practical as well as academic. Finite-state transducers were adopted as the basic translation models. The main contribution of this work is the study of several techniques to improve probabilistic finite-state transducers by means of additional linguistic knowledge. Two methods that combine linguistic and statistical knowledge were proposed. The first performed a morphological analysis in an attempt to benefit from atomic meaningful units when rendering the meaning from one language into the other. The second clustered words into phrases according to their syntactic role and used such phrases as the translation unit. From the latter approach, phrase-based finite-state transducers arose as a natural extension of classical ones. The models were assessed on a restricted-domain task, very repetitive and with a small vocabulary. Experimental results showed that both the morphological and the syntactic approaches outperformed the baseline under different test sets and architectures for speech translation.