Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

Helena M Caseli, Maria Das Graças V Nunes, Mikel L Forcada

doi:10.1007/s10590-007-9027-9

Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work. As a consequence, bilingual resources are usually harder to find than "shallow" monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology to automatically build both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
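
To make the idea of inducing a bilingual dictionary from a word-aligned corpus concrete, the following is a minimal sketch, not the authors' actual procedure. It assumes each sentence pair is already tokenized and comes with a list of (source index, target index) alignment links, and it works on surface forms rather than on the morphologically analysed and tagged text the paper describes. The function name `induce_dictionary`, the `min_count` threshold, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def induce_dictionary(aligned_sentences, min_count=2):
    """Collect source->target translation candidates from alignment links.

    aligned_sentences: iterable of (src_tokens, tgt_tokens, alignment),
    where alignment is a list of (src_index, tgt_index) pairs.
    min_count: hypothetical frequency threshold used to drop rare links.
    """
    counts = defaultdict(Counter)
    for src_tokens, tgt_tokens, alignment in aligned_sentences:
        for i, j in alignment:
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    dictionary = {}
    for src_word, tgt_counts in counts.items():
        # Keep only translations seen at least min_count times,
        # ordered from most to least frequent.
        kept = [t for t, n in tgt_counts.most_common() if n >= min_count]
        if kept:
            dictionary[src_word] = kept
    return dictionary


# Toy, invented Portuguese-English data for illustration only.
corpus = [
    (["casa", "azul"], ["blue", "house"], [(0, 1), (1, 0)]),
    (["casa", "grande"], ["big", "house"], [(0, 1), (1, 0)]),
    (["casa", "azul"], ["blue", "home"], [(0, 1), (1, 0)]),
]
bilingual = induce_dictionary(corpus)
print(bilingual)
```

The frequency threshold is one simple way to filter noisy alignment links; the filtering actually used in the paper may differ.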

Full Text