Abstract

History shows that a machine translation (MT) system supported by only a few linguistic rules is not realistic. A handful of rules cannot capture the wide variety that a natural language exhibits in its diverse uses. This leads us to argue for a corpus-based machine translation (CBMT) system that relies on a large amount of linguistic data, information, examples, and rules retrieved from corpora. The first benefit of a CBMT system is the development of algorithms for aligning a bilingual text corpus (BTC), an essential part of an MT system. A BTC yields a new kind of translation support resource that helps in learning through trial, verification, and validation. A CBMT system begins with an analysis of human-produced translations to understand and define the internal structures of a BTC, completely or partially, and thereby to design strategies for machine learning. Analysis of a BTC contributes heavily to the development of translation aids, since we do not expect an MT system to ‘produce’ an exact translation but to ‘understand’ how translations are actually produced with linguistic and extralinguistic information. The use of a BTC in CBMT is justified on the ground that the data and information acquired from a BTC are richer than those from a monolingual corpus with regard to contextual equivalence between the languages. Thus, a CBMT system earns a unique status by combining features of example-based machine translation (EBMT) and statistics-based machine translation (SBMT) while maintaining a mutual interface between the two.

