Abstract

In this chapter, we address some of the theoretical and practical issues relating to the generation, processing, and management of a parallel translation corpus (PTC), with reference to some Indian languages. We discuss a PTC developed in a consortium-mode project under the aegis of DeitY, Govt. of India. Several issues relating to PTC development are discussed here for the first time, keeping in mind the ready application of parallel translation corpora in various domains of computational linguistics and applied linguistics. In a normative manner, we define what a PTC is, describe the process of its construction, and identify its primary features. These issues are brought into focus to justify the present effort to develop a PTC for Indian languages for future reference and application. Next, we exemplify the processes of text alignment in a PTC; discuss the methods of text analysis; propose the restructuring of translational units; define the process of extracting translational equivalents from a PTC; propose the generation of a bilingual lexical database and termbank from a structured PTC; and finally identify the areas where a PTC and the information extracted from it may be utilized. Since the construction of a PTC is full of hurdles, we try to construct a roadmap that focuses on the techniques and methodologies that may be applied to accomplish the task.
