Abstract

To address the problems of term extraction in language for specific purposes (LSP) fields, this paper proposes a method that coordinates the use of corpora and a machine translation system to extract terms from LSP texts. A comparable corpus built for this research contains 167 English texts and 229 Chinese texts, with around 600,000 English tokens and 900,000 Chinese characters. The corpus is annotated with meta-information and POS-tagged for further use. To obtain the keyword list from the corpus, BFSU PowerConc software is used, with Crown and CLOB as reference corpora for English and TORCH and LCMC for Chinese. A VB program is written to generate the multi-word units; Google Translator Toolkit is then used to obtain translation pairs, and the SDL Trados fuzzy match function is applied to extract lists of multi-word terms and their translations. The results show that 70% of the translated term pairs score 2.0 on a 0 to 3 grading scale with 0.5 intervals, as rated by human graders. The method can be applied to extract translation term pairs for computer-aided translation of LSP texts. In addition, the by-product comparable corpus, combined with N-gram multi-word unit lists, can support trainee translators in translation. The findings underline the significance of combining machine translation methods with corpus techniques, and point to the need for building and sharing comparable corpora and for conc-gram extraction in this field.
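The multi-word unit generation step can be illustrated with a minimal sketch. This is not the VB program used in the paper, which is not reproduced here; it is a Python illustration that assumes the text is already tokenized and that multi-word units are contiguous N-grams kept above a frequency threshold. The function name `multiword_units` and the parameters `n_min`, `n_max`, and `min_freq` are hypothetical.

```python
from collections import Counter


def multiword_units(tokens, n_min=2, n_max=4, min_freq=2):
    """Count contiguous n-grams and keep those occurring at least min_freq times.

    Assumption: `tokens` is a pre-tokenized list of words (for Chinese,
    word segmentation would have to be done beforehand).
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Join each surviving n-gram back into a space-separated string.
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_freq}


sample = "machine translation system and machine translation output".split()
units = multiword_units(sample)
print(units)
```

In a pipeline like the one described, candidate lists of this kind would presumably be filtered against the keyword list before being sent for translation; that filtering is not shown here.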
