Research on Bilingual Corpus Based Machine Translation

Shuang Wang

doi:10.4028/www.scientific.net/amm.687-691.1683

Abstract

This paper proposes several methods for automatically acquiring bilingual corpora from different websites, namely the "iciba" website, CNKI, and the Patent network. It introduces the methods and procedures used to acquire each kind of corpus. Because the sites have different characteristics, we use a different acquisition method for each, and achieve fast, accurate, automatic acquisition of a large-scale bilingual corpus. For the "iciba" website, the main tool is the Nutch crawler, which performs relatively well, retrieving pages accurately and with good relevance. In addition, we abandon the idea of harvesting bilingual text from the entire Internet and instead take a new route: we access the basic information of scholarly theses in CNKI to obtain a large-scale, high-quality English-Chinese bilingual corpus. In the end we obtain a gigabyte-scale aligned bilingual corpus, which manual evaluation shows to be very accurate. This corpus lays the groundwork for further research on cross-language information retrieval.
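As a rough illustration of the CNKI idea, the sketch below pairs the Chinese and English versions of a record's metadata fields to form aligned text pairs. It is only a minimal sketch under stated assumptions, not the authors' implementation: the record layout, the field names (`title_zh`, `title_en`, `abstract_zh`, `abstract_en`), and the simple character-range check are all hypothetical, and the abstract does not say how the fields were actually fetched or filtered.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: each record is a dict holding Chinese and
# English versions of the same metadata fields. Field names are assumptions.
FIELD_PAIRS = [("title_zh", "title_en"), ("abstract_zh", "abstract_en")]


def contains_cjk(text: str) -> bool:
    """Return True if the text has at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def aligned_pairs(records: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (chinese, english) pairs from records that carry both sides."""
    for record in records:
        for zh_key, en_key in FIELD_PAIRS:
            zh = (record.get(zh_key) or "").strip()
            en = (record.get(en_key) or "").strip()
            # Keep a pair only if both sides are present, the Chinese side
            # contains Chinese characters, and the English side does not.
            if zh and en and contains_cjk(zh) and not contains_cjk(en):
                yield zh, en
```

Because each pair comes from two versions of the same field in one record, the alignment is given by the record structure itself rather than computed afterwards, which is one plausible reason such a source could yield a cleaner corpus than text harvested from arbitrary web pages.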

More From: Applied Mechanics and Materials
