Construction of Large-Scale Chinese-English Bilingual Corpus and Sentence Alignment

Sun Jie

doi:10.1007/978-3-031-23947-2_42

Abstract

With the development of computers and the Internet, applications based on bilingual (or multilingual) parallel corpora are increasing in the field of natural language processing. Beyond machine translation, the construction of parallel corpora is also of great value for bilingual dictionary compilation, word sense disambiguation, and cross-language information retrieval. At present, word-aligned and sentence-aligned bilingual corpora exist at large scale, and the related alignment algorithms are relatively mature. In contrast, chunk-level alignment algorithms remain to be studied, and the chunk-aligned corpora that such algorithms require are scarce. The construction of bilingual corpora and their automatic alignment are of great significance to the development of computational linguistics. However, the existing bilingual corpora at home and abroad, especially Chinese-English bilingual corpora, are not large, their processing standards are not unified, and no general-purpose bilingual corpus is publicly available. This work lays a solid foundation for the large-scale construction of a bilingual language information and knowledge base with unified standards and norms, covering multiple fields and genres, and aligned at the sentence level.

Keywords: Non-restricted areas; Bilingual corpus; Sentence alignment

Similar Papers

Construction of Parallel Corpus of Foreign Publicity Based on Computer-Aided Translation Software
Meng Sun
10 Dec 2021

Research on Alignment in the Construction of Parallel Corpus
Zhaorong Zong ... Changchun Hong
Journal of Physics: Conference Series | VOL. 1213
01 Jun 2019

Korean-Chinese Bilingual Sentence Alignment Method Based on Character Length
Qi Wang ... Yahui Zhao
18 Dec 2020

An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus
Long H B Nguyen ... Dien Dinh
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 16
14 Oct 2016
