Research on Alignment in the Construction of Parallel Corpus

Zhaorong Zong, Changchun Hong

doi:10.1088/1742-6596/1213/4/042003

Journal: Journal of Physics: Conference Series
Publication Date: Jun 1, 2019
Citations: 2
License type: CC BY

Affiliation: Huangshan University

Abstract

Parallel corpora are of great value in machine translation and cross-language information retrieval. Benefiting from advances in machine learning and deep learning, corpus construction techniques have evolved from word alignment through phrase alignment to chunk alignment. High-quality automatic bilingual chunk alignment plays an important role in improving the performance of machine translation systems, especially computer-aided translation systems. In this study, the degrees of adhesion and relaxation are used to measure how tightly or loosely words are connected when a chunk is identified, and both can be expressed with a mathematical model. The task of chunk alignment in parallel corpus construction can be described in three steps: inputting bilingual sentences, segmenting them into chunks, and aligning the chunks semantically. At present, most algorithms are based on statistical methods, and their alignment output is machine-oriented.
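The abstract does not give the authors' adhesion/relaxation formula or their alignment algorithm. As a rough illustration only, the sketch below walks through the three steps under stated assumptions: it uses pointwise mutual information (PMI) between adjacent words as a stand-in for the degree of adhesion, treats a word pair whose PMI falls below a chosen threshold as "relaxed" (a chunk boundary), and substitutes a greedy count of matches in a hypothetical bilingual lexicon for semantic alignment. The function names, the threshold value and the toy English–Chinese corpora are all illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def adhesion_scores(corpus):
    """Score each adjacent word pair by pointwise mutual information (PMI).

    Assumption: PMI is a stand-in for the paper's "degree of adhesion";
    the paper's own formula is not reproduced here.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def segment_chunks(tokens, scores, threshold):
    """Step 2: split wherever adhesion drops below `threshold` (a "relaxed" link)."""
    if not tokens:
        return []
    chunks, current = [], [tokens[0]]
    for prev, word in zip(tokens, tokens[1:]):
        # Pairs never seen in the corpus get -inf adhesion, so they always split.
        if scores.get((prev, word), float("-inf")) >= threshold:
            current.append(word)
        else:
            chunks.append(current)
            current = [word]
    chunks.append(current)
    return chunks


def lexical_overlap(src_chunk, tgt_chunk, lexicon):
    """Count (source word, target word) pairs listed in the hypothetical lexicon."""
    return sum(1 for w in src_chunk for v in tgt_chunk if v in lexicon.get(w, ()))


def align_chunks(src_chunks, tgt_chunks, lexicon):
    """Step 3 (stand-in): pair each source chunk with its best-overlapping target chunk."""
    if not tgt_chunks:
        return []
    pairs = []
    for s in src_chunks:
        best = max(tgt_chunks, key=lambda t: lexical_overlap(s, t, lexicon))
        if lexical_overlap(s, best, lexicon) > 0:
            pairs.append((s, best))
    return pairs


def chunk_align(src_tokens, tgt_tokens, src_scores, tgt_scores, lexicon, threshold):
    """Step 1: take a sentence pair; step 2: chunk both sides; step 3: align."""
    src_chunks = segment_chunks(src_tokens, src_scores, threshold)
    tgt_chunks = segment_chunks(tgt_tokens, tgt_scores, threshold)
    return align_chunks(src_chunks, tgt_chunks, lexicon)


# Toy monolingual corpora (illustrative data, not from the paper).
en_corpus = [
    ["parallel", "corpus", "helps", "translation"],
    ["parallel", "corpus", "needs", "alignment"],
    ["chunk", "alignment", "helps", "translation"],
]
zh_corpus = [
    ["平行", "语料库", "有助于", "翻译"],
    ["平行", "语料库", "需要", "对齐"],
    ["语块", "对齐", "有助于", "翻译"],
]
# Hypothetical word-level bilingual lexicon.
lexicon = {"parallel": {"平行"}, "corpus": {"语料库"},
           "helps": {"有助于"}, "translation": {"翻译"}}

pairs = chunk_align(en_corpus[0], zh_corpus[0],
                    adhesion_scores(en_corpus), adhesion_scores(zh_corpus),
                    lexicon, threshold=2.5)
print(pairs)
```

With these toy corpora, "parallel corpus" and "helps translation" have a PMI of 3 bits while "corpus helps" has 2, so a threshold of 2.5 yields two chunks per side, and the lexicon then pairs "parallel corpus" with "平行 语料库" and "helps translation" with "有助于 翻译". Greedy matching lets two source chunks claim the same target chunk; a fuller aligner would enforce one-to-one constraints or use statistical or semantic similarity scores instead of a fixed lexicon.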
