Abstract

Parallel corpora are of great value in machine translation and cross-language information retrieval. Benefiting from advances in machine learning and deep learning, corpus construction technology has evolved from word alignment through phrase alignment to chunk alignment. High-quality automatic alignment of bilingual chunks in a corpus plays an important role in improving the performance of machine translation systems, especially computer-aided translation systems. In this study, the degrees of adhesion and relaxation are used to measure how tightly or loosely words are connected when a chunk is identified, and both can be expressed by a mathematical model. The task of chunk alignment in constructing a parallel corpus can be described in three steps: inputting bilingual sentences, segmenting them into chunks, and aligning the chunks semantically. At present, most algorithms are based on statistical methods, and their alignment output is machine-oriented.
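
The three-step pipeline can be illustrated with a toy sketch. The abstract does not give the adhesion and relaxation formulas, so the code below assumes pointwise mutual information (PMI) between adjacent words as a stand-in for adhesion: a high score keeps two words in one chunk, and a low score (a "relaxed" connection) opens a chunk boundary. The alignment step uses a small hypothetical bilingual lexicon and greedy word overlap instead of real semantic alignment, and the target side is supplied already chunked for brevity. The functions `train_counts`, `adhesion`, `segment` and `align`, the corpus, the lexicon and the threshold are all hypothetical illustrations, not the study's method.

```python
import math
from collections import Counter

def train_counts(sentences):
    """Count unigrams and adjacent bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def adhesion(w1, w2, unigrams, bigrams):
    """Hypothetical adhesion score: PMI of the adjacent pair (w1, w2).
    Unseen pairs get -inf, i.e. maximal relaxation."""
    pair = bigrams[(w1, w2)]
    if pair == 0:
        return float("-inf")
    p_pair = pair / sum(bigrams.values())
    n_uni = sum(unigrams.values())
    p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
    return math.log2(p_pair / (p1 * p2))

def segment(tokens, unigrams, bigrams, threshold=0.0):
    """Step 2: start a new chunk wherever adhesion drops below the threshold."""
    if not tokens:
        return []
    chunks = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if adhesion(prev, cur, unigrams, bigrams) >= threshold:
            chunks[-1].append(cur)   # tight connection: stay in the same chunk
        else:
            chunks.append([cur])     # relaxed connection: open a new chunk
    return [" ".join(c) for c in chunks]

def align(src_chunks, tgt_chunks, lexicon):
    """Step 3 (stand-in): greedily pair each source chunk with the unused target
    chunk containing the most lexicon translations of its words."""
    pairs, used = [], set()
    for s in src_chunks:
        best, best_score = None, 0
        for j, t in enumerate(tgt_chunks):
            if j in used:
                continue
            t_words = set(t.split())
            score = sum(1 for w in s.split() if lexicon.get(w) in t_words)
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            used.add(best)
            pairs.append((s, tgt_chunks[best]))
    return pairs

# Hypothetical toy data (step 1: input bilingual sentences).
CORPUS = [
    ["machine", "translation", "is", "useful"],
    ["machine", "translation", "needs", "data"],
    ["parallel", "corpora", "are", "useful"],
    ["parallel", "corpora", "need", "alignment"],
]
LEXICON = {"parallel": "parallèles", "corpora": "corpus", "help": "aident",
           "machine": "automatique", "translation": "traduction"}

if __name__ == "__main__":
    uni, bi = train_counts(CORPUS)
    src = segment(["parallel", "corpora", "help", "machine", "translation"], uni, bi)
    tgt = ["les corpus parallèles", "aident", "la traduction automatique"]
    for s, t in align(src, tgt, LEXICON):
        print(f"{s} <-> {t}")
```

Run on the toy data, "parallel corpora" and "machine translation" stay whole because their words co-occur strongly, while the unseen word "help" becomes its own chunk. A real system would replace PMI with the study's adhesion and relaxation model and the lexicon lookup with a semantic similarity measure.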
