Abstract
AbstractWith the development of computer and Internet, applications based on bilingual (or multilingual) parallel corpora are increasing in the field of natural language processing. In addition to the application of machine translation, the construction of parallel corpus is also of great value for bilingual dictionary compilation, word meaning disambiguation and cross language information retrieval. At present, the bilingual corpus of word alignment and sentence alignment has a large scale, and the related alignment algorithms are also relatively mature. In contrast, the chunk level alignment algorithm remains to be studied, and the chunk level alignment corpus required by the alignment algorithm is quite lacking. The construction of bilingual corpus and its automatic alignment are of great significance to the development of computational linguistics. At present, the existing bilingual corpora at home and abroad, especially Chinese-English bilingual corpora, are not large, the processing standards are not unified, and there is no general bilingual corpus that can be used publicly. It has laid a solid foundation for the large-scale establishment of bilingual language information and knowledge base with unified standards and norms, multi fields, multi genres and sentence level alignment.KeywordsNon restricted areasBilingual corpusSentence alignment
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.