Linguistic Annotation of Translated Chinese Texts: Coordinating Theory, Algorithms and Data

Kirill I. Semenov, Yulia N. Kuznetsova, Alexandra S. Konovalova, Alena D. Tsvetkova, Elena A. Volf, Yulia O. Korotkova, Aleksandra O. Piskunova, Armine K. Titizian

doi:10.2478/jazcas-2021-0054

Open Access

https://doi.org/10.2478/jazcas-2021-0054

Journal: Journal of Linguistics/Jazykovedný časopis
Publication Date: Dec 1, 2021
License type: CC BY-NC-ND 3.0

Affiliation: Charles University, Lomonosov Moscow State University, National Research University Higher School of Economics

#Chinese Texts #Linguistic Annotation

Abstract

The article addresses the problems of linguistic annotation of the Chinese texts in Ruzhcorp, the Russian-Chinese Parallel Corpus of the RNC, and proposes ways to solve them. Particular attention is paid to the processing of Russian loanwords. First, we compare, on theoretical grounds, the widespread standards of Chinese text processing. Second, we describe our experiments in three areas (word segmentation, grapheme-to-phoneme conversion, and PoS tagging) on corpus data that contains many transliterations and loanwords. As a result, we propose a preprocessing pipeline for Chinese texts that will be implemented in Ruzhcorp.

Full Text