Abstract
When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic Cold War instance. Neither travel, visits, nor correspondence was allowed between the people until 1987, when the governments on both sides started to allow a small number of people from Taiwan with relatives in China to return to visit through a third location. Although the thaw eventually led to frequent exchanges, direct travel links, and close commercial ties between Taiwan and mainland China today, 38 years of total isolation from each other did allow language use to develop into different varieties, which have become a popular topic for mainly lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb variations in the Mainland and Taiwan variants and finds that the light verbs of these two variants indeed show differing distributional tendencies. Light verbs are chosen for two reasons: first, they are semantically bleached and hence more susceptible to change and variation; second, the classification of light verbs is a challenging topic in NLP. We hope our study will contribute to the study of light verbs in Chinese in general. The data adopted for this study was a comparable corpus extracted from the Chinese Gigaword Corpus and manually annotated with contextual features that may contribute to light verb variations. A multivariate analysis was conducted to show that for each light verb there is at least one context where the two variants show differences in tendencies (usually the presence/absence of a tendency rather than contrasting tendencies) and can be differentiated. In addition, we carried out a K-Means clustering analysis of the variations, and the results are consistent with the multivariate analysis, i.e., the light verbs in Mainland and Taiwan indeed have variations and the variations can be successfully differentiated.
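As a rough illustration of the clustering step described above, the following minimal sketch runs K-Means (Lloyd's algorithm) over binary contextual feature vectors and measures how well the clusters line up with the two varieties. The feature names, the toy rows, and the purity measure are all assumptions made for illustration; they are not the paper's annotation scheme, data, or evaluation method.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialisation: start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iters=50):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def purity(labels, varieties):
    """Share of tokens that belong to the majority variety of their cluster."""
    total = 0
    for c in set(labels):
        counts = Counter(v for l, v in zip(labels, varieties) if l == c)
        total += counts.most_common(1)[0][1]
    return total / len(labels)

# Invented toy rows, NOT corpus data. Hypothetical binary features per token:
# [takes_event_noun_object, takes_vo_compound_object, has_aspect_marker]
tokens = [
    ([1, 0, 0], "Mainland"), ([1, 0, 0], "Mainland"),
    ([1, 0, 1], "Mainland"), ([1, 0, 0], "Mainland"),
    ([0, 1, 1], "Taiwan"),   ([0, 1, 0], "Taiwan"),
    ([0, 1, 1], "Taiwan"),   ([1, 1, 1], "Taiwan"),
]
features = [f for f, _ in tokens]
varieties = [v for _, v in tokens]

cluster_labels, _ = kmeans(features, k=2)
score = purity(cluster_labels, varieties)
print(cluster_labels, score)
```

On real annotated data the clusters would rarely separate this cleanly; a purity well above the majority-class baseline would be the kind of evidence that the two varieties can be differentiated from contextual features alone.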
Highlights
Introduction: The dichotomy of language and dialect is not maintained in the context of Chinese language(s).
Language Variations in the Chinese Context: The common dichotomy of language and dialect is not maintained in the context of Chinese language(s).
The data for this study was extracted from the Annotated Chinese Gigaword Corpus (Huang, 2009), maintained by the LDC, which contains over 1.1 billion Chinese words, consisting of 700 million characters from Taiwan Central News Agency (CNA) and 400 million characters from Mainland Xinhua News Agency (XNA).
Summary
The dichotomy of language and dialect is not maintained in the context of Chinese language(s). Min, Hakka, and Wu are traditionally referred to as dialects of Chinese but are mutually unintelligible. They do share a common writing system and a literary and textual tradition, which allows speakers to have a shared linguistic identity. To overcome the problem of mutual unintelligibility, a variant of Northern Mandarin Chinese was designated as the common language about a hundred years ago (called 普通話 Putonghua ‘common language’ in Mainland China, and 國語 Guoyu ‘national language’ in Taiwan). Referred to nowadays as Mandarin, Mandarin Chinese, or simply Chinese, it is now one of the most commonly learned first or second languages in the world.
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, pages 1–10, Dublin, Ireland, August 23 2014