Measuring grammatical status in Chinese through quantitative corpus analysis

Linlin Sun, David Correia Saavedra

doi:10.3366/cor.2020.0202

Abstract

This paper applies a quantitative model developed for measuring grammatical status to data from the Lancaster Corpus of Mandarin Chinese (LCMC). The model takes four quantitative factors into account (token frequency, collocate diversity, colligate diversity and deviation of proportions) and uses them as predictors in a binary logistic regression, which computes for each element a score of grammatical status between ‘0’ (lexical/non-grammatical) and ‘1’ (highly grammatical). The results of the LCMC model are then compared to those of a similar study of the British National Corpus (BNC). The comparison suggests that token frequency is one of the most relevant parameters for quantifying degrees of grammatical status in both language models, together with the collocate diversity measure when a broad window span is used. By contrast, the colligational measures (left- or right-based) and the collocate diversity measures using small spans (left- or right-based) contribute very differently in the two languages, owing to their typologically distinct structures.
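The scoring idea in the abstract can be illustrated with a minimal sketch: four numeric predictors per element go into a binary logistic regression, and the fitted model returns a value between 0 and 1. Everything concrete below is an assumption made for illustration. The feature rows are invented (they are not LCMC or BNC data), the scaling of each measure is arbitrary, and the fitting routine is plain batch gradient descent rather than whatever software the authors used.

```python
import math

# Hypothetical, hand-made feature rows (NOT taken from the LCMC or the BNC).
# Columns: log token frequency, collocate diversity, colligate diversity,
# deviation of proportions. Label 1 = grammatical, 0 = lexical.
TRAIN = [
    ([5.2, 0.90, 0.80, 0.10], 1),
    ([4.8, 0.85, 0.75, 0.15], 1),
    ([4.5, 0.80, 0.70, 0.20], 1),
    ([2.1, 0.30, 0.35, 0.70], 0),
    ([1.8, 0.25, 0.30, 0.80], 0),
    ([2.5, 0.40, 0.45, 0.60], 0),
]


def sigmoid(z):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_part(x, weights, bias):
    """Weighted sum of the predictors plus the intercept."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_logistic(rows, lr=0.1, epochs=5000):
    """Fit a binary logistic regression by batch gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            # Gradient of the log-loss with respect to the linear part.
            err = sigmoid(linear_part(x, weights, bias)) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def grammatical_score(x, weights, bias):
    """Score between 0 (lexical) and 1 (highly grammatical) for one element."""
    return sigmoid(linear_part(x, weights, bias))


weights, bias = fit_logistic(TRAIN)

# Two unseen, equally hypothetical elements.
score_high = grammatical_score([5.0, 0.88, 0.78, 0.12], weights, bias)
score_low = grammatical_score([2.0, 0.28, 0.32, 0.75], weights, bias)
print(f"frequent, widely distributed element: {score_high:.3f}")
print(f"rare, unevenly distributed element:   {score_low:.3f}")
```

In a sketch like this, the signs and sizes of the fitted weights indicate how strongly each predictor pushes the score toward the grammatical end, which is the kind of comparison the abstract draws between the LCMC and BNC models.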

Journal: Corpora
Publication Date: Nov 1, 2020
Citations: 3
