Abstract

This paper applies a quantitative model developed for measuring grammatical status to data from the Lancaster Corpus of Mandarin Chinese (LCMC). The model takes four quantitative factors into account (token frequency, collocate diversity, colligate diversity and deviation of proportions) and uses them as predictors in a binary logistic regression to compute, for each element, a score of grammatical status between ‘0’ (lexical/non-grammatical) and ‘1’ (highly grammatical). The results of the LCMC model are then compared with those of a similar study of the British National Corpus (BNC). The comparison suggests that token frequency is one of the most relevant parameters for quantifying degrees of grammatical status in both language models, together with the collocate diversity measure when a broad window span is used. By contrast, the colligational measures (left- or right-based) and the collocate diversity measures that use small spans (left- or right-based) contribute very differently in the two languages, owing to their typologically distinct structures.
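To make the scoring step concrete, the following is a minimal sketch of how a binary logistic regression can turn the four predictors into a score between 0 and 1. It is not the authors' implementation: the feature values, their scaling to [0, 1], the labels, and the helper names `fit_logistic` and `grammatical_score` are all invented for illustration, and the fitting routine is plain gradient descent rather than whatever estimation procedure the study used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function mapping any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def grammatical_score(x, w, b):
    """Score of grammatical status in (0, 1) for one element's feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy rows, NOT corpus data. Column order (assumed, pre-scaled to [0, 1]):
# [token frequency, collocate diversity, colligate diversity, deviation of proportions]
X = [
    [0.9, 0.8, 0.7, 0.1],
    [0.8, 0.9, 0.6, 0.2],
    [0.7, 0.7, 0.8, 0.1],
    [0.2, 0.3, 0.2, 0.8],
    [0.1, 0.2, 0.3, 0.9],
    [0.3, 0.1, 0.2, 0.7],
]
# Hypothetical training labels: 1 = grammatical element, 0 = lexical element.
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
scores = [grammatical_score(x, w, b) for x in X]
```

In this sketch the fitted weights play the role of the per-predictor contributions that the abstract compares across the two corpora; with real data the predictors would need to be extracted from the corpus and scaled before fitting.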
