Abstract
Stylistic analysis enables open-ended, exploratory observation of languages. To fill the gap in quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features from the evolving usage of core vocabulary and devise a Bayesian method for estimating the feature parameters. The lexical features are drawn from the Swadesh list, each entry of which took different word forms as the language evolved during the Middle Ages. We therefore count the variant word forms of each entry as the linguistic features. Under the Bayesian formulation, the estimated feature parameters are used to construct a high-dimensional random feature vector, from which we compute the pairwise dissimilarity matrix of all texts under different distance measures. Finally, we apply spectral embedding and clustering to visualize, categorize, and analyze the linguistic styles of Middle Chinese texts. The quantitative results agree with existing qualitative conclusions and further improve our understanding of the linguistic styles of Middle Chinese in both inter-category and intra-category terms. They also help reveal the distinctive styles induced by indirect language contact.
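The pipeline described in the abstract (variant counts, Bayesian parameter estimates, pairwise dissimilarities, spectral embedding, clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give these details: the symmetric Dirichlet prior with `alpha = 1`, the Euclidean distance, the Gaussian affinity with a median bandwidth, the sign-based two-way split, and the toy counts are all hypothetical and are not taken from the paper.

```python
import numpy as np

def posterior_mean(counts, alpha=1.0):
    """Posterior mean of variant probabilities under a symmetric Dirichlet prior.

    counts has shape (n_texts, n_entries, n_variants); alpha is an assumed prior.
    """
    counts = np.asarray(counts, dtype=float)
    n_variants = counts.shape[-1]
    totals = counts.sum(axis=-1, keepdims=True)
    return (counts + alpha) / (totals + alpha * n_variants)

def pairwise_euclidean(features):
    """Pairwise Euclidean distance matrix between the rows of `features`."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spectral_embedding(dist, n_components=1):
    """Embed texts using eigenvectors of the symmetric normalized graph Laplacian."""
    sigma = np.median(dist[dist > 0])          # assumed bandwidth heuristic
    affinity = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)            # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(dist)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)     # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]      # skip the trivial first eigenvector

# Hypothetical toy data: 4 texts, 3 Swadesh entries, 2 variant forms per entry.
# Texts 0 and 1 favor the first variant; texts 2 and 3 favor the second.
counts = np.array([
    [[9, 1], [8, 2], [10, 0]],
    [[8, 2], [9, 1], [9, 1]],
    [[1, 9], [2, 8], [0, 10]],
    [[2, 8], [1, 9], [1, 9]],
])

theta = posterior_mean(counts)                 # shape (4, 3, 2)
features = theta.reshape(len(counts), -1)      # one feature vector per text
dist = pairwise_euclidean(features)
embedding = spectral_embedding(dist, n_components=1)

# Two-way split by the sign of the first non-trivial eigenvector,
# a simple stand-in for a full clustering step.
labels = (embedding[:, 0] > 0).astype(int)
print(labels)
```

The prior keeps the estimates away from 0 and 1 when an entry is rare in a text, which is one reason a Bayesian estimate can be preferable to raw relative frequencies for short texts. With more than two style groups, the sign split would be replaced by a clustering algorithm run on several embedding dimensions.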
More From: ACM Transactions on Asian and Low-Resource Language Information Processing