Abstract

This research investigates historical semantic change from the perspective of quantitative and computational linguistics. With the rapid accumulation of texts in the digital era, a more temporally aware interpretation of language use and meaning construction is called for. Meanwhile, the digitization of historical texts opens up more research opportunities to trace the diachronic development of words and meanings. In particular, semantic change motivated by linguistic features and factors can be explored with a data-driven approach. Language is a means of communication through which ideas are conveyed, stored, and recorded, and in essence it changes and evolves constantly as speakers use it over time (Blank, 1999: 61). The dynamics of meaning construction are embodied in the emergence and loss of senses, as well as in their splits and shifts, which contribute to the different distributions and interactions of words and reflect the regularities and adaptability of the language, as well as the cognition and culture operating behind it (Blank, 1999: 63). Synchronic variation can be examined through a diachronic lens. A corpus-based, data-driven approach makes it possible to observe semantic change and to derive generalizations from it. Coupled with advances in vector space models and statistical analysis, this approach is used to explore changes in meaning. Polysemy is a driving force of semantic change. Concepts and meanings are structured in words and language use, and word formation in Chinese is addressed through the development from monosyllabic to disyllabic words, which allows us to explore the influence of homophony, the interaction between words, and the growth of disyllabic words and compounds. Given that historical textual data are in demand, computational semantics and statistical models help resolve the dilemmas.
Moreover, semantic change may occur not in observed frequency but in other distributional properties, making the encoded meanings distinctly different from those of previous time periods. As distributed models such as word embeddings are receiving much attention, historical semantic change is a research topic that should enter the discussion. In corpus linguistics, such research methods are based on the co-occurrence of words in context, and the co-occurrence distribution represents the similarities and differences in meaning interactions. The diachronic corpus consists of texts from the following sources: the Chinese Text Project (Sturgeon, 2019) and, for modern Chinese, the Academia Sinica Balanced Corpus of Modern Chinese (Chen et al., 1996). Through a quantitative inquiry into semantic change, we will measure the degree of semantic change, support known cases of change, and discover unknown ones, in consultation with lexical databases. First, the global measure proposed by Hamilton et al. (2016a) is adopted. Second-order embeddings composed of similarity scores to keywords are formed to compare the meaning representations of different eras. The lower the correlation between two temporally adjacent vectors, the higher the degree of semantic change. Second, based on the distribution and interaction of a word's senses, the semantic trajectory of the word will be traced. Finally, this study will proceed with periodization analysis using the Variability-based Neighbor Clustering (VNC) method (Gries and Hilpert, 2012). As a hierarchical clustering method, VNC is bottom-up (agglomerative), as opposed to divisive clustering. A comprehensive evaluation of the influence of the selected linguistic factors is carried out to explore how the development of meaning construction can be understood across different stages. In sum, this study examines semantic change in retrospect to trace semantic development in diachrony.
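The second-order comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy three-dimensional vectors, and the choice of Pearson correlation (the abstract does not say which correlation is used) are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def second_order_vector(embeddings, target, keywords):
    """Represent `target` by its similarity to each keyword within one period.

    `embeddings` maps a word to its 1-D vector for a single time period.
    """
    return np.array(
        [cosine_similarity(embeddings[target], embeddings[k]) for k in keywords]
    )


def change_score(emb_earlier, emb_later, target, keywords):
    """Return 1 - correlation of the two second-order vectors.

    Pearson correlation is an assumption of this sketch; a lower
    correlation yields a higher change score.
    """
    v1 = second_order_vector(emb_earlier, target, keywords)
    v2 = second_order_vector(emb_later, target, keywords)
    r = np.corrcoef(v1, v2)[0, 1]
    return 1.0 - float(r)


# Toy embeddings for two periods (hypothetical values, not corpus data).
keywords = ["k1", "k2", "k3"]
period_a = {
    "k1": np.array([1.0, 0.0, 0.0]),
    "k2": np.array([0.0, 1.0, 0.0]),
    "k3": np.array([0.0, 0.0, 1.0]),
    "target": np.array([3.0, 2.0, 1.0]),
}
# In period B the target has moved away from k1 and towards k3.
period_b = dict(period_a, target=np.array([1.0, 2.0, 3.0]))

stable = change_score(period_a, period_a, "target", keywords)   # near 0
shifted = change_score(period_a, period_b, "target", keywords)  # near 2
print(stable, shifted)
```

Because each second-order vector is built only from similarities computed within a single period, the two embedding spaces do not need to be aligned with each other before they are compared.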
The computational and statistical modeling of historical lexical semantic change will shed new light on how the language community describes and makes sense of a society that is itself constantly changing.
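The periodization step can be sketched in the same spirit. VNC differs from ordinary agglomerative clustering in that only temporally adjacent clusters may be merged. The sketch below is a simplified illustration: the function name `vnc`, the use of one number per period, and the standard deviation of the pooled values as the merge cost are assumptions, and the input values are made up.

```python
import numpy as np


def vnc(values):
    """Agglomerative clustering restricted to temporally adjacent clusters.

    `values` is a chronologically ordered sequence with one number per period.
    Returns the merge history as a list of (left, right, cost) tuples, where
    `left` and `right` are tuples of period indices.
    """
    data = np.asarray(values, dtype=float)
    clusters = [(i,) for i in range(len(data))]
    history = []
    while len(clusters) > 1:
        # Cost of merging each adjacent pair: standard deviation of the
        # pooled values (an assumption of this sketch).
        costs = [
            float(np.std(data[list(clusters[i] + clusters[i + 1])]))
            for i in range(len(clusters) - 1)
        ]
        best = int(np.argmin(costs))
        left, right = clusters[best], clusters[best + 1]
        history.append((left, right, costs[best]))
        # Replace the two neighbors with their union, preserving time order.
        clusters[best:best + 2] = [left + right]
    return history


# Hypothetical per-period scores: two early periods, then three later ones.
scores = [1.0, 1.2, 5.0, 5.1, 5.4]
merges = vnc(scores)
for left, right, cost in merges:
    print(left, right, round(cost, 3))
```

The last merge in the history joins the two most dissimilar neighboring groups, so its boundary is a natural candidate for a period break. In this toy input that boundary falls between the second and third periods.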
