This study takes a data-driven approach to exploring the distribution of personality traits among Chinese people in the Chinese social context. Following the lexical hypothesis of personality, word embedding techniques from machine learning were used to mine personality vocabulary from Tencent's word embedding database. More than 10,000 Chinese personality descriptors were extracted and analyzed using Gaussian mixture model clustering and hierarchical clustering. Data were collected through an online questionnaire from 658 Chinese participants randomly selected from all parts of China. The results reveal six personality traits in the Chinese context, expand the personality thesaurus, and provide examples to illustrate each trait. The findings are consistent with previous research showing that the five-factor model partially describes the personality traits of Chinese people but does not fully explain their typical social behavior patterns. The study also supports the notion of cultural specificity in personality traits. Compared with traditional personality mining methods, the approach yields a richer personality vocabulary, and word embeddings capture richer semantic information in Chinese. The six Chinese personality traits identified here will also be used to explore how personality traits can be quantified and evaluated based on word embeddings and personality descriptors.