The Corpus of Emotional Valences for 33,669 Chinese Words Based on Big Data

Chia-Yueh Chang, Shu-Ling Cho, Yu-Lin Chang, Meng-Ning Tsai, Shu-Yen Lin, Tao-Hsing Chang, Hsueh-Chih Chen, Yen-Cheng Chen, Yao-Ting Sung

doi:10.1007/978-3-031-05544-7_11

Abstract

Emotion theories are mainly classified as categorical or dimensional approaches. Given the importance of emotional words in emotion research, researchers have constructed a co-occurrence corpus of 7 types of emotion words using word co-occurrence and big data corpora. However, in addition to the categorical approach, the dimensional approach plays an important role in natural language processing. In particular, valence has an important influence on the study of emotion and language. In this study, the co-occurrence corpus of 7 types of emotion words constructed by Chen et al. [1] was expanded to create a corpus of emotional valences. Stepwise multiple regression analysis was performed with emotional valence as the criterion variable and 15 predictor variables. The criterion variable was the emotional valence of 553 frequently occurring stimulus words included in the Chinese Word Association Norms [2]. The predictor variables were the emotion word co-occurrence scores for each combination of 2 clusters (a cluster of literal emotion words and a cluster of metaphorical emotion words) and 7 types of emotions (happiness, love, surprise, sadness, anger, disgust, and fear), plus the virtue word co-occurrence score. The emotion words were those common to both the co-occurrence corpus of 7 types of emotion words constructed by Chen et al. [1] and the Chinese Word Association Norms established by Hu et al. [2]. The results showed that the scores for literal happiness word co-occurrences, metaphorical happiness word co-occurrences, literal disgust word co-occurrences, literal fear word co-occurrences, and virtue word co-occurrences could predict the valence values of emotion words, with the multiple correlation coefficient of the regression reaching .729. Subsequently, the valence values of 33,669 words were established using the formula obtained from the multiple regression analysis of the 553 words.
Next, to test the cross-validity of the established valences, the correlation between the actual and predicted valence values was analyzed using the words shared with the norm of emotionality ratings and free associations for 267 common Chinese words established by Lee and Lee [3]. The correlation between the two was .755, indicating that the predicted values generated from the big data corpora and word co-occurrence were similar to the manually determined values. Based on theories and tests, this study used the co-occurrence data of 7 emotions and virtue to construct the corpus of emotional valences for 33,669 Chinese words. The results showed that the combined use of big data corpora and word co-occurrence can effectively expand existing corpora that were established based on emotional categories, improve the efficiency of manual construction of corpora, and establish a larger corpus of emotional words.

Keywords

Emotion, Valence, Word co-occurrence, Big data, Chinese
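
The procedure summarized in the abstract (fit a regression on rated words, apply the resulting formula to unrated words, then correlate predictions with an independent set of ratings) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses plain ordinary least squares on the five predictors the abstract reports as retained, so the stepwise selection step is assumed to have already happened. All data are synthetic, and the coefficient values, predictor labels, noise level, and the sizes of the unrated and check sets are hypothetical choices made only so the example runs.

```python
import numpy as np

# Labels for the five predictors the abstract reports as retained.
# The label strings are illustrative, not identifiers from the paper.
PREDICTORS = [
    "literal_happiness",
    "metaphorical_happiness",
    "literal_disgust",
    "literal_fear",
    "virtue",
]


def add_intercept(X):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_valence_model(X, y):
    """Ordinary least squares; returns coefficients with the intercept first."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef


def predict_valence(coef, X):
    """Apply the fitted regression formula to co-occurrence scores."""
    return add_intercept(X) @ coef


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(0)

# Hypothetical "true" coefficients used only to generate synthetic ratings.
assumed_coef = np.array([5.0, 1.2, 0.6, -0.9, -0.7, 0.8])

# Synthetic stand-in for the 553 rated stimulus words.
X_rated = rng.random((553, len(PREDICTORS)))
y_rated = predict_valence(assumed_coef, X_rated) + rng.normal(0.0, 0.5, size=553)

# Step 1: fit the regression on the rated words.
coef = fit_valence_model(X_rated, y_rated)
# For OLS with an intercept, the multiple correlation R equals the
# correlation between observed and fitted values.
multiple_r = pearson_r(y_rated, predict_valence(coef, X_rated))

# Step 2: apply the formula to words without ratings (size is arbitrary here).
X_unrated = rng.random((1000, len(PREDICTORS)))
predicted = predict_valence(coef, X_unrated)

# Step 3: cross-validate against an independent set of rated words.
X_check = rng.random((100, len(PREDICTORS)))
y_check = predict_valence(assumed_coef, X_check) + rng.normal(0.0, 0.5, size=100)
cross_r = pearson_r(y_check, predict_valence(coef, X_check))

print("multiple R on rated words:", round(multiple_r, 3))
print("cross-validation r:", round(cross_r, 3))
```

The printed correlations describe only the synthetic data and say nothing about the values reported in the abstract. With real data, `X_rated` would hold the co-occurrence scores of the rated words and `y_rated` their human valence ratings.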

Full Text