Research on Weibo New Word Recognition based on Weibo Data and Statistical Information

Yuanfang Xu

doi:10.54097/fcis.v5i2.13147

Abstract

Recognizing new words in Weibo text is one of the key challenges in Chinese information processing, with a direct impact on downstream tasks such as machine translation and text classification. As Weibo has become the most used social platform for internet users, mining new vocabulary from Weibo data not only supports a deeper understanding of the data itself but can also inform personalized recommendation services for users. Although new word recognition in general has been studied extensively, research devoted specifically to Weibo new words is still scarce. In this article, we propose a Weibo new word recognition strategy that combines Weibo content features with statistical information. First, candidate words that recur in Weibo topic names are extracted; then absolute frequency, relative frequency, mutual information, and information entropy are used to filter out incorrect candidates. The experimental results show that, with appropriately set thresholds, incorrect candidates can be filtered out effectively, thereby improving recognition performance.
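The statistical filtering step described above can be illustrated with a minimal sketch. It is not the implementation used in this article: the function names, the character n-gram counting, the use of the lowest pointwise mutual information over all two-part splits, the left/right neighbor entropy, and every threshold value are assumptions chosen for illustration. Relative frequency is omitted because it depends on a reference corpus that the abstract does not specify.

```python
import math
from collections import Counter


def char_ngram_counts(texts, max_len):
    """Count every character n-gram of length 1..max_len in the texts."""
    counts = Counter()
    for t in texts:
        for n in range(1, max_len + 1):
            for i in range(len(t) - n + 1):
                counts[t[i:i + n]] += 1
    return counts


def min_pmi(word, counts, total_chars):
    """Lowest pointwise mutual information over all two-part splits of word.

    Probabilities are approximated as count / total_chars (an assumption).
    """
    p_word = counts[word] / total_chars
    scores = []
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total_chars
        p_right = counts[word[k:]] / total_chars
        scores.append(math.log2(p_word / (p_left * p_right)))
    return min(scores)


def branch_entropy(word, texts, side):
    """Entropy of the characters adjacent to word on the given side."""
    neighbors = Counter()
    for t in texts:
        start = t.find(word)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(word)
            if 0 <= idx < len(t):
                neighbors[t[idx]] += 1
            start = t.find(word, start + 1)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbors.values())


def filter_candidates(candidates, texts, min_freq=2, min_mi=1.0, min_entropy=0.5):
    """Keep candidates that pass all three filters.

    The thresholds are illustrative placeholders; min_freq must be at least 1.
    """
    max_len = max((len(c) for c in candidates), default=1)
    counts = char_ngram_counts(texts, max_len)
    total_chars = sum(len(t) for t in texts)
    kept = []
    for c in candidates:
        # Absolute frequency filter (single characters are skipped).
        if len(c) < 2 or counts[c] < min_freq:
            continue
        # Internal cohesion filter: mutual information.
        if min_pmi(c, counts, total_chars) < min_mi:
            continue
        # Boundary freedom filter: the weaker of left and right entropy.
        left = branch_entropy(c, texts, "left")
        right = branch_entropy(c, texts, "right")
        if min(left, right) < min_entropy:
            continue
        kept.append(c)
    return kept
```

In this sketch a candidate survives only if it occurs often enough, its parts co-occur more than chance would predict, and it appears in varied contexts on both sides. In practice, the thresholds would need to be tuned on real Weibo data.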

Full Text