With the rapid development of Internet technology, the network has become an indispensable part of undergraduates' lives, and guiding public opinion correctly has become an important part of ideological work in universities. Undergraduates are at a formative stage in the development of their thinking and are therefore easily swayed by online rumors. It is thus particularly important to collect political public opinion data in universities and to locate hot topics, so that tendencies in political public opinion can be detected early and major security incidents can be averted. With this in mind, this paper uses a crawler to obtain multi-source political public opinion data from the BBS, Tieba and Weibo of Sun Yat-sen University (SYSU). We study a text feature extraction method based on Word2Vec & LDA (Latent Dirichlet Allocation), which mitigates the high-dimensional sparsity of the traditional Vector Space Model (VSM) text representation. Building on the classical Single-pass clustering algorithm, this paper also studies a Single-pass & HAC (hierarchical agglomerative clustering) algorithm. In addition, a hot-topic measure is defined to calculate the heat value of political public opinion, and a dictionary- and rule-based method is used to improve the accuracy of sentiment tendency analysis. The experimental results demonstrate that topic detection and positioning based on Word2Vec & LDA and the Single-pass & HAC algorithm performs better than the other methods compared.
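As an illustration of the classical Single-pass clustering step mentioned above, the following is a minimal sketch only. It is not the paper's Single-pass & HAC algorithm, whose details the abstract does not give. The function names `cosine` and `single_pass`, the threshold of 0.8, and the toy two-dimensional vectors are assumptions chosen for illustration; in practice the inputs would be document feature vectors such as those produced by the Word2Vec & LDA step.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass(vectors, threshold=0.8):
    """Classical single-pass clustering (hypothetical helper).

    Each vector is visited once, in arrival order. It joins the most
    similar existing cluster if the similarity to that cluster's centroid
    reaches `threshold`; otherwise it opens a new cluster.
    """
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    labels = []     # cluster index assigned to each input vector
    for vec in vectors:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Update the running mean of the chosen cluster.
            n = sizes[best_idx]
            centroids[best_idx] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best_idx], vec)
            ]
            sizes[best_idx] = n + 1
            labels.append(best_idx)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Toy input (assumed): two posts near one direction, two near another.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels, toy_centroids = single_pass(toy_vectors, threshold=0.8)
print(toy_labels)  # -> [0, 0, 1, 1]
```

Because the result of a single pass depends on the order in which documents arrive and on the chosen threshold, combining it with a second, hierarchical merging stage is one plausible motivation for a hybrid approach; the abstract itself does not state the authors' reasoning.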