Abstract

The risk classification of BBS posts is important for evaluating the societal risk level within a given period. Using posts collected from the Tianya forum as the data source, the authors adopt societal risk indicators from social psychology and conduct document-level, multi-class societal risk classification of BBS posts. To capture the semantics and word order of documents effectively, a shallow neural network model, Paragraph Vector, is applied to learn distributed vector representations of the posts. Based on these document vectors, the authors apply the k-nearest neighbors (KNN) classifier to identify the societal risk category of each post. The experimental results show that, for document-level societal risk classification, Paragraph Vector trains much faster than Bag-of-Words and improves F-measures by at least 10%. Furthermore, Paragraph Vector also outperforms the edit distance and Lucene-based search methods. The present work is the first attempt to combine a document embedding method with social psychology research results in the area of public opinion.
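The classification step described above can be illustrated with a minimal sketch of KNN over document vectors. The abstract does not state the distance measure or the value of k, so cosine similarity and k = 3 are assumptions here. The three-dimensional vectors and the labels `risk_a`, `risk_b`, and `risk_c` are hypothetical placeholders, not the authors' embeddings or risk categories. In practice the vectors would come from a trained Paragraph Vector model.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_vec, labeled_vecs, k=3):
    """Return the majority label among the k labeled vectors most similar to query_vec."""
    ranked = sorted(
        labeled_vecs,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy document vectors with placeholder risk labels (assumed data).
train = [
    ([0.9, 0.1, 0.0], "risk_a"),
    ([0.8, 0.2, 0.1], "risk_a"),
    ([0.1, 0.9, 0.2], "risk_b"),
    ([0.0, 0.8, 0.3], "risk_b"),
    ([0.2, 0.1, 0.9], "risk_c"),
]

# Classify an unseen post vector by its three nearest labeled neighbors.
predicted = knn_classify([0.85, 0.15, 0.05], train, k=3)
print(predicted)
```

Because the prediction depends only on vector similarity, the quality of the document representation largely determines classification quality, which is consistent with the comparison against Bag-of-Words reported above.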
