Abstract

With the wide application of the Internet in almost all fields, it has become the most important channel for publishing information and provides many avenues for the spread of public opinion. Public opinion, as the response of Internet users to information such as social events and government policies, reflects the state of both society and the economy and is highly valuable for decision-making and public relations in enterprises. At present, methods for analyzing Internet public opinion are mainly based on discriminative approaches, such as Support Vector Machines (SVM) and neural networks. However, when these approaches analyze the sentiment of Internet public opinion, they fail to exploit information hidden in the text, such as its topics. Motivated by this observation, this paper proposes a public sentiment detection method based on the Probabilistic Latent Semantic Analysis (PLSA) model. PLSA inherits the advantages of Latent Semantic Analysis (LSA) in exploiting the semantic topics hidden in data. The detection procedure consists of three main steps: (1) Chinese word segmentation and word refinement, which represent each document as a bag of words; (2) modeling the probabilistic distribution of documents with PLSA; and (3) using each document's PLSA Z-vector as its feature vector and feeding it to an SVM for sentiment detection. We collected a set of text data from Weibo, blogs, BBS forums, and similar sources to evaluate the proposed approach. The experimental results show that the proposed method detects public sentiment with high accuracy, outperforming the state-of-the-art word-histogram-based approach. The results also suggest that semantic analysis of text using PLSA can significantly boost sentiment detection.
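Steps (2) and (3) can be illustrated with a minimal sketch of PLSA fitted by expectation-maximization (EM), written in plain Python so it has no dependencies. It assumes step (1) has already turned every (non-empty) document into a list of tokens. The function name `plsa`, its parameters, the random initialization, and the fixed iteration count are hypothetical illustrative choices, not details taken from the paper. Each row of the returned `p_z_d` is a document's topic distribution P(z|d), i.e. the Z-vector that would then be handed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`). The SVM step itself is not shown.

```python
import random
from collections import Counter


def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on tokenized, non-empty documents (hypothetical helper).

    Returns (p_z_d, p_w_z, vocab), where p_z_d[d][z] = P(z|d) is the Z-vector of
    document d and p_w_z[z][w] = P(w|z) is the word distribution of topic z.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]  # bag-of-words counts n(d, w)

    # Random, strictly positive initialization of P(z|d) and P(w|z).
    p_z_d = [_normalize([rng.random() + 0.01 for _ in range(n_topics)]) for _ in docs]
    p_w_z = [
        dict(zip(vocab, _normalize([rng.random() + 0.01 for _ in vocab])))
        for _ in range(n_topics)
    ]

    for _ in range(n_iter):
        # Expected counts accumulated during the E-step.
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, bag in enumerate(counts):
            for w, n in bag.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue  # guard against numerical underflow
                for z in range(n_topics):
                    r = n * joint[z] / total  # n(d,w) * P(z|d,w)
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = []
        for table in acc_w_z:
            s = sum(table.values())
            p_w_z.append({w: v / s for w, v in table.items()})
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z, vocab


# Toy tokenized "posts" standing in for segmented Weibo/blog/BBS text.
docs = [
    ["good", "great", "happy"],
    ["good", "happy", "excellent"],
    ["bad", "awful", "sad"],
    ["bad", "sad", "terrible"],
]
p_z_d, p_w_z, vocab = plsa(docs, n_topics=2, n_iter=30)
for doc, z_vec in zip(docs, p_z_d):
    print(doc, [round(p, 3) for p in z_vec])  # Z-vector used as SVM input features
```

In this sketch each post ends up with a two-dimensional Z-vector, so the classifier sees a compact topic representation instead of a raw word histogram. A new, unseen document would additionally need to be "folded in" by iterating the EM updates for its own P(z|d) while holding P(w|z) fixed, which the sketch omits.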
