Abstract
A clustering method that computes text similarity from both Latent Dirichlet Allocation (LDA) and the vector space model (VSM) is presented. Text similarity is calculated separately with the LDA topic model and with the VSM weighting strategy, and a linear combination of the two results gives the final text similarity. The k-means clustering algorithm is then used for cluster analysis. The method addresses both the loss of deep semantic information in traditional text clustering and the inability of LDA alone to distinguish texts when dimensionality is reduced too far. Deep semantic information is thus mined from the text, and clustering efficiency is improved. Comparison with traditional methods shows that this algorithm can improve the performance of text clustering.
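The linear combination of the two similarity scores can be sketched in the following form. The symbol \lambda, and the constraint that the two weights sum to one, are assumptions made here for illustration; the abstract does not state the weights or how they are chosen.

```latex
\mathrm{sim}(d_i, d_j) = \lambda \, \mathrm{sim}_{\mathrm{LDA}}(d_i, d_j) + (1 - \lambda) \, \mathrm{sim}_{\mathrm{VSM}}(d_i, d_j), \qquad 0 \le \lambda \le 1
```

Under this reading, \lambda = 1 uses only the topic-level similarity and \lambda = 0 reduces to the TF-IDF-weighted VSM similarity.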
Highlights
In recent years, with the development of information technology, the Internet has become widely available in universities, and the number of students who use it has increased considerably [1].
Student management departments should strengthen their collection, study, and assessment of online public opinion, and attach importance to the control and guidance of public sentiment on the Internet.
To meet the needs of online public opinion analysis at colleges and universities, an online public opinion detection and clustering method has been built on LDA (Latent Dirichlet Allocation). The algorithm combines the LDA topic model with the TF-IDF-weighted VSM to compute text similarity, and cluster analysis is then carried out.
Summary
With the development of information technology, the Internet has become widely available in universities, and the number of students who use it has increased considerably [1]. Student management departments should strengthen their collection, study, and assessment of online public opinion, and attach importance to the control and guidance of public sentiment on the Internet. In Internet public opinion analysis, student affairs administrators need intelligent methods to find the relevant information in massive information sources for in-depth analysis. To meet the needs of online public opinion analysis at colleges and universities, an online public opinion detection and clustering method has been built on LDA (Latent Dirichlet Allocation). The algorithm combines the LDA topic model with the TF-IDF-weighted VSM to compute text similarity, and cluster analysis is then carried out. Comparison with traditional methods shows that this algorithm can improve the performance of text clustering.
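The similarity computation summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weight `lam`, the use of cosine similarity for both parts, and the toy documents are assumptions for illustration. The per-document topic distributions are hypothetical values standing in for the output of an already trained LDA model.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combined_similarity(tfidf_a, tfidf_b, topics_a, topics_b, lam=0.5):
    """Linear combination of topic-level and TF-IDF-level similarity.

    lam is an assumed weight in [0, 1]; the source does not give its value.
    """
    return (lam * cosine_dense(topics_a, topics_b)
            + (1.0 - lam) * cosine_sparse(tfidf_a, tfidf_b))


# Toy, pre-tokenized posts (illustrative only).
docs = [
    ["campus", "network", "exam", "schedule"],
    ["campus", "network", "dormitory", "fee"],
    ["football", "match", "score"],
]
# Hypothetical document-topic distributions, as an LDA model might produce.
topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

vectors = tfidf_vectors(docs)
sim_01 = combined_similarity(vectors[0], vectors[1], topics[0], topics[1])
sim_02 = combined_similarity(vectors[0], vectors[2], topics[0], topics[2])
print(sim_01 > sim_02)
```

The first two posts share both vocabulary and topic mass, so their combined similarity is higher than that of the first and third. A clustering step such as k-means would then consume these scores; how the similarities are fed to k-means is not specified in the text above, so it is left out of the sketch.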
More From: International Journal of Recent Contributions from Engineering, Science & IT (iJES)