Abstract

Topic models are widely used in building clusters of documents for more than a decade, yet problems occurring in choosing the optimal number of topics. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. The authors analyzed from previous works, most of the models used in determining the number of topics are non-parametric and the quality of topics determined by using perplexity and coherence measures and concluded that they are not applicable in solving this problem. In this paper, we used the parametric method, which is an extension of the traditional topic model with visual access tendency for visualization of the number of topics (clusters) to complement clustering and to choose the optimal number of topics based on results of cluster validity indices. Developed hybrid topic models are demonstrated with different Twitter datasets on various topics in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in performance measure of the quality of clusters with validity indices.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.