Abstract

Topic modeling and document clustering can be synergistically interrelated. We present a novel unsupervised approach that interrelates the two tasks, exploiting Bayesian generative modeling and posterior inference to unify them and carry them out jointly. Specifically, a Bayesian nonparametric model of text collections couples word-embedding topics with a Dirichlet process mixture of cluster components. The mixture allows a countably infinite number of clusters, so that their actual number can be inferred automatically in a statistically principled manner. All latent clusters and topics under this model are inferred through collapsed Gibbs sampling and parameter estimation. An extensive empirical study on benchmark real-world text corpora shows that the approach partitions text collections and discovers coherent semantics more effectively than state-of-the-art competitors and tailored baselines. Computational efficiency is also examined under different conditions, in order to provide an insightful analysis of scalability.
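To make the nonparametric clustering ingredient concrete, the following is a minimal sketch of collapsed Gibbs sampling for a Dirichlet process mixture, using a Chinese-restaurant-process seating scheme over bag-of-words documents with a symmetric Dirichlet prior over word distributions. This illustrates only the generic DP-mixture mechanics (automatic inference of the number of clusters); it is not the authors' full model, which additionally couples the mixture with word-embedding topics. All names and hyperparameter values here are illustrative assumptions.

```python
import math
import random

def log_pred(doc, counts, total, V, beta):
    """Log posterior-predictive (Dirichlet-multinomial) likelihood of a
    bag-of-words document under a cluster with accumulated word counts.
    The multinomial coefficient is omitted (constant across clusters)."""
    n = sum(doc.values())
    ll = math.lgamma(total + V * beta) - math.lgamma(total + n + V * beta)
    for w, c in doc.items():
        m = counts.get(w, 0)
        ll += math.lgamma(m + c + beta) - math.lgamma(m + beta)
    return ll

def gibbs_dpmm(docs, V, alpha=1.0, beta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of multinomials.
    `docs` is a list of {word: count} dicts; returns one cluster
    label per document, with the number of clusters inferred."""
    rng = random.Random(seed)
    z = [0] * len(docs)          # initial assignment: one shared cluster
    stats = {}                   # cluster id -> {"w": counts, "tot": tokens, "n": docs}

    def add(k, doc):
        s = stats.setdefault(k, {"w": {}, "tot": 0, "n": 0})
        for w, c in doc.items():
            s["w"][w] = s["w"].get(w, 0) + c
        s["tot"] += sum(doc.values())
        s["n"] += 1

    def remove(k, doc):
        s = stats[k]
        for w, c in doc.items():
            s["w"][w] -= c
        s["tot"] -= sum(doc.values())
        s["n"] -= 1
        if s["n"] == 0:          # empty tables vanish under the CRP
            del stats[k]

    for doc in docs:
        add(0, doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            remove(z[d], doc)
            ids = list(stats.keys())
            # Existing cluster k: weight proportional to n_k * predictive.
            logw = [math.log(stats[k]["n"]) +
                    log_pred(doc, stats[k]["w"], stats[k]["tot"], V, beta)
                    for k in ids]
            # New cluster: weight proportional to alpha * prior predictive.
            new_id = max(ids, default=-1) + 1
            ids.append(new_id)
            logw.append(math.log(alpha) + log_pred(doc, {}, 0, V, beta))
            m = max(logw)        # softmax sampling in log space
            ws = [math.exp(l - m) for l in logw]
            r = rng.random() * sum(ws)
            acc = 0.0
            for k, w in zip(ids, ws):
                acc += w
                if r <= acc:
                    z[d] = k
                    break
            add(z[d], doc)
    return z
```

Because new clusters are proposed with probability proportional to the concentration parameter `alpha`, the sampler can grow or shrink the set of clusters as the data demand, which is the statistically principled model-selection behavior the abstract refers to.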
