Abstract

Text representation based on a latent topic model can be viewed as a non-Gaussian problem in which the observed words and latent topics are multinomial variables and the topic proportions are Dirichlet variables. A traditional topic model introduces a single Dirichlet prior to characterize the topic proportions, and the words in a text document are represented by a random mixture of semantic topics. In the real world, however, a single Dirichlet distribution may not faithfully reflect the variations in topic proportions estimated from heterogeneous documents. To address these variations, we propose a new latent variable model in which the latent topics and their proportions are learned under a prior based on a Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) supports topic clustering as well as document clustering. Multiple Dirichlets provide a way to build structural latent variables when learning representations over a variety of topics. This study carries out inference for LDMM using variational Bayes and collapsed variational Bayes. The unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization, and classification show the merit of the structural prior in LDMM topic models.
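To make the role of a Dirichlet mixture prior concrete, the following is a minimal sketch of one plausible generative process, written with NumPy. It is not taken from the paper: the abstract does not spell out the generative steps, so the choice of one mixture component per document, the function name `sample_document`, and all parameter values below are assumptions made for illustration only.

```python
import numpy as np


def sample_document(rng, mix_weights, alphas, topic_word, n_words):
    """Draw one document from a Dirichlet-mixture topic prior (illustrative only)."""
    # Assumption: each document picks one Dirichlet component,
    # which acts as a document-level cluster.
    c = rng.choice(len(mix_weights), p=mix_weights)
    # Topic proportions come from the selected Dirichlet component.
    theta = rng.dirichlet(alphas[c])
    # Each word position gets a topic, then a word drawn from that topic.
    z = rng.choice(len(theta), size=n_words, p=theta)
    vocab_size = topic_word.shape[1]
    words = np.array([rng.choice(vocab_size, p=topic_word[k]) for k in z])
    return c, theta, z, words


rng = np.random.default_rng(0)
# Hypothetical settings: 2 Dirichlet components, 3 topics, 8 vocabulary words.
mix_weights = np.array([0.5, 0.5])
alphas = np.array([[5.0, 0.5, 0.5],
                   [0.5, 0.5, 5.0]])
topic_word = rng.dirichlet(np.ones(8), size=3)  # one word distribution per topic

c, theta, z, words = sample_document(rng, mix_weights, alphas, topic_word, n_words=20)
```

With a single row in `alphas` and a single mixture weight of 1.0, the sketch reduces to the single-Dirichlet setting described for the traditional topic model; using several rows lets different groups of documents favour different topic proportions.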
