Abstract
Traditional topic model with maximum likelihood estimate inevitably suffers from the conditional independence of words given the document’s topic distribution. In this paper, we follow the generative procedure of topic model and learn the topic-word distribution and topics distribution via directly approximating the word-document co-occurrence matrix with matrix decomposition technique. These methods include: (1) Approximating the normalized document-word conditional distribution with the documents probability matrix and words probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) Since the standard NMF is well known to be non-robust to noises and outliers, we extended the probabilistic NMF of the topic model to its robust versions using l21-norm and capped l21-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are exploited to solve the corresponding non-smooth and non-convex problems. Experimental results over several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Proceedings of the AAAI Conference on Artificial Intelligence
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.