Abstract

Recently many topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) have made important progress towards generating high-level knowledge from a large corpus. However, these algorithms based on random initialization generate different results on the same corpus using the same parameters, denoted as instability problem. For solving this problem, ensembles of NMF are known to be much more stable and accurate than individual NMFs. However, training multiple NMFs for ensembling is computationally expensive. In this paper, we propose a novel scheme to obtain the seemingly contradictory goal of ensembling multiple NMFs without any additional training cost. We train a single NMF algorithm with the cyclical learning rate schedule, which can converge to several local minima along its optimization path. We save the results to the ensemble when the model converges, and then restart the optimization with a large learning rate that can help escape the current local minimum. Based on experiments performed on text corpora using a number of measures to assess, our method can reduce instability at no additional training cost, while simultaneously yields more accurate topic models than traditional single methods and ensemble methods.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.