Abstract

Latent Dirichlet Allocation (LDA) is an unsupervised statistical method for modeling documents, discovering latent semantic topics from a large collection of documents, and categorizing documents into the learned topics. In this paper, we first introduce LDA and its distributed version, Parallel LDA (PLDA), along with some popular implementations. We then propose a systematic solution for evaluating the stability and similarity of trained LDA/PLDA models and their classification results. We address three key challenges within this evaluation solution: (i) topic matching in the Kullback-Leibler (KL) divergence calculation, (ii) computing stability using KL divergence and interpreting the relationship between KL divergence and the stability of the trained model and the classification results, and (iii) computing and evaluating the similarity of trained models and classification results. Finally, we experiment with real-life datasets to show that our solution is both effective and efficient.
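To illustrate the topic-matching challenge mentioned in (i), the following is a minimal sketch, not the paper's actual method: it computes KL divergence between topic word distributions and greedily pairs each topic in one trained model with its closest topic in another. The function names and the greedy pairing strategy are illustrative assumptions, not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete word distributions over the same vocabulary.
    # eps guards against log(0) when q assigns zero probability to a word.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def match_topics(model_a, model_b):
    # Greedy topic matching (an illustrative assumption): pair each topic in
    # model_a with the topic in model_b that minimizes KL divergence between
    # their word distributions. Returns (index_a, index_b, divergence) triples.
    matches = []
    for i, p in enumerate(model_a):
        j, d = min(((j, kl_divergence(p, q)) for j, q in enumerate(model_b)),
                   key=lambda t: t[1])
        matches.append((i, j, d))
    return matches
```

Since topic indices are arbitrary across training runs, some such matching step is needed before per-topic divergences can be aggregated into a stability or similarity score.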
