Document Similarity Measure Based on Topic Model

Ming He, Zhen Zhen Wang, Yong Ping Du

doi:10.4028/www.scientific.net/amm.513-517.1280

Journal: Applied Mechanics and Materials

Publication Date: Feb 6, 2014

Affiliation: Beijing University of Technology

#Field In Natural Language Processing #Topic Model

Abstract

Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a method based on the latent Dirichlet allocation (LDA) topic model to compute the similarity between documents. By mapping a document from its term space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study on a real data set demonstrates the efficiency of our method.
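To make the topic space comparison concrete, the following minimal Python sketch compares documents that have already been mapped to distributions over topics. The abstract does not state which similarity measure the paper uses, so both measures here (one minus the Jensen-Shannon divergence, and cosine similarity) are assumptions chosen for illustration. The topic distributions in `theta` are hypothetical values, not results from the paper, and the LDA inference step that would produce them is omitted.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """One minus the base-2 Jensen-Shannon divergence, a value in [0, 1]."""
    # The midpoint m is positive wherever p or q is positive,
    # so the divisions inside kl_divergence are always defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1.0 - 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)


# Hypothetical per-document topic distributions over three topics,
# standing in for the output of LDA inference (assumed, not from the paper).
theta = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.65, 0.25, 0.10],
    "doc_c": [0.05, 0.15, 0.80],
}

if __name__ == "__main__":
    for other in ("doc_b", "doc_c"):
        js = js_similarity(theta["doc_a"], theta[other])
        cos = cosine_similarity(theta["doc_a"], theta[other])
        print(f"doc_a vs {other}: JS similarity={js:.3f}, cosine={cos:.3f}")
```

Under both measures, documents with similar topic mixtures (`doc_a` and `doc_b`) score higher than documents dominated by different topics (`doc_a` and `doc_c`), even if they share few exact terms.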

Full Text