A author topic model based unsupervised algorithm for learning topics from large text collections

S Mercy Shalinie, K Sundarakantham, S Pushparathi

doi:10.1109/icrtit.2011.5972315

Abstract

With the advent of the Web and various specialized digital libraries, the automatic extraction of useful information from text has become an increasingly important research area in data mining. In this paper we present a new Metropolis-Hastings (MH) based algorithm that extracts the topics expressed in large text document collections and also models how the authors of the documents use those topics. The methodology is illustrated using a sample of 1740 documents and 2037 authors of NIPS conference papers. A novel feature of our model is the inclusion of MH sampling for author-topic models, in which authors are modeled as probability distributions over topics. The author-topic models can be used to support a variety of interactive and exploratory queries on the dataset. The proposed algorithm implements enhanced author-topic modeling of text collections to extract topics from documents, which is useful for efficient search and retrieval. This paper presents an unsupervised learning technique for extracting information from large real-world text collections. The technique involves clustering, which is used to extract a representation from a collection of documents. Each cluster is associated with a topic, and a single document is associated with only one cluster. The traditional author-topic model encounters problems with multi-topic documents. In experiments, the proposed algorithm achieved the same classification accuracy while reducing the time needed to extract the topics by 50%.
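The abstract does not spell out the sampler, so the following is only a minimal sketch of how MH sampling might be applied to an author-topic model, not the authors' algorithm. It assumes the standard collapsed author-topic conditional (each token carries an author and a topic, with symmetric Dirichlet priors) and a uniform, symmetric proposal over the document's authors and all topics. The function name `mh_author_topic`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import random

def mh_author_topic(docs, doc_authors, n_authors, n_topics, vocab_size,
                    alpha=0.5, beta=0.01, sweeps=50, seed=0):
    """Toy Metropolis-Hastings sampler for an author-topic model (illustrative only).

    docs[d] is a list of word ids; doc_authors[d] lists the author ids of document d.
    Returns author-topic counts and topic-word counts after the final sweep.
    """
    rng = random.Random(seed)
    at = [[0] * n_topics for _ in range(n_authors)]   # author-topic counts
    tw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    a_tot = [0] * n_authors                           # tokens per author
    t_tot = [0] * n_topics                            # tokens per topic

    def weight(a, t, w):
        # Unnormalised conditional of giving word w the pair (author a, topic t),
        # evaluated while the current token is removed from the counts.
        return ((tw[t][w] + beta) / (t_tot[t] + vocab_size * beta)
                * (at[a][t] + alpha) / (a_tot[a] + n_topics * alpha))

    def bump(a, t, w, delta):
        # Add or remove one token from all count tables.
        at[a][t] += delta
        tw[t][w] += delta
        a_tot[a] += delta
        t_tot[t] += delta

    # Random initial (author, topic) assignment for every token.
    assign = []
    for words, authors in zip(docs, doc_authors):
        row = []
        for w in words:
            a, t = rng.choice(authors), rng.randrange(n_topics)
            bump(a, t, w, +1)
            row.append((a, t))
        assign.append(row)

    for _ in range(sweeps):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            for i, w in enumerate(words):
                a, t = assign[d][i]
                bump(a, t, w, -1)
                # Assumed proposal: uniform over this document's authors and all topics.
                a_new, t_new = rng.choice(authors), rng.randrange(n_topics)
                ratio = weight(a_new, t_new, w) / weight(a, t, w)
                if rng.random() < min(1.0, ratio):  # MH acceptance step
                    a, t = a_new, t_new
                bump(a, t, w, +1)
                assign[d][i] = (a, t)
    return at, tw

# Hypothetical toy corpus: 3 documents, 2 authors, 4 vocabulary words.
docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
doc_authors = [[0], [1], [0, 1]]
at, tw = mh_author_topic(docs, doc_authors, n_authors=2, n_topics=2, vocab_size=4)
```

Each row of `at`, once smoothed by `alpha` and normalised, gives one author's estimated distribution over topics, which is the sense in which authors are modeled as probability distributions over topics. Because the proposal is symmetric, the acceptance ratio reduces to a ratio of two unnormalised conditionals, so no full distribution over (author, topic) pairs has to be computed per token.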

