Abstract
With the advent of the Web and specialized digital libraries, the automatic extraction of useful information from text has become an increasingly important research area in data mining. In this paper we present a new Metropolis-Hastings (MH) based algorithm that extracts the topics expressed in large text document collections and also models how the authors of those documents use the topics. The methodology is illustrated on a sample of 1,740 documents and 2,037 authors of NIPS conference papers. A novel feature of our model is the use of MH sampling for author-topic models, in which authors are modeled as probability distributions over topics. The resulting author-topic models can support a variety of interactive and exploratory queries on the dataset. The algorithm proposed in this paper implements enhanced author-topic modeling for extracting topics from document collections, which is useful for efficient search and retrieval. The approach is an unsupervised learning technique for extracting information from large real-world text collections. It relies on clustering to extract a representation from a collection of documents: each cluster is associated with a topic, and each document is assigned to only one cluster. The traditional author-topic model encounters problems with multi-topic documents. Experimental results show that the proposed algorithm achieves the same classification accuracy with a 50% reduction in the time needed to extract the topics.
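The abstract names Metropolis-Hastings sampling for an author-topic model in which authors are modeled as probability distributions over topics. As an illustration only, the sketch below shows one MH update of the (author, topic) assignment of a single word token in a collapsed author-topic model, assuming symmetric Dirichlet priors and a uniform proposal over candidate (author, topic) pairs; all names here (mh_resample_token, n_wt, n_at, alpha, beta) are hypothetical, and the paper's actual algorithm may differ.

```python
import numpy as np

def mh_resample_token(w, d, a_cur, t_cur, doc_authors,
                      n_wt, n_t, n_at, n_a,
                      alpha, beta, rng):
    """One Metropolis-Hastings update of a token's (author, topic) pair.

    w          : word id of the token
    d          : document id (used to look up the document's author list)
    a_cur,t_cur: current author / topic assignment of the token
    n_wt[w, t] : count of word w assigned to topic t
    n_t[t]     : total tokens assigned to topic t
    n_at[a, t] : count of tokens by author a assigned to topic t
    n_a[a]     : total tokens assigned to author a
    """
    V, T = n_wt.shape

    # Remove the current token from the counts (collapsed conditional
    # probabilities are computed with this token excluded).
    n_wt[w, t_cur] -= 1; n_t[t_cur] -= 1
    n_at[a_cur, t_cur] -= 1; n_a[a_cur] -= 1

    def cond_prob(a, t):
        # Unnormalised conditional P(topic=t, author=a | word=w, rest),
        # with symmetric Dirichlet smoothing alpha (topics) and beta (words).
        p_w_given_t = (n_wt[w, t] + beta) / (n_t[t] + V * beta)
        p_t_given_a = (n_at[a, t] + alpha) / (n_a[a] + T * alpha)
        return p_w_given_t * p_t_given_a

    # Symmetric proposal: pick a candidate author from this document's
    # author list and a candidate topic, both uniformly at random.
    a_new = rng.choice(doc_authors[d])
    t_new = rng.integers(T)

    # Accept with probability min(1, p(proposed) / p(current)).
    if rng.random() < min(1.0, cond_prob(a_new, t_new) / cond_prob(a_cur, t_cur)):
        a_cur, t_cur = a_new, t_new

    # Add the (possibly updated) assignment back into the counts.
    n_wt[w, t_cur] += 1; n_t[t_cur] += 1
    n_at[a_cur, t_cur] += 1; n_a[a_cur] += 1
    return a_cur, t_cur
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of the collapsed conditionals; sweeping this update over all tokens for a number of iterations yields samples from which the topic-word and author-topic distributions can be estimated.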