Abstract

Latent semantic analysis (LSA) is a prominent technique for semantic theme detection and topic modelling. In this paper, we design a three-level weighting scheme for latent semantic analysis that creates an optimised semantic space for large document collections. Using this novel approach, an efficient latent semantic space is created in which terms that appear far apart in the actual document collection come closer to each other. The authors use two datasets: the first is a synthetic dataset consisting of small stories collected by the authors; the second is the benchmark BBC-News dataset widely used in text-mining applications. The proposed three-level weight models assign weights at the term level, the document level, and the corpus level. These weight models are known as: 1) NPC; 2) NTC; 3) APC; 4) ATC. The weight models are tested on both datasets and compared with the state-of-the-art term-frequency baseline; they show significantly improved performance in term-set correlation and document-set correlation, and achieve the highest correlation in semantic similarity of terms in the semantic space generated through the three-level weights. Our approach also demonstrates automatic context clustering in the datasets through the three-level weights.
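The general pipeline described above can be sketched in a few lines. Note that the abstract does not give the exact NPC/NTC/APC/ATC formulas, so the sketch below assumes a generic three-level scheme (a logarithmic term-level weight, a document-level length normalisation, and a corpus-level inverse-document-frequency weight) followed by the standard truncated-SVD step of LSA; the toy count matrix is illustrative only.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# The paper's exact NPC/NTC/APC/ATC weights are not given in the
# abstract, so a generic three-level scheme is assumed here.
counts = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 2],
    [1, 1, 2, 0],
    [0, 0, 1, 3],
], dtype=float)

# Term-level (local) weight: dampen raw frequency with log(1 + tf).
local = np.log1p(counts)

# Document-level weight: normalise each document (column) to unit length.
doc_norm = np.linalg.norm(local, axis=0, keepdims=True)
doc_norm[doc_norm == 0] = 1.0
normalised = local / doc_norm

# Corpus-level (global) weight: inverse document frequency per term.
n_docs = counts.shape[1]
df = (counts > 0).sum(axis=1)
idf = np.log(n_docs / df)
weighted = normalised * idf[:, None]

# LSA step: truncated SVD projects terms and documents into a
# shared low-rank semantic space, where related terms move closer.
U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
term_space = U[:, :k] * S[:k]        # term coordinates in the semantic space
doc_space = Vt[:k, :].T * S[:k]      # document coordinates in the same space

print(term_space.shape, doc_space.shape)  # (4, 2) (4, 2)
```

In this low-rank space, cosine similarity between rows of `term_space` gives the term-term semantic similarity that the paper's correlation experiments evaluate.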
