Keyword extraction from single documents using mean word intermediate distance

Sifatullah Siddiqi,Aditi Sharan

doi:10.19101/ijacr.2016.625003

Abstract

extraction is an important task in text mining. In this paper a novel, unsupervised, domain independent and language independent approach for automatic keyword extraction from single documents have been proposed. We have used the word intermediate distance vector and its mean value to extract keywords. We have compared our approach with results from the standard deviation of intermediate distances approach as standard and found that there is heavy overlapping between the results of both approaches with the advantage that our approach is faster, especially in case of long documents as it removes the need to compute the standard deviation of word intermediate distance vector. Two famous works viz. Origin of Species and A Brief History of Time to demonstrate the experimental results have been used. Experiments show that the proposed approach works almost as better as the standard deviation approach and the percentage overlap between top 30 extracted keywords is more than 50%.

Full Text