Abstract

Internet-based news documents are an important means of transmitting information, and large numbers of news documents from various newswire sources are available on the internet. The objective of this work is to study existing term weighting algorithms for feature extraction and to develop an efficient term weighting algorithm for mining salient features from internet-based newswire sources. TF*PDF is an influential algorithm that captures the basic property of features in news documents, namely frequency, and thus achieves higher accuracy than other term weighting algorithms. However, frequency alone is not sufficient for salient topic extraction. To overcome this problem, this paper presents an innovative and effective term weighting algorithm that considers position, scattering, and topicality along with frequency for extracting salient events. Experimental evaluation shows that the proposed term weighting algorithm performs better than the existing term weighting algorithms in terms of coverage rate.
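
To make the frequency-only baseline concrete, the following is a minimal sketch of TF*PDF as it is commonly formulated in the literature: a term's weight is the sum, over news channels, of its normalized frequency in the channel multiplied by the exponential of the fraction of that channel's documents containing it. The formula is not stated in this abstract, so treat it as an assumption; the function name `tf_pdf` and the input layout (a dict of channel name to tokenized documents) are hypothetical. The position, scattering, and topicality factors of the proposed algorithm are not specified here and are therefore not modelled.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Frequency-only TF*PDF sketch (assumed formulation, not the paper's code).

    channels: dict mapping a channel name to a list of documents,
              where each document is a list of string tokens.
    Returns a dict mapping each term to its summed weight across channels.
    """
    weights = Counter()
    for docs in channels.values():
        if not docs:
            continue  # an empty channel contributes nothing
        term_freq = Counter()  # occurrences of each term in the channel
        doc_freq = Counter()   # number of channel documents containing each term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize term frequencies by the channel's Euclidean norm.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue  # channel has documents but no tokens
        n_docs = len(docs)
        for term, freq in term_freq.items():
            # Normalized frequency times exp(proportional document frequency).
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Hypothetical toy input: two channels with tokenized headlines.
example = {
    "channel_a": [["election", "results"], ["election"]],
    "channel_b": [["election"]],
}
scores = tf_pdf(example)
```

In this toy input, "election" appears in every document of both channels, so it receives a higher weight than "results", which appears in only one document of one channel.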
