Abstract

As traditional text representations are not suitable for online dynamic streams, this paper presents a hot topic extraction technique that can be used for tracking news topics over time. The model combines individual word burst into the document-word vector representation, which can emphasize the temporally features of text streams. An energy ratio threshold based burst detection approach is proposed and TF-PDF is then combined to weigh the terms. Experiment results demonstrate that this model is effective in topic extraction for news stream and it can better improve the clustering performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call