Abstract

The essential differences between topic detection and text clustering lie in the distribution and the time characteristics of the news corpus. Topic detection should therefore be studied with respect to the news corpus, and the corpus itself deserves in-depth and extensive research. The vector space model (VSM) is one of the simplest and most effective topic representation models, and K-means is a well-known and widely used partitional clustering method. We therefore conduct a topic detection experiment to study how the news corpus and K-means affect topic detection. From it we derive how each factor influences topic detection and summarize their optimal values. Finally, evaluation with the TDT methods shows that the best overall topic detection performance on a large-scale corpus is 38.378% higher than that on a small-scale corpus. The experiment indicates that K-means-based topic detection is well suited to large-scale data.
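The abstract does not give the exact term weighting, similarity measure, initialization, or cost parameters. The following is therefore only a minimal sketch under stated assumptions of how a VSM plus K-means topic detection pipeline and a TDT-style detection cost might look. It assumes smoothed TF-IDF weights, whitespace tokenization, cosine ("spherical") K-means with deterministic farthest-first seeding, and the cost constants commonly used in NIST TDT evaluations (C_Miss = 1, C_FA = 0.1, P_target = 0.02). All function names are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """L2-normalize a sparse {term: weight} vector."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Represent each story in the VSM as an L2-normalized TF-IDF vector.

    Assumption: lowercase whitespace tokens and smoothed IDF,
    idf = ln((1 + n) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def dot(a, b):
    """Dot product of sparse vectors (cosine similarity for unit vectors)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_topics(vectors, k, max_iter=20):
    """Cluster story vectors into k topics with cosine K-means (assumed variant)."""
    # Farthest-first seeding: start from story 0, then repeatedly add the story
    # least similar to every seed chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each story joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: centroid = normalized sum of member vectors.
        for j in range(k):
            summed = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    summed.update(v)
            if summed:  # keep the previous centroid if a cluster is empty
                centroids[j] = normalize(summed)
    return labels


def tdt_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT detection cost; lower is better (constants assumed)."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1 - p_target))


docs = [
    "earthquake hits city rescue teams",
    "earthquake rescue teams search rubble",
    "election results vote count",
    "election vote turnout record",
]
labels = kmeans_topics(tfidf_vectors(docs), k=2)
print(labels)
print(round(tdt_detection_cost(p_miss=0.1, p_fa=0.01), 3))
```

On this toy input the two earthquake stories fall into one cluster and the two election stories into the other. The cost call prints 0.149, because with these constants the normalized cost reduces to P_Miss + 4.9 · P_FA. Cosine similarity is used rather than Euclidean distance because it is the usual choice for TF-IDF document vectors. The paper's own distance measure is not stated in the abstract.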

