Abstract

Since short text is short of keywords and has sparse features, it brings about the similarity drift problem. The traditional clustering algorithms are usually ineffective and a waste of resources on dealing with short text stream. To overcome the above problems, this paper proposes an incremental clustering algorithm in short text streams based on BM25. The approach makes full use of BM25 to extract keywords and weights of each cluster, and applies extracted parameters to similarity calculation. Theoretical analysis and experiments show that the proposed incremental clustering algorithm solves the similarity drift problem well and achieves satisfactory accuracy and performance in terms of short text stream clustering, compared with the traditional clustering algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call