Abstract

Traditional text clustering algorithms largely rely on word co-occurrence information in text data to infer the hidden topics. However, due to the limited content length of the short text, the word co-occurrence information in the short text is very scarce, which we call the short text feature sparse problem. In order to solve the feature sparse problem in the dynamic clustering of short texts, and more better capture the dynamic evolution of topics in the dynamic short text data over time, this paper proposed a semantic-enhanced dynamic Dirichlet multinomial Mixture (SDDMM) model with enhanced semantics, which uses the additional semantic knowledge provided by word embedding to assist in improving the effect of short texts’ dynamic clustering, at the same time, because the generation of topics in the dynamic clustering process is affected by inherited historical topics, the introduction of semantic knowledge can automatically adjust the strength of topic inheritance, making the dynamic evolution of the number of clusters more in line with the actual data. Experiments on synthetic data and real data show that the SDDMM model effectively improves the short texts’ dynamic clustering effect.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call