Abstract

To reduce the complexity of Chinese text similarity calculation and improve text clustering accuracy, this paper proposes a new text similarity algorithm based on DF_LDA. First, the document frequency (DF) method is used for feature extraction; then, latent Dirichlet allocation (LDA) is used to construct a text topic model; finally, the resulting DF_LDA model is used to calculate text similarity. Because it takes into account both the semantic information and the word frequency information of the text, the new method improves text clustering precision. In addition, the DF_LDA method reduces the dimension of the text feature vectors twice (once by DF feature selection and once by mapping documents into the lower-dimensional topic space), which saves text similarity calculation time and increases text clustering speed. Experiments on the TanCorp-12-Txt and FuDanCorp datasets demonstrate that the proposed method reduces modeling time and improves text clustering accuracy.
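The three-step pipeline described above can be illustrated with a minimal sketch. It assumes documents are already segmented into words, uses a DF window (min_df, max_df_ratio) chosen purely for illustration, fits LDA with a small collapsed Gibbs sampler, and scores similarity as the cosine of the document-topic vectors. The thresholds, hyperparameters, the sampler, and the use of cosine similarity are all assumptions; the abstract does not specify the paper's exact settings or similarity formula.

```python
import math
import random
from collections import Counter


def df_select(docs, min_df=2, max_df_ratio=0.9):
    """Step 1 (first reduction): keep terms whose document frequency lies in
    [min_df, max_df_ratio * N]. Threshold values are illustrative assumptions."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return sorted(t for t, c in df.items() if min_df <= c <= max_df_ratio * len(docs))


def lda_gibbs(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Step 2 (second reduction): collapsed Gibbs sampling LDA.
    Returns one k-dimensional topic distribution per document."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    # Keep only the terms retained by DF selection, as word ids.
    corpus = [[index[w] for w in doc if w in index] for doc in docs]
    v = len(vocab)
    ndk = [[0] * k for _ in corpus]    # document-topic counts
    nkw = [[0] * v for _ in range(k)]  # topic-word counts
    nk = [0] * k                       # total words per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(corpus):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                t = z[d][i]
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    # Smoothed document-topic distributions (each row sums to 1).
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(corpus)]


def cosine(p, q):
    """Step 3: similarity of two topic vectors (cosine is an assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


docs = [
    ["stock", "market", "price", "trade"],
    ["market", "price", "stock", "bank"],
    ["football", "match", "goal", "team"],
    ["team", "goal", "match", "coach"],
]
vocab = df_select(docs)  # 10 distinct terms reduced to 6
theta = lda_gibbs(docs, vocab, k=2)  # each document reduced to 2 topic weights
print(vocab)
print(round(cosine(theta[0], theta[1]), 3), round(cosine(theta[0], theta[2]), 3))
```

On this toy corpus DF selection drops the four terms that occur in only one document, and LDA then represents each document with just two topic weights, so every similarity computation works on 2-dimensional vectors rather than the original vocabulary. The resulting pairwise similarities could feed any clustering method.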
