Abstract
To reduce the complexity of Chinese text similarity calculation and improve text clustering accuracy, this paper proposes a new text similarity algorithm based on DF_LDA. First, we use the document frequency (DF) method for feature extraction; then, we use latent Dirichlet allocation (LDA) to construct a text topic model; finally, we use the resulting DF_LDA model to calculate text similarity. Because it considers both text semantics and word frequency information, the new method can improve text clustering precision. In addition, the DF_LDA method reduces the dimensionality of the text feature vectors twice, which saves time in similarity calculation and speeds up text clustering. Experiments on the TanCorp-12-Txt and FuDanCorp datasets show that the proposed method reduces modeling time and improves text clustering accuracy.
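The three steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the DF thresholds (`min_df`, `max_df_ratio`), the collapsed Gibbs sampler, the hyperparameters (`k`, `alpha`, `beta`, `iters`), and the use of cosine similarity over topic distributions are all assumptions chosen for illustration, and the toy documents are assumed to be already segmented into words.

```python
import math
import random
from collections import Counter


def df_select(docs, min_df=2, max_df_ratio=0.9):
    """Step 1 (DF): keep terms whose document frequency is in [min_df, max_df_ratio * N]."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t for t, c in df.items() if min_df <= c <= max_df_ratio * n}


def lda_gibbs(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Step 2 (LDA): collapsed Gibbs sampling; returns one topic distribution per document."""
    rng = random.Random(seed)
    ids = {t: i for i, t in enumerate(sorted(vocab))}
    # Terms removed by DF selection are dropped here (first dimension reduction).
    corpus = [[ids[t] for t in doc if t in ids] for doc in docs]
    v = len(ids)
    ndk = [[0] * k for _ in corpus]    # document-topic counts
    nkw = [[0] * v for _ in range(k)]  # topic-word counts
    nk = [0] * k                       # total words assigned to each topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(corpus):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                t = z[d][i]
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    # Each document becomes a k-dimensional vector (second dimension reduction).
    return [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(corpus)
    ]


def cosine(p, q):
    """Step 3: similarity of two topic distributions (cosine is an assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical pre-segmented toy corpus.
docs = [
    ["stock", "market", "price", "stock"],
    ["market", "price", "trade", "stock"],
    ["football", "match", "goal", "team"],
    ["team", "goal", "match", "league"],
]
vocab = df_select(docs)
theta = lda_gibbs(docs, vocab, k=2)
similarity = cosine(theta[0], theta[1])
```

In this sketch the vocabulary shrinks once when rare terms are filtered by document frequency, and each document shrinks again to a `k`-dimensional topic vector, so the similarity is computed over `k` values rather than over the full vocabulary.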