Abstract

Text clustering plays an important role in data mining and machine learning, and years of development have produced a range of theories and methods. However, for clustering Chinese news text, the mainstream LDA method suffers from high time complexity. To improve speed, this paper proposes a new method in which a neural network language model is applied to text clustering for the first time. Text clustering is first converted to its dual problem, word clustering. The neural network language model yields word vectors, which are then used for fuzzy k-means clustering of the Chinese news keyword set. Based on the keyword clustering result, the text clustering result for Chinese news is obtained in a single conversion step. Experiments show that this method runs five times faster than LDA. The method has been successfully deployed in the Sohu news recommendation system.
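
The pipeline described above (word vectors, fuzzy k-means over keywords, then a mapping from keyword clusters to text clusters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2-D toy vectors stand in for vectors that a neural network language model would produce, the evenly spaced initial centers are a simplification, and the rule that assigns a document to the cluster with the largest summed keyword membership is only one plausible reading of the keyword-to-text conversion.

```python
import math

def _memberships(vectors, centers, m):
    """Fuzzy membership of every vector in every cluster (rows sum to 1)."""
    result = []
    for v in vectors:
        # Clamp distances so a vector sitting exactly on a center is safe.
        dists = [max(math.dist(v, c), 1e-12) for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        result.append([x / total for x in inv])
    return result

def fuzzy_kmeans(vectors, k, m=2.0, iters=50):
    """Plain fuzzy k-means (fuzzy c-means) with fuzzifier m > 1."""
    n, dim = len(vectors), len(vectors[0])
    # Assumption: evenly spaced deterministic initial centers.
    centers = [list(vectors[i * n // k]) for i in range(k)]
    for _ in range(iters):
        u = _memberships(vectors, centers, m)
        for j in range(k):
            weights = [row[j] ** m for row in u]
            wsum = sum(weights)
            centers[j] = [
                sum(w * v[d] for w, v in zip(weights, vectors)) / wsum
                for d in range(dim)
            ]
    return centers, _memberships(vectors, centers, m)

def cluster_documents(doc_keywords, word_index, memberships, k):
    """Hypothetical conversion step: sum keyword memberships per document."""
    labels = {}
    for doc_id, keywords in doc_keywords.items():
        scores = [0.0] * k
        for word in keywords:
            if word in word_index:
                for j, u in enumerate(memberships[word_index[word]]):
                    scores[j] += u
        labels[doc_id] = max(range(k), key=scores.__getitem__)
    return labels

# Toy keyword vectors (hypothetical; a real system would train them).
words = ["football", "goal", "league", "stock", "bank", "market"]
vectors = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
           [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
word_index = {w: i for i, w in enumerate(words)}

# Each news item is represented only by its extracted keywords.
doc_keywords = {
    "news_a": ["football", "goal"],
    "news_b": ["stock", "market"],
    "news_c": ["league", "goal", "bank"],
}

k = 2
centers, memberships = fuzzy_kmeans(vectors, k)
labels = cluster_documents(doc_keywords, word_index, memberships, k)
print(labels)
```

In this sketch the clustering cost depends on the size of the keyword vocabulary rather than on the number of documents, and each document is then labeled in one pass over its keywords. That is consistent with the speed motivation in the abstract, though the sketch makes no claim about the measured speedup.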
