Abstract

With the development of social networks, users increasingly get the latest news from online social media. This demands an effective and efficient representation of social-media texts so that the growing volume of online short texts can be filtered in a timely manner. However, the vector space model (VSM) representation of short texts is highly sparse, which significantly reduces both the effectiveness and the efficiency of discovering new social news. Moreover, traditional dimension reduction methods select keywords according to their frequency of occurrence, which makes them unsuitable for short social-media texts, where keywords are sparse. Taking semantic relations between words into account, this paper introduces a non-backtracking word clustering and counting mechanism that represents short social-media texts by keyword clusters instead of individual keywords, thereby reducing the dimensionality of the representation model. In addition, we implement this representation mechanism on Storm, a real-time computation platform, which improves the efficiency of representing social-media texts. Experiments on real-world social texts demonstrate that our mechanism effectively reduces the dimensionality of the representation model and improves the efficiency of discovering the latest social news.
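
The idea of representing a short text by keyword clusters rather than individual keywords can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes that "non-backtracking" means a single pass in which a word's cluster assignment is never revisited, and that semantic relatedness is measured by cosine similarity between word vectors. The names cluster_words, represent, and TOY_VECTORS, the toy vectors, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(word_vectors, threshold=0.8):
    """Assign each word to a cluster in one pass (assumed reading of
    "non-backtracking"): a word joins the first cluster whose seed vector
    is similar enough, otherwise it seeds a new cluster. Earlier
    assignments are never revisited."""
    seeds = []        # seed vector of each cluster, indexed by cluster id
    assignment = {}   # word -> cluster id
    for word, vec in word_vectors.items():
        for cluster_id, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                assignment[word] = cluster_id
                break
        else:
            assignment[word] = len(seeds)
            seeds.append(vec)
    return assignment


def represent(text, assignment, n_clusters):
    """Count cluster occurrences in a text; unknown words are ignored."""
    counts = Counter(
        assignment[w] for w in text.lower().split() if w in assignment
    )
    return [counts.get(c, 0) for c in range(n_clusters)]


# Hypothetical toy word vectors, standing in for real semantic vectors.
TOY_VECTORS = {
    "quake": (1.0, 0.0, 0.0),
    "earthquake": (0.9, 0.1, 0.0),
    "tremor": (0.95, 0.05, 0.0),
    "election": (0.0, 1.0, 0.0),
    "vote": (0.1, 0.9, 0.0),
    "goal": (0.0, 0.0, 1.0),
}

assignment = cluster_words(TOY_VECTORS)
n_clusters = max(assignment.values()) + 1
vector = represent("Earthquake tremor felt before the vote", assignment, n_clusters)
print(n_clusters, vector)
```

In this toy setting the six vocabulary words collapse into three clusters, so the text is described by a three-dimensional count vector instead of a six-dimensional keyword vector. A real system would need a much larger vocabulary and a principled similarity measure.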
