Abstract

Social media such as Twitter connect billions of people by allowing them to exchange their thoughts via short-text communication. Topic modelling is a widely used technique for analysing short texts. Discovering topic clusters in short-text collections faces issues with distance-based, density-based and dimensionality reduction-based methods due to their higher dimensionality and short length which results in extremely sparse text representation matrices. We propose the ‘neighbourhood-based assistance’-driven non-negative matrix factorization (NMF) method to handle high-dimensional sparse short-text representation with lower-dimensional projection effectively. We utilized NMF that aligned with the natural non-negativity of text data coupled with the symmetric document affinity information to identify topic distribution in the short text. Neighbourhood information within documents is captured using Jaccard similarity to assist information loss, resulting in higher-to-lower-dimensional projection. Experimental results with Twitter data sets show that the proposed approach is able to attain high accuracy compared to state-of-the-art methods quantitatively, while qualitative analysis with case studies validates the ability of the proposed approach in generating meaningful topic clusters.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call