Abstract

In the present day, online users are incentivized to engage in short text-based communication. These short texts harbor a significant amount of implicit information, including opinions, topics, and emotions, which are of notable value for both exploration and analysis. By alleviating the sparsity in short texts, topic models can be used to discover topics from large collections of short texts. While there is a large body of surveys focused on topic modeling, but only a few of them have focused on the short texts. This paper presents a comprehensive overview of topic modeling methods for short texts from a novel perspective. Firstly, it discusses short text probabilistic topic models and outlines the directions in which they can be improved. Secondly, it explores short text neural topic models, which can be categorized into three groups based on their underlying structures. In addition, this paper provides a detailed investigation of embedding methods in topic modeling. Moreover, various applications and corresponding works are surveyed, with a focus on short texts. The commonly used public corpora and evaluation indicators for topic modeling are also summarized. Finally, the advantages and disadvantages of short text topic modeling are discussed in detail, and future research directions are proposed.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call