Abstract

Topic models have been widely applied to research domains such as information retrieval and data mining, as they can discover the topics of texts in an unsupervised way. Early work focused mainly on long texts, but with the growth of the Internet the number of short texts has increased rapidly. Most existing schemes for alleviating the sparsity problem of short texts are based on data aggregation or model improvements. Among them, the Biterm Topic Model is one of the most representative: it models topics over document-level word pairs (biterms) and has proven both novel and effective. However, this strategy ignores word pairs that are semantically similar yet rarely co-occur. Moreover, most studies ignore the multi-sense phenomenon in natural language. In this paper, we use multi-sense word vectors to extract similar word pairs from the whole corpus while taking multiple senses into account. Based on this idea, we introduce a novel short-text topic model that disambiguates the senses of words and generates more reasonable global biterms. Experimental results on two open-source English datasets demonstrate its superiority over state-of-the-art topic models.
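As a rough illustration of the idea sketched above, the snippet below pairs words whose closest senses are similar under multi-sense embeddings, producing corpus-level ("global") biterms even for words that rarely co-occur in the same document. The embedding source, vocabulary, number of senses, and similarity threshold are all illustrative assumptions, not the paper's actual implementation.

```python
import itertools
import numpy as np

# Hypothetical multi-sense embeddings: each word maps to one vector per sense.
# In practice these would come from a multi-sense embedding model; here they
# are random placeholders used only to show the pairing logic.
vocab = ["bank", "river", "loan", "apple", "fruit"]
rng = np.random.default_rng(0)
sense_vectors = {w: rng.normal(size=(2, 50)) for w in vocab}  # 2 senses, 50 dims

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def max_sense_similarity(w1, w2):
    """Similarity of the closest pair of senses between two words."""
    return max(cosine(u, v)
               for u in sense_vectors[w1]
               for v in sense_vectors[w2])

def global_biterms(words, threshold=0.5):
    """Collect corpus-level word pairs whose best sense-to-sense similarity
    exceeds the threshold, regardless of document co-occurrence."""
    return [(w1, w2)
            for w1, w2 in itertools.combinations(words, 2)
            if max_sense_similarity(w1, w2) >= threshold]

print(global_biterms(vocab))
```

Taking the maximum over sense pairs is one plausible way to avoid conflating a word's distinct meanings when judging similarity; the paper's own disambiguation procedure may differ.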
