Abstract

Short text topic modeling has attracted growing research attention with the emergence of online platforms such as news websites, Twitter and Facebook. Existing topic models for short texts mainly focus on relieving the sparsity problem in order to improve the accuracy of topic modeling. However, most previous approaches introduce word embeddings trained on an external corpus to enrich the global semantic information available during topic modeling, while ignoring the local association information of the target corpus. Moreover, the global semantic information provided by word embeddings may not be entirely suitable for the target corpus, and in most cases it introduces noise that interferes with topic inference. This paper proposes a novel topic model for short text called the Dual View Biterm Topic Model (DV-BTM). Specifically, DV-BTM constructs two views so that local information from the target corpus and global information from outside it jointly assist topic inference. The semantic similarity view provides global information obtained from word embeddings pre-trained on an external corpus. The Wordnet view is constructed from the target corpus itself and mainly provides local information about that corpus. Finally, collaborative optimization of the two views improves the consistency of the extracted topics. Experiments on two real-world short text datasets show that DV-BTM outperforms the compared methods in both topic coherence and text classification.
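To make the two kinds of information concrete, the following is a minimal illustrative sketch and not the DV-BTM algorithm itself. It assumes that a biterm is an unordered pair of distinct words co-occurring in one short text (as in the standard Biterm Topic Model), uses document-level co-occurrence counts as a stand-in for local association in the target corpus, and uses cosine similarity over made-up two-dimensional vectors as a stand-in for global similarity from pre-trained embeddings. The function names, the toy corpus and the toy vectors are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered pairs of distinct words found in one short text."""
    return [tuple(sorted(p)) for p in combinations(tokens, 2) if p[0] != p[1]]


def local_cooccurrence(docs):
    """Count, for each biterm, the number of documents that contain it.

    This is only a proxy for local association within the target corpus.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(set(extract_biterms(tokens)))
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy target corpus (hypothetical).
docs = [
    ["topic", "model", "short", "text"],
    ["short", "text", "twitter"],
    ["topic", "model", "embedding"],
]
local_counts = local_cooccurrence(docs)

# Toy "pre-trained" embeddings (hypothetical values, not real vectors).
embeddings = {
    "topic": (1.0, 0.0),
    "model": (0.8, 0.6),
    "twitter": (0.0, 1.0),
}
global_sim = cosine(embeddings["topic"], embeddings["model"])

print(local_counts[("short", "text")], round(global_sim, 3))
```

In this toy setup the local counts depend only on the target corpus, while the similarity score depends only on the external vectors, which is the distinction the abstract draws between the two views.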
