Abstract

Topic detection is a challenging task, especially when the exact number of topics is unknown. In this article, we present a novel topic detection approach based on neural computing to detect topics in a microblogging dataset. We use an unsupervised neural sentence embedding model to map blogs to an embedding space. The proposed model is a weighted power mean sentence embedding model in which the weights are calculated by a targeted attention mechanism. The experimental results show that our embedding model outperforms the baseline in sentence clustering. In addition, we propose a clustering algorithm, referred to as Relationship-Aware DBSCAN (RADBSCAN), to discover topics in a microblogging dataset; the number of topics is determined automatically from the characteristics of the dataset. Moreover, to reduce sensitivity to parameter settings, we use the forwarding relationship between blogs as a bridge between two otherwise independent clusters. Finally, we validate the proposed method on a dataset from the Sina microblog. The results show that our approach can detect all topics successfully and can extract the keywords of each topic.
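
To make the idea of a weighted power mean sentence embedding concrete, the following is a minimal sketch, not the PANM model itself. It assumes that word vectors and per-word weights are already available; in the paper the weights come from a targeted attention mechanism, whereas here they are plain numbers supplied by the caller. The function names `weighted_power_mean` and `sentence_embedding`, and the choice of powers (1, +inf, -inf), are illustrative assumptions.

```python
import math


def weighted_power_mean(vectors, weights, p):
    """Per-dimension weighted power mean of equal-length word vectors.

    Assumption: weights are positive; they are normalised to sum to 1 here.
    Supported powers: +inf (max), -inf (min), and positive odd integers
    (p = 1 is the weighted arithmetic mean).
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    dims = range(len(vectors[0]))

    if p == math.inf:
        # Limit p -> +inf: the mean tends to the per-dimension maximum.
        return [max(v[d] for v in vectors) for d in dims]
    if p == -math.inf:
        # Limit p -> -inf: the mean tends to the per-dimension minimum.
        return [min(v[d] for v in vectors) for d in dims]
    if isinstance(p, int) and p > 0 and p % 2 == 1:
        result = []
        for d in dims:
            m = sum(w * v[d] ** p for w, v in zip(norm, vectors))
            # Sign-preserving p-th root, so negative components stay real.
            result.append(math.copysign(abs(m) ** (1.0 / p), m))
        return result
    raise ValueError("unsupported power: %r" % (p,))


def sentence_embedding(vectors, weights, powers=(1, math.inf, -math.inf)):
    """Concatenate the weighted power means for each power in `powers`."""
    embedding = []
    for p in powers:
        embedding.extend(weighted_power_mean(vectors, weights, p))
    return embedding


# Two toy 2-dimensional word vectors; the second word gets three times the weight.
words = [[1.0, 2.0], [3.0, 0.0]]
attention = [1.0, 3.0]
embedding = sentence_embedding(words, attention)
```

Concatenating several power means gives an embedding whose dimensionality is the word vector size times the number of powers. The resulting sentence vectors could then be passed to a density-based clustering step such as the one the abstract describes.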

Highlights

  • In recent years, microblog platforms have become important vehicles for people to share opinions, explore new events, and disseminate information.

  • We evaluate the performance of our proposed method, including both the sentence embedding model, the power-mean and attention-based neural model (PANM), and the clustering algorithm, RADBSCAN.

  • In this work, we propose a sentence embedding model, PANM, through which sentence vectors are generated; these vectors are used for short text clustering.

Summary

Introduction

In recent years, microblog platforms have become important vehicles for people to share opinions, explore new events, and disseminate information. Topic detection in microblogs is useful in many ways, such as for natural disaster detection [1], news recommendation [2], community detection [3], and political analysis [4]. Detecting topics in large microblog datasets has therefore become an important area of research. Latent Dirichlet Allocation (LDA) [5] and various LDA-based models are widely used for topic detection. These models evaluate some topics, such as food, sports, and the military, and calculate the probability that a document belongs to a topic.