Abstract

The abundance and real-time availability of Twitter data have proved beneficial for detecting events in domains such as emergency response, crime detection, public health, and place recommendation. Nevertheless, two critical challenges arise when detecting events from social media data. The first is the uncertainty in capturing the contextual relationships among tweets, a consequence of the limited contextual information available in such short texts. The second is the high computational cost of event detection, which stems from the volume of data to be processed. Earlier work addressing these challenges has tried to capture contextual information with dense vector representations of text, using neural word embedding models such as Word2Vec and GloVe. However, these models are trained in Euclidean vector space, which fails to combine the directional information of the vectors with the semantic information in the text and incurs high computational costs. To target both problems simultaneously, we propose modeling Twitter data as a graph-of-sentences, which retains the contextual relationships while keeping the computational cost low. The proposed model captures contextual information using JoSE, a spherical vector representation that leverages word-word and word-paragraph semantic co-occurrence statistics in a spherical generative model. Furthermore, the framework uses a weighted-graph model to capture the relationships among the tweets efficiently. The graph is then pruned using a graph component filtering approach. The graph clustering model employed to detect the events leverages the edge weights and a partial-k clustering approach to keep computation costs low.
Experiments on the annotated benchmark Twitter data set and on real-world datasets show run-time improvements of up to 30% while maintaining qualitative performance (F1-score) comparable to state-of-the-art models.
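
The weighted-graph and component-filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_weight` edge threshold, and the `min_size` component cutoff are hypothetical choices made for illustration, and the toy vectors stand in for spherical (unit-length) tweet embeddings, for which the dot product equals the cosine similarity.

```python
from itertools import combinations


def dot(u, v):
    """Dot product; equals cosine similarity when u and v are unit length."""
    return sum(a * b for a, b in zip(u, v))


def build_tweet_graph(vectors, min_weight=0.5):
    """Return {(i, j): weight} for tweet pairs whose similarity reaches min_weight.

    Assumption: every vector is already normalized to unit length.
    """
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        weight = dot(vectors[i], vectors[j])
        if weight >= min_weight:
            edges[(i, j)] = weight
    return edges


def connected_components(n, edges):
    """Group the n nodes into connected components with a depth-first search."""
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def filter_components(components, min_size=2):
    """Drop components too small to be event candidates (hypothetical rule)."""
    return [c for c in components if len(c) >= min_size]


# Toy unit vectors standing in for tweet embeddings.
vectors = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0), (-1.0, 0.0)]
edges = build_tweet_graph(vectors, min_weight=0.5)
candidates = filter_components(connected_components(len(vectors), edges))
print(edges)
print(candidates)
```

In this toy input, tweets 0, 1, and 2 end up in one component and tweet 3 is isolated, so the filter discards it. The pairwise loop is quadratic in the number of tweets; a practical system would need some way to limit candidate pairs, which this sketch does not attempt.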

Highlights

  • Online sources such as Twitter provide rich metadata, including location, time-stamps, and number of followers, which has been used over the years to produce application-centric results

  • The results shown in [24] show better accuracy on text clustering and word similarity tasks on the document-based dataset

  • The proposed graph-based Twitter data representation helps incorporate the uncertainty in the relationships among the tweets in the form of a graph. This Twitter graph keeps track of the contextual as well as the directional information while keeping the computational costs, i.e., run-time requirements, low

Summary

INTRODUCTION

Online sources such as Twitter provide rich metadata, including location, time-stamps, and number of followers, which has been used over the years to produce application-centric results. Multi-grams help to provide more contextual information while maintaining the relationships among the data points. This limitation is addressed in GoS, which uses tweets as the nodes and the cosine similarity among word vectors as edges.

A. RESEARCH OBJECTIVES AND CONTRIBUTIONS

The primary objective of this research work is to propose an approximate graph-based global event detection model that captures the uncertainty in the Twitter data. This model is expected to improve the run-time performance while maintaining the qualitative performance (F1-score) of the event detection process. A spherical-embedding-based Graph of Tweets representation is proposed to model the Twitter data. This model captures most of the contextual and the directional information from the tweets.
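
To make the notion of directional information concrete, the following sketch contrasts Euclidean distance with cosine similarity on toy vectors. It is only an illustration under stated assumptions: the vectors and helper names are hypothetical, and it does not reproduce JoSE or any trained embedding. It shows the general property that motivates spherical representations, namely that two vectors pointing the same way can be far apart in Euclidean terms yet identical in direction.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def euclidean(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity; depends only on direction."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


# Hypothetical vectors: same direction with different lengths, plus an orthogonal one.
short = (1.0, 2.0)
long_ = (3.0, 6.0)
other = (2.0, -1.0)

print(euclidean(short, long_))  # large, although the direction is identical
print(cosine(short, long_))     # close to 1: same direction
print(cosine(short, other))     # close to 0: orthogonal directions
```

Once vectors are normalized, the edge weight between two nodes reduces to a dot product, which is one reason unit-length embeddings pair naturally with a cosine-weighted graph.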

RELATED WORK
Result
FEATURE EXTRACTION
TWITTER GRAPH GENERATION
EXPERIMENTATION AND RESULTS
INTRA-CLUSTER SIMILARITY
CONCLUSION