Abstract

Social media data carries abundant hidden occurrences of real-time events in the world, which raises the demand for an efficient event detection and trending system. The Locality Sensitive Hashing (LSH) technique is capable of processing large-scale datasets. In this thesis, a novel framework is proposed for detecting and trending events from tweet clusters, discovered using LSH, that are present in a Twitter dataset. The experimental results showed that the LSH technique took only 12.99% of the running time required by K-means to find all of the tweet clusters. Key challenges include: 1) constructing a dictionary using incremental TF-IDF in high-dimensional data in order to create the tweet feature vector; 2) leveraging LSH to find truly interesting events; 3) trending the behavior of an event based on time, geo-locations and cluster size; and 4) speeding up the cluster-discovery process while retaining the cluster quality.

Highlights

  • Online social media provides an abundance of data on public opinions which can be used to extract the occurrences of real-time events in the world

  • This study deals with the problem of creating a tweet feature vector in high-dimensional data by using a static dictionary constructed for each chunk with an incremental Term Frequency - Inverse Document Frequency (TF-IDF) technique

  • The state-of-the-art technique is suitable for analysing large-scale social media data because it can process real-time data quickly
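The per-chunk dictionary idea in the highlights can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the class name `IncrementalTfidf`, the smoothed IDF formula, and the choice to build the dictionary from the terms with the highest document frequency are all assumptions made for the example.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper: keeps running document-frequency counts across chunks."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()

    def update(self, chunk):
        """Fold one chunk (a list of token lists, one per tweet) into the counts."""
        for tokens in chunk:
            self.n_docs += 1
            self.doc_freq.update(set(tokens))  # count each term once per tweet

    def idf(self, term):
        # Smoothed IDF; the exact formula is an assumption, not taken from the thesis.
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0

    def dictionary(self, size):
        """Freeze a static dictionary for the current chunk: term -> vector index."""
        top = [term for term, _ in self.doc_freq.most_common(size)]
        return {term: i for i, term in enumerate(sorted(top))}

    def vector(self, tokens, dictionary):
        """Build a TF-IDF feature vector; terms outside the dictionary are ignored."""
        tf = Counter(t for t in tokens if t in dictionary)
        vec = [0.0] * len(dictionary)
        for term, count in tf.items():
            vec[dictionary[term]] = count * self.idf(term)
        return vec


model = IncrementalTfidf()
model.update([["quake", "tokyo"], ["quake", "japan"], ["goal", "messi"]])
vocab = model.dictionary(10)
tweet_vec = model.vector(["quake", "tokyo", "unknown"], vocab)
```

Because the dictionary is frozen per chunk, every tweet in that chunk maps to a vector of the same length, while the document-frequency counts keep growing as later chunks arrive.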

Summary

Introduction

Online social media provides an abundance of data on public opinions, which can be used to extract the occurrences of real-time events in the world. The LSH technique is employed to find tweet clusters from which events are detected and trended. This work uses Charikar’s approach to compute the K-bit signature of a tweet feature vector; the signature is then used as input for the prefix-tree-based LSH approach proposed by Kamath et al. [25] to discover the tweet clusters from which events are detected and trended. Background information is given on the prefix tree data structure, which replaces the hash table in the LSH approach. We leverage the prefix tree data structure in this thesis to find the nearest neighbour of a given tweet. Twitter is a popular data source for real-time event detection. E2LSH was compared with the K-means algorithm, and it was confirmed that E2LSH boosts retrieval accuracy relative to K-means at the extra cost of response time due to query expansion [51].
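The two building blocks named above, a K-bit random-hyperplane signature and a prefix tree lookup, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm of Kamath et al. or the thesis code: the function and class names (`make_hyperplanes`, `signature`, `PrefixTree`), the Gaussian hyperplanes, and the longest-matching-prefix lookup rule are all hypothetical choices made for the example.

```python
import random


def make_hyperplanes(k, dim, seed=0):
    """Draw k random hyperplanes (Gaussian normals) in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def signature(vec, hyperplanes):
    """K-bit signature: bit i is '1' when vec is on the non-negative side of plane i."""
    return "".join(
        "1" if sum(h * x for h, x in zip(plane, vec)) >= 0 else "0"
        for plane in hyperplanes
    )


class PrefixTree:
    """Binary prefix tree over signatures, used here in place of a hash table."""

    def __init__(self):
        self.root = {}

    def insert(self, sig, item):
        node = self.root
        for bit in sig:
            node = node.setdefault(bit, {})
        node.setdefault("items", []).append(item)

    def nearest(self, sig):
        """Follow sig as deep as it matches; return (matched depth, items below)."""
        node, depth = self.root, 0
        for bit in sig:
            if bit not in node:
                break
            node = node[bit]
            depth += 1
        return depth, self._collect(node)

    def _collect(self, node):
        found = list(node.get("items", []))
        for key, child in node.items():
            if key != "items":
                found.extend(self._collect(child))
        return found


planes = make_hyperplanes(k=8, dim=3, seed=42)
sig_a = signature([1.0, 2.0, 0.5], planes)
sig_b = signature([2.0, 4.0, 1.0], planes)  # same direction, so same signature

tree = PrefixTree()
tree.insert("1010", "tweet_a")
tree.insert("1011", "tweet_b")
tree.insert("0000", "tweet_c")
depth, candidates = tree.nearest("1001")  # shares the prefix "10" with two tweets
```

Tweets whose signatures share a longer prefix agree on more hyperplane sides, so the candidates returned under the deepest matching node are the most likely near neighbours by cosine similarity.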
