Abstract

In this paper, we present a new approach for detecting novel events from social media, specially Twitter, at real-time. An event is usually defined by who, what, where and when, and an event tweet usually contains terms corresponding to these aspects. To exploit this information, we propose a method that incorporates simple semantics by splitting the tweet term space into groups of terms that have the meaning of the same type. These groups are called semantic categories (classes) and each reflects one or more event aspects. The semantic classes include named entity, mention, location, hashtag, verb, noun and embedded link. To group tweets talking about the same event into the same cluster, similarity measuring is conducted by calculating class-wise similarity and then aggregating them together. Users of a real-time event detection system are usually only interested in novel (new) events, which are happening now or just happened a short time ago. To fulfill this requirement, a temporal identification module is used to filter out event clusters that are about old stories. The clustering module also computes a novelty score for each event cluster, which reflects how novel the event is, compared to previous events. We evaluated our event detection method using multiple quality metrics and a large-scale event corpus having millions of tweets. The experiment results show that the proposed online event detection method achieves the state-of-the-art performance. Our experiment also shows that the temporal identification module can effectively detect old events.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.