Abstract

Topic Detection and Tracking (TDT) on Twitter emulates how humans identify developments in events from a stream of tweets, but while event participants are important for humans to understand what happens during events, machines have no knowledge of them. Our evaluation on football matches and basketball games shows that identifying event participants from tweets is a difficult problem exacerbated by Twitter’s noise and bias. As a result, traditional Named Entity Recognition (NER) approaches struggle to identify participants from the pre-event Twitter stream. To overcome these challenges, we describe Automatic Participant Detection (APD) to detect an event’s participants before the event starts and improve the machine understanding of events. We propose a six-step framework to identify participants and present our implementation, which combines information from Twitter’s pre-event stream and Wikipedia. In spite of the difficulties associated with Twitter and NER in the challenging context of events, our approach manages to restrict noise and consistently detects the majority of the participants. By empowering machines with some of the knowledge that humans have about events, APD lays the foundation not just for improved TDT systems, but also for a future where machines can model and mine events for themselves.

Highlights

  • For many people, the idea of a football match is a well-defined concept: Two teams of 11 players, playing each other for 90 min of football

  • Extrapolation is analogous to entity set expansion. This step is necessary in Automatic Participant Detection (APD) because, as we show in Section 4, the resolution step inherits the bias of Twitter: Users discuss a few, popular participants and barely mention the rest

Summary

Introduction

The idea of a football match is a well-defined concept: Two teams of 11 players, playing each other for 90 min of football. Natural disasters and tragedies all attract a lot of attention, and Topic Detection and Tracking (TDT) research pounced on the opportunity, using tweets to build timelines of events in near real-time. These timelines remain far below the standards of the news media, partially because machines do not understand events like humans. The players directly influence the outcome of a football match, and candidates shape elections. The traditional definition of an event excludes participants, and even approaches that focus on them [1,2,3] never formalize what qualifies an entity to be a participant.

