Abstract

Event detection is attracting growing interest among researchers. The growth of social media data drives both the emergence of new algorithms and the improvement of existing solutions. In this paper we propose an improvement to an existing event detection algorithm, SEDTWik (Segment-based Event Detection from Tweets using Wikipedia). Its authors define an event as a set of similar segments of words within a given time window, where a segment is a word or phrase taken from the analyzed text data. SEDTWik uses Wikipedia as a “supervisor” to identify segments, to calculate each segment’s bursty value, and to calculate each segment’s newsworthiness. We examined the SEDTWik algorithm on our own data from the Telegram online social network. Because the overall message structure of Twitter differs from that of Telegram, we transformed the Telegram metadata to fit the SEDTWik requirements. Another highly relevant difference in our experiment is that our corpora contain messages in Russian and Kazakh. Our results show that the SEDTWik algorithm depends strongly on broad and unfocused Wikipedia data, and that this dependency was shown to reduce event detection accuracy. This result motivates us to improve the SEDTWik algorithm by using segment probabilities calculated dynamically from the data streams under analysis.
