Abstract
Microblogging sites like Twitter have become important sources of real-time information during disaster events. A large amount of valuable situational information is posted on these sites during disasters; however, this information is dispersed among hundreds of thousands of tweets expressing the sentiments and opinions of the masses. To use microblogging sites effectively during disaster events, it is necessary not only to extract the situational information from the large volume of sentiments and opinions, but also to summarize the large amount of situational information posted in real time. During disasters in countries like India, a sizable number of tweets are posted in local resource-poor languages in addition to English. For instance, in the Indian subcontinent, a large number of tweets are posted in Hindi (an official language of India, written in the Devanagari script), and some of the information contained in such non-English tweets is not available through English tweets, or becomes available only at a later point in time. In this work, we develop a novel classification-summarization framework that handles tweets in both English and Hindi: we first extract tweets containing situational information, and then summarize this information. Our proposed methodology is based on an understanding of how several concepts evolve on Twitter during a disaster. This understanding helps us achieve superior performance compared to state-of-the-art tweet classifiers and summarization approaches on English tweets. Additionally, to our knowledge, this is the first attempt to extract situational information from non-English tweets.