Abstract

Twitter is a social media site where people post personal experiences, opinions, and news. Because this data is available in real time, many rescue agencies monitor it regularly to identify disasters, reduce risk, and save lives. However, humans cannot manually check such a large volume of data and identify disasters in real time. To address this, many studies have proposed representing words in machine-understandable form and applying machine learning methods to those word representations to identify the sentiment of a text. Earlier methods provide a single vector representation, or embedding, for each word in a given document. In contrast, the more recent contextual embedding method BERT (Bidirectional Encoder Representations from Transformers) constructs different vectors for the same word in different contexts. BERT embeddings have been used successfully in various Natural Language Processing (NLP) tasks, yet there is no concrete analysis of how helpful these representations are in analyzing disaster-related tweets. This study explores the efficacy of BERT embeddings for predicting disasters from Twitter data and compares them with traditional context-free word embedding methods. We provide both quantitative and qualitative results. The results show that contextual embeddings perform better than traditional word embeddings on the disaster prediction task. Furthermore, we discuss the opportunities and challenges of contextual embeddings for sentiment analysis of Twitter data.
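The distinction between context-free and contextual embeddings can be illustrated with a minimal sketch. This is not BERT and not the method evaluated in the study: `static_vector`, `contextual_vectors`, the hash-based vectors, the neighbour-averaging rule, and both example sentences are hypothetical stand-ins chosen so the example runs without any model download. It only shows the property at issue, namely that a lookup table returns one vector per word while a context-aware encoder returns a vector that depends on the surrounding words.

```python
import hashlib

DIM = 8  # toy dimensionality, chosen arbitrarily for illustration


def static_vector(word):
    """Context-free lookup: the same word always maps to the same vector."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def contextual_vectors(tokens, window=2, mix=0.5):
    """Toy stand-in for a contextual encoder (NOT BERT).

    Each token's static vector is blended with the mean static vector of
    its neighbours inside a fixed window, so the result depends on context.
    """
    static = [static_vector(t) for t in tokens]
    out = []
    for i, vec in enumerate(static):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = [static[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            out.append(list(vec))
            continue
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        out.append([(1 - mix) * v + mix * m for v, m in zip(vec, mean)])
    return out


# Two hypothetical tweets using "fire" literally and figuratively.
literal = "forest on fire near the highway".split()
figurative = "this new album is fire".split()

# Context-free: one vector for "fire", regardless of the sentence.
static_same = static_vector("fire") == static_vector("fire")

# Context-dependent: the vector for "fire" differs between the sentences.
ctx_literal = contextual_vectors(literal)[literal.index("fire")]
ctx_figurative = contextual_vectors(figurative)[figurative.index("fire")]
ctx_same = ctx_literal == ctx_figurative

print("static vectors identical:", static_same)
print("contextual vectors identical:", ctx_same)
```

In a real pipeline the toy encoder would be replaced by a pretrained model, and the resulting vectors would feed a classifier that labels each tweet as disaster-related or not.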
