Abstract

Distinguishing informative and actionable messages from a social media platform like Twitter is critical for facilitating disaster management. For this purpose, we compile a multilingual dataset of over 130K samples for multi-label classification of disaster-related tweets. We present a masking-based loss function for partially labelled samples and demonstrate the effectiveness of Manifold Mixup in the text domain. Our main model is based on Multilingual BERT, which we further improve with Manifold Mixup. We show that our model generalizes to unseen disasters in the test set. Furthermore, we analyze the capability of our model for zero-shot generalization to new languages. Our code, dataset, and other resources are available on GitHub.

Highlights

  • In times of disaster, affected individuals often turn to social media platforms, such as Twitter or Facebook, to express their feelings generated by a disaster, update friends and relatives on their status, request help or supplies, or report useful information to the disaster response teams

  • We stress that there are many disaster-prone, non-English-speaking countries that could benefit from a multilingual classifier that can be used in real time to identify useful information on social media

  • We use M-BERT for disaster-related tweet classification and demonstrate its strong performance on unseen disasters and languages


Summary

Introduction

In times of disaster, affected individuals often turn to social media platforms, such as Twitter or Facebook, to express their feelings generated by a disaster, update friends and relatives on their status, request help or supplies, or report useful information to the disaster response teams. In turn, this can help facilitate disaster response and increase situational awareness. Towards this goal, in recent years, many works have focused on disaster-related tweet classification (Alam et al., 2018b; Mazloom et al., 2018; Nguyen et al., 2017; Li et al., 2017; Neppalli et al., 2018; Caragea et al., 2016, 2011). However, there is a lack of a large-scale, standard, multilingual disaster-related dataset for multi-label classification with diverse disaster types. Against this background, we make the following contributions:

1. By utilizing a class mask (elaborated in Section 4.1), we make use of both binary-classification data and multi-class classification data in the same training phase.

Footnote 1: https://github.com/JRC1995/Multilingual-BERT-Disaster
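The class mask named in the first contribution can be illustrated with a small sketch. The exact formulation is given in Section 4.1 of the paper and is not reproduced in this excerpt, so the code below is only one plausible reading: a per-label binary cross-entropy in which label slots without an annotation are masked out of the loss. The function name `masked_bce`, the three-slot label layout, and the example probabilities are all hypothetical.

```python
import math


def masked_bce(probs, labels, mask):
    """Mean binary cross-entropy over the label slots where mask == 1.

    probs  : predicted probability for each label slot
    labels : 0/1 targets (the value in a masked slot is ignored)
    mask   : 1 if the slot is annotated for this sample, else 0
    """
    eps = 1e-12  # keeps log() away from 0
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if m == 0:
            continue  # unlabelled slot: contributes no loss and no gradient
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# Hypothetical layout: slot 0 = "informative", slots 1-2 = finer categories.
probs = [0.9, 0.5, 0.2]

# Sample from a binary dataset: only slot 0 is annotated.
binary_only = masked_bce(probs, labels=[1, 0, 0], mask=[1, 0, 0])

# Sample from a fully annotated dataset: all slots count.
fully_labelled = masked_bce(probs, labels=[1, 0, 0], mask=[1, 1, 1])
```

With a mask like this, samples from binary-only datasets and samples from fully annotated datasets can share a batch, because each sample is penalized only on the labels it actually carries.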
