A novel method for identifying the damage assessment tweets during disaster

Sreenivasulu Madichetty, Sridevi M

doi:10.1016/j.future.2020.10.037

Abstract

Detecting damage assessment tweets is beneficial to both humanitarian organizations and victims during a disaster. Most previous work on identifying tweets during a disaster has addressed situational information, availability/requirement of resources, infrastructure damage, etc. Only a few works focus on detecting damage assessment tweets. In this paper, a novel method is proposed for identifying damage assessment tweets during a disaster. Our proposed method effectively utilizes low-level lexical features, top-most frequency word features, and syntactic features that are specific to damage assessment. These features are weighted using simple linear regression and Support Vector Regression (SVR) algorithms. A random forest classifier is then used to classify the tweets. We experimented on 14 standard disaster datasets of different categories for binary and multi-class classification. The proposed method achieves an accuracy of 94.62% in detecting damage assessment tweets. Most importantly, the proposed method can be applied when too few labeled tweets are available, and also when no tweets for the specific disaster type are available, by training the model on past disaster datasets. Our proposed model is trained on datasets such as (i) a combination of earthquake disaster datasets, (ii) a combination of old earthquake disaster datasets, and (iii) a combination of old diverse disaster datasets, and is tested on the other datasets in the cross-domain scenario. The proposed approach is also compared with state-of-the-art approaches, both in-domain and cross-domain, for binary and multi-class classification. The proposed method improves accuracy by up to 37.12% compared with the existing methods.
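
The feature-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the regression output becomes a weight, so the sketch assumes one plausible reading, in which each top-frequency word feature is weighted by the absolute slope of a one-variable least-squares regression of the label on that feature. The toy tweets and the names `TWEETS`, `top_words`, `featurize`, `regression_slope`, `feature_weights`, and `weighted_vector` are hypothetical. The SVR weighting, the lexical and syntactic features, and the random forest classifier are omitted.

```python
from collections import Counter

# Hypothetical toy tweets (not from the paper); label 1 = damage assessment.
TWEETS = [
    ("bridge collapsed and houses destroyed near the river", 1),
    ("many buildings damaged after the earthquake", 1),
    ("roads destroyed, school building collapsed", 1),
    ("praying for everyone affected tonight", 0),
    ("please donate blood at the city hospital", 0),
    ("thoughts with the families tonight", 0),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def top_words(tweets, k):
    """Return the k most frequent words among the positive-class tweets."""
    counts = Counter(
        w for text, label in tweets if label == 1 for w in tokenize(text)
    )
    return [w for w, _ in counts.most_common(k)]


def featurize(text, vocab):
    """Binary presence vector over the chosen vocabulary."""
    tokens = set(tokenize(text))
    return [1.0 if w in tokens else 0.0 for w in vocab]


def regression_slope(xs, ys):
    """Least-squares slope of y on x; 0.0 if x is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var


def feature_weights(tweets, vocab):
    """Assumed weighting rule: |slope| of label regressed on each feature."""
    rows = [featurize(text, vocab) for text, _ in tweets]
    labels = [float(label) for _, label in tweets]
    return {
        w: abs(regression_slope([row[j] for row in rows], labels))
        for j, w in enumerate(vocab)
    }


def weighted_vector(text, vocab, weights):
    """Scale each binary feature by its learned weight."""
    return [x * weights[w] for x, w in zip(featurize(text, vocab), vocab)]


vocab = top_words(TWEETS, 3)
weights = feature_weights(TWEETS, vocab)
print(weights)
print(weighted_vector("houses collapsed on the road", vocab, weights))
```

In this toy data the word "the" is frequent in the damage tweets but equally common in the others, so its regression weight is close to zero, while damage-specific words keep a large weight. The weighted vectors would then be passed to a classifier; with a library such as scikit-learn, `RandomForestClassifier` could fill that role.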

Full Text