Multi-faceted Classification for the Identification of Informative Communications during Crises: Case of COVID-19

Kapil Yadav, Lingzi Hong, Guanghui Ye, Ajay Jayanth, Zhuoli Xie

doi:10.1109/compsac51774.2021.00125

Abstract

Social media data are used to enhance crisis management, as people widely adopt social media to share and acquire information to cope with uncertainty during crises. Identifying and extracting informative communications from large volumes of data is critical for accurate situational awareness and timely response. Existing studies use geolocation, keyword, and topic conditions, separately or jointly, to retrieve data that may be crisis related, but these conditions are not sufficient to filter subsets of data for different crisis management tasks. We propose that users' crisis communication purposes can be detected to enhance data selection and prioritization for different crisis management tasks. We built a classification framework that identifies three facets of a message: content type, audience type, and information source. The definitions of these categories do not depend on a specific type of crisis, so the framework can potentially be applied to different crisis scenarios. Machine learning models were created to classify messages automatically. Results showed that the CNN-based model achieved the best accuracy (88.5%) for the classification of content type. The proposed Naive Bayes and logistic regression models with predetermined features best differentiated audience type and information source, with accuracies of 72.7% and 72.2%, respectively.
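To make the three-facet idea concrete, the following is a minimal sketch, not the authors' implementation: it trains one independent classifier per facet and returns a label for each. It assumes a small bag-of-words multinomial Naive Bayes for all three facets (the abstract reports a CNN as best for content type, and Naive Bayes and logistic regression with predetermined features for the other two). The class `TinyNaiveBayes`, the helper names, the toy messages, and the label values such as "informative" or "official" are all hypothetical; the abstract does not list the actual categories.

```python
import math
from collections import Counter, defaultdict

# The three facets named in the abstract; the key names are our own.
FACETS = ("content_type", "audience_type", "information_source")


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_facet_models(messages):
    """Fit one classifier per facet on the same message texts."""
    texts = [m["text"] for m in messages]
    return {f: TinyNaiveBayes().fit(texts, [m[f] for m in messages]) for f in FACETS}


def classify(models, text):
    """Return a predicted label for every facet of a single message."""
    return {f: model.predict(text) for f, model in models.items()}


# Made-up training messages; label values are illustrative assumptions only.
TOY = [
    {"text": "health department reports new cases today",
     "content_type": "informative", "audience_type": "public", "information_source": "official"},
    {"text": "officials announce testing sites open",
     "content_type": "informative", "audience_type": "public", "information_source": "official"},
    {"text": "i feel so bored staying home",
     "content_type": "personal", "audience_type": "friends", "information_source": "self"},
    {"text": "my friends and i miss going out",
     "content_type": "personal", "audience_type": "friends", "information_source": "self"},
]

models = train_facet_models(TOY)
print(classify(models, "department reports new testing sites"))
```

Keeping one model per facet means each facet can use whichever classifier works best for it, and a downstream crisis management task can filter messages on any combination of the three predicted labels.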

Full Text