Multimodal deep learning for disaster social media analysis plays an important role in emergency response and recovery. In real-world deployments, however, sudden disaster events may differ from the training data, requiring the multimodal network to recognize them as unknown classes rather than misclassify them as known ones. Previous studies have focused primarily on model accuracy in a closed-world setting and cannot directly detect unknown classes. We therefore propose a novel multimodal model for categorizing disaster-related social media in an open-world environment. Our method uses pre-trained unimodal models as encoders for each modality and fuses their outputs with a cross-attention module to obtain a joint representation. For open-world detection, we employ a multitask classifier comprising a closed-world classifier and an open-world classifier: the closed-world classifier is trained on the original data to classify known classes, while the open-world classifier determines whether an input belongs to a known class. We further propose a sample generation strategy that models the distribution of unknown samples from known data, enabling the open-world classifier to identify them. Experiments on two public datasets, CrisisMMD and MHII, show that the proposed method outperforms existing baselines in crisis information classification.
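The abstract does not give implementation details, so the following is a minimal PyTorch sketch of the described architecture under stated assumptions: text and image features from pre-trained unimodal encoders are fused with cross-attention, and a multitask head pairs a closed-world classifier over known classes with a binary known/unknown (open-world) classifier. All module names, dimensions, and the choice of `nn.MultiheadAttention` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical cross-attention fusion of text and image features."""
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        # Text tokens attend to image patches; equal feature dims are assumed.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, dim) tokens from a pre-trained text encoder
        # image_feats: (B, P, dim) patches from a pre-trained image encoder
        fused, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        # Residual connection, then mean-pool tokens into a joint representation.
        return self.norm(text_feats + fused).mean(dim=1)  # (B, dim)

class MultitaskClassifier(nn.Module):
    """Closed-world head over known classes plus a binary known/unknown head."""
    def __init__(self, dim=768, num_known_classes=8):
        super().__init__()
        self.closed_head = nn.Linear(dim, num_known_classes)  # known-class logits
        self.open_head = nn.Linear(dim, 2)                    # known vs. unknown

    def forward(self, joint_repr):
        # joint_repr: (B, dim) output of the fusion module
        return self.closed_head(joint_repr), self.open_head(joint_repr)
```

At inference, one plausible decision rule consistent with the abstract is to assign the closed-world label only when the open-world head predicts "known", and otherwise reject the input as an unknown class.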