The ability to post short text and media messages on Social media platforms like Twitter, Facebook, etc., plays a huge role in the exchange of information following a mass emergency event like hurricane, earthquake, tsunami etc. Disaster victims, families, and other relief operation teams utilize social media to help and support one another. Despite the benefits offered by these communication media, the disaster topic related posts (posts that indicate conversations about the disaster event in the aftermath of the disaster) gets lost in the deluge of posts since there would be a surge in the amount of data that gets exchanged following a mass emergency event. This hampers the emergency relief effort, which in turn affects the delivery of useful information to the disaster victims. Research in emergency coordination via social media has received growing interest in recent years, mainly focusing on developing machine learning-based models that can separate disaster-related topic posts from non-disaster related topic posts. Of these, supervised machine learning approaches performed well when the machine learning model trained using source disaster dataset and target disaster dataset are similar. However, in the real world, it may not be feasible as different disasters have different characteristics. So, models developed using supervised machine learning approaches do not perform well in unseen disaster datasets. Therefore, domain adaptation approaches, which address the above limitation by learning classifiers from unlabeled target data in addition to source labelled data, represent a promising direction for social media crisis data classification tasks. The existing domain adaptation techniques for the classification of disaster tweets are experimented with using single disaster event dataset pairs; then, self-training is performed on the source target dataset pairs by considering the highly confident instances in subsequent iterations of training. This could be improved with better feature engineering. Thus, this research proposes a Genetic Algorithm based Domain Adaptation Framework (GADA) for the classification of disaster tweets. The proposed GADA combines the power of 1) Hybrid Feature Selection component using the Genetic Algorithm and Chi-Square Feature Evaluator for feature selection and 2) the Classifier component using Random Forest to classify disaster-related posts from noise on Twitter. The proposed framework addresses the challenge of the lack of labeled data in the target disaster event by proposing a Genetic Algorithm based approach. Experimental results on Twitter datasets corresponding to four disaster domain pair shows that the proposed framework improves the overall performance of the previous supervised approaches and significantly reduces the training time over the previous domain adaptation techniques that do not use the Genetic Algorithm (GA) for feature selection.
Read full abstract