Abstract
Twitter is a popular micro-blogging platform that offers a rich source of real-time information about real-world events, particularly during mass emergencies and crises. During a crisis, a huge number of tweets must be filtered within a short span of time to extract crisis-related information. Various machine learning (ML) algorithms have been used to separate crisis-related tweets from non-crisis-related ones, and they therefore play a significant role in building applications for emergency management. As data proliferates, the growing stream of information becomes unmanageable to process. This paper therefore focuses on (1) different Natural Language Processing (NLP) techniques that make tweets suitable for ML algorithms, (2) different word embeddings that create a more domain-specific semantic space and reduce dimensionality for efficient tweet analysis, and (3) a comparative analysis of state-of-the-art ML algorithms (classifiers) that can categorize crisis-related tweets with higher accuracy. Experiments were conducted on six crisis-related datasets, each consisting of approximately 10,000 tweets. Our analysis shows that Neural Networks outperformed all other classifiers, such as Naive Bayes, Logistic Regression, and Support Vector Machines. Moreover, word-embedding models trained on more domain-specific data can even outperform pre-trained models.
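To make steps (1) and (2) concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a common approach: normalize each tweet (lowercase, strip URLs and @-mentions, keep hashtag words, drop punctuation and stopwords), then represent the tweet as the average of its word vectors. The stopword list, the `TOY_EMBEDDINGS` table, and the helper names `preprocess` and `tweet_vector` are hypothetical illustrations; a real system would use trained embeddings (pre-trained or trained on crisis data) with hundreds of dimensions.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny stopword list (assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "to", "of", "and", "for", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

# Hypothetical 3-dimensional embeddings standing in for a trained model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "flood": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "water": [0.6, 0.3, 0.0],
    "concert": [0.0, 0.1, 0.9],
}


def preprocess(tweet: str) -> List[str]:
    """Lowercase, strip URLs and mentions, keep hashtag words, drop punctuation and stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the '#'
    text = NON_ALPHA_RE.sub(" ", text)  # remove digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tweet_vector(tokens: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    tokens = preprocess("RT @user Flood rescue teams on the way! #FloodRelief http://t.co/xyz")
    print(tokens)
    print(tweet_vector(tokens, TOY_EMBEDDINGS, 3))
```

Here the example tweet reduces to the tokens `flood`, `rescue`, `teams`, `way`, `floodrelief`, and only `flood` and `rescue` have toy vectors, so the tweet vector is roughly `[0.85, 0.15, 0.05]`. The design point is that every tweet maps to a dense vector of fixed size `dim`, independent of vocabulary size, which is how word embeddings reduce dimensionality compared with sparse bag-of-words features before the vectors are passed to a classifier.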