Abstract

Bidirectional Encoder Representations from Transformers (BERT) is a transfer learning approach in natural language processing (NLP). BERT provides pre-trained language representations that can be fine-tuned on specific downstream tasks, such as text classification, to produce state-of-the-art predictions. Recent studies on the use of BERT in NLP have highlighted that there are no publicly available Filipino tweet datasets covering fire reports on social media, which has led to a lack of classification models for this domain. This paper aims to design and implement a system that classifies Filipino tweets using different pre-trained BERT models. After building a classifier for Filipino tweets on a dataset of 2,081 fire-related tweets, the researchers compared the accuracy of the fine-tuned pre-trained BERT models. The results show a significant difference in accuracy across the models. The best performer is the BERT Base Uncased WWM model, with a test accuracy of 87.50% and a training loss of 0.06; the least accurate is the BERT Base Cased WWM model, with a test accuracy of 76.34% and a training loss of 0.2. These results indicate that the BERT Base Uncased WWM model can be a reliable model for classifying fire-related tweets, although the accuracy obtained by the models may vary with the size of the dataset.
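As a rough illustration of the fine-tuning workflow the abstract describes, the sketch below fine-tunes a pre-trained BERT model for binary fire-related tweet classification using the Hugging Face Transformers library. It is a minimal sketch under stated assumptions: the model name, toy tweets, labels, and hyperparameters (learning rate, epoch count, sequence length) are illustrative placeholders, not the paper's reported configuration.

```python
# Minimal sketch: fine-tuning a pre-trained BERT model for binary
# tweet classification (fire-related or not) with Hugging Face
# Transformers. All names and hyperparameters below are assumptions
# for illustration, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed; the paper also evaluates WWM variants

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


class TweetDataset(Dataset):
    """Wraps raw tweet texts and 0/1 labels as model-ready tensors."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


# Hypothetical toy examples standing in for the 2,081-tweet dataset.
train_ds = TweetDataset(["May sunog sa Maynila!", "Magandang umaga sa lahat"],
                        [1, 0])
loader = DataLoader(train_ds, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed LR
model.train()
for epoch in range(3):               # assumed epoch count
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)          # returns a loss when labels are passed
        out.loss.backward()
        optimizer.step()
```

Swapping `MODEL_NAME` for other checkpoints (e.g., cased or whole-word-masking variants) would reproduce the kind of model-to-model comparison the abstract reports.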
