Abstract

Bidirectional Encoder Representations from Transformers (BERT) is a transfer learning approach in natural language processing (NLP). BERT provides pre-trained language representations that can be fine-tuned on specific downstream tasks, such as text classification, to produce state-of-the-art predictions. Recent studies on the use of BERT in NLP have highlighted that there are no publicly available Filipino tweet datasets covering fire reports on social media, which has led to a lack of classification models for this domain. This paper aims to design and implement a system that classifies Filipino tweets using different pre-trained BERT models. After building a classifier for Filipino tweets on a dataset of 2,081 fire-related tweets, the researchers compared the accuracy of the fine-tuned pre-trained BERT models. The results show a significant difference in accuracy across the models. The best performer is the BERT Base Uncased WWM model, with a test accuracy of 87.50% and a training loss of 0.06; the least accurate is the BERT Base Cased WWM model, with a test accuracy of 76.34% and a training loss of 0.2. These results indicate that the BERT Base Uncased WWM model can be a reliable model for classifying fire-related tweets, although the accuracy obtained by the models may vary with the size of the dataset.
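As a rough illustration of the fine-tuning workflow the abstract describes, the sketch below fine-tunes a pre-trained BERT model for binary fire-related tweet classification using the Hugging Face Transformers library. It is a minimal sketch under stated assumptions: the model name, toy tweets, labels, and hyperparameters (learning rate, epoch count, sequence length) are illustrative placeholders, not the paper's reported configuration.

```python
# Minimal sketch: fine-tuning a pre-trained BERT model for binary
# tweet classification (fire-related or not) with Hugging Face
# Transformers. All names and hyperparameters below are assumptions
# for illustration, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed; the paper also evaluates WWM variants

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


class TweetDataset(Dataset):
    """Wraps raw tweet texts and 0/1 labels as model-ready tensors."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


# Hypothetical toy examples standing in for the 2,081-tweet dataset.
train_ds = TweetDataset(["May sunog sa Maynila!", "Magandang umaga sa lahat"],
                        [1, 0])
loader = DataLoader(train_ds, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed LR
model.train()
for epoch in range(3):               # assumed epoch count
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)          # returns a loss when labels are passed
        out.loss.backward()
        optimizer.step()
```

Swapping `MODEL_NAME` for other checkpoints (e.g., cased or whole-word-masking variants) would reproduce the kind of model-to-model comparison the abstract reports.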
