Comparison of the efficacy of Natural Language Processing Algorithms at classifying Cyberbullying Tweets

Eric Cui, Christopher Brown

doi:10.47611/jsrhs.v11i4.3432

Open Access

Abstract

Machine learning is frequently used to predict and classify data. Natural Language Processing (NLP) applies machine learning to text, for example to classify strings of words. Many machine learning models can be used for NLP; three main categories are regression, decision tree, and neural network models, each with its own advantages and drawbacks. In this study, Logistic Regression, XGBoost, and Long Short-Term Memory (LSTM) models were trained and tested on a set of tweets concerning cyberbullying and compared on several metrics, including accuracy, recall, precision, and F1-score. These metrics were then weighed against model runtime and complexity to determine which model was most appropriate for the given dataset and similar datasets. Logistic Regression was found to lack sufficient complexity to classify the data properly. LSTM scored lower than XGBoost on the metrics while having significantly higher complexity and runtime. XGBoost performed best, with the highest metrics and a relatively short runtime.
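As an illustration of the four metrics named above, the following is a minimal sketch of how they can be computed from true and predicted labels. It is not the authors' code: the function name `classification_metrics`, the binary labeling (1 = cyberbullying, 0 = not), and the toy label lists are all assumptions made for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; labels are assumed to be binary.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for this sketch, not taken from the study's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes false alarms and recall penalizes missed cases, so a model can score well on accuracy while doing poorly on either one; the F1-score, as the harmonic mean of the two, summarizes that trade-off in a single number.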

Full Text