Abstract

Machine learning is frequently used to predict and classify data. Natural Language Processing (NLP) uses machine learning to classify strings of words. Many machine learning models can be used for NLP; three main categories are regression, decision-tree, and neural-network models, each with its own advantages and drawbacks. Logistic Regression, XGBoost, and Long Short-Term Memory (LSTM) were trained and tested on a set of tweets concerning cyberbullying and compared on several metrics, including accuracy, recall, precision, and F1-score. These metrics were then considered together with model runtime and complexity to determine which model was most appropriate for the given dataset and for similar datasets. Logistic Regression was found to lack sufficient complexity to classify the data properly. LSTM scored lower than XGBoost on the metrics and had significantly higher complexity and runtime. XGBoost performed best, with the highest metrics and a relatively short runtime.
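As an illustration of the metrics named above, the following minimal sketch computes accuracy, precision, recall, and F1-score for a binary labeling. It is not the code used in the study: the function name `binary_metrics` and the toy labels are hypothetical, and the sketch assumes a two-class setting (1 = cyberbullying, 0 = not), which may differ from the labeling of the actual dataset.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels for illustration only; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision penalizes false alarms and recall penalizes missed cases, so comparing models on both, together with F1, gives a fuller picture than accuracy alone, particularly when classes are imbalanced.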
