Abstract

Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call