Annotation Technique for Health-Related Tweets Sentiment Analysis

Asma Baccouche, Begonya Garcia-Zapirain, Adel Elmaghraby

doi:10.1109/isspit.2018.8642685

Abstract

This paper introduces a novel implementation of an automatic labeling technique for annotating health-related tweets in three languages: English, French, and Arabic. The resulting labels are then used for sentiment analysis. The technique relies on data preprocessing and annotates tweets automatically using domain knowledge, Natural Language Processing (NLP), and sentiment-lexicon dictionaries. For sentiment prediction, we use deep learning techniques. In particular, we implement a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM) network. To train the models, we include both a domain-specific private dataset and public, non-domain-specific datasets containing a large set of user reviews from Amazon, IMDB, and Yelp, as well as the Arabic Sentiment Tweets Dataset (ASTD). Our overall performance evaluation shows that the LSTM-RNN outperforms the results reported in the literature for both the English and Arabic datasets. It achieves an accuracy of 0.98, an F1-score of 0.97, a precision of 0.98, and a recall of 0.97 on the English Twitter dataset; an accuracy of 0.92, an F1-score of 0.91, a precision of 0.89, and a recall of 0.93 on the French Twitter dataset; and an accuracy of 0.83, an F1-score of 0.82, a precision of 0.87, and a recall of 0.79 on the Arabic Twitter dataset.
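To make the idea of lexicon-based automatic annotation concrete, the following is a minimal sketch in Python and not the authors' implementation. The toy lexicon `TOY_LEXICON`, the helper names `preprocess` and `label_tweet`, the English-only tokenizer, and the three-way positive/negative/neutral labeling are all assumptions introduced here for illustration; a real pipeline would load a full sentiment lexicon for each language and add domain knowledge.

```python
import re

# Hypothetical toy lexicon (word -> polarity score). Illustrative only:
# a real system would load a complete sentiment lexicon per language.
TOY_LEXICON = {
    "good": 1, "great": 1, "relief": 1, "better": 1,
    "bad": -1, "pain": -1, "worse": -1, "sick": -1,
}


def preprocess(tweet):
    """Lowercase the tweet, drop URLs and @mentions, strip '#', tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep hashtag words, drop '#'
    return re.findall(r"[a-z']+", text)        # ASCII-only tokens (assumption)


def label_tweet(tweet, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the tokens and map the sign to a label."""
    score = sum(lexicon.get(token, 0) for token in preprocess(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback when no polarity words dominate


if __name__ == "__main__":
    for example in [
        "Feeling much better today, great relief! https://t.co/x",
        "@doc the pain is worse #sick",
        "Flu season starts",
    ]:
        print(label_tweet(example), "<-", example)
```

Labels produced this way could then serve as training targets for a classifier such as an LSTM. Summing word polarities is only one simple scoring rule; it ignores negation and intensity, which a production annotator would need to handle.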

Full Text