Hybrid model of unsupervised and supervised learning for multiclass sentiment analysis based on users’ reviews on healthcare web forums

Anuj Kumar, Shashi Shekhar

doi:10.32629/jai.v7i4.971

Abstract

<p>Twitter has become a popular platform for sharing health information, including diabetes-related content. Recent studies have shown that Twitter data can be used for purposes such as monitoring illnesses, promoting health, analyzing sentiment, and potentially aiding medical decision-making. However, detecting health-related tweets in the vast amount of data on Twitter can be difficult. This pilot study therefore aimed to classify patient-written drug text and disease-related tweets into meaningful health-related segments. The unlabeled dataset is first divided into groups with an unsupervised learning technique, K-Means clustering, and the resulting groups are used to label the text. Using a combination of neural networks and machine learning classifiers, we then classified 32046 diabetes-related tweets and 161290 drug text lines into five groups. Approximately 66.38% of the drug text lines were classified as health-related: 55.14% “treatment and medication”, 7.10% “prevention”, and 4.14% “symptoms and causes”. Over 33% were categorized as “Other and News”. For the tweet dataset, the health-related tweets comprised 44.30% “treatment and medication”, 7% “prevention”, and 5.3% “symptoms and causes”. Over 56.10% were categorized as “Other and News”. After this multiclass classification, we applied three machine learning and two deep learning models and evaluated them by accuracy, precision, recall, and F1 score. On the drug review dataset, the SVM and LR models achieved an accuracy of 98%; on the tweet dataset, the LR model achieved an accuracy of 97%. This research shows the importance of social media data for decision-making systems in the healthcare domain.</p>
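The two-stage idea in the abstract (cluster unlabeled text with K-Means, treat the cluster ids as labels, then train a supervised classifier on them) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bag-of-words features, the hand-written softmax (multinomial logistic) regression standing in for the LR model, the invented example sentences, and the use of two clusters instead of five. The function names `vectorize`, `kmeans`, `train_softmax`, and `predict` are hypothetical.

```python
import math
import random


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for text in texts:
        vec = [0.0] * len(vocab)
        for word in text.lower().split():
            vec[index[word]] += 1.0
        vectors.append(vec)
    return vectors


def kmeans(X, k, iters=20, seed=0):
    """Plain K-Means; returns one cluster id per row, used as a pseudo-label."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            for x in X
        ]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [x for x, label in zip(X, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def scores(W, b, x):
    """Linear class scores W.x + b, one per class."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_softmax(X, y, k, lr=0.1, epochs=200):
    """Softmax regression trained with per-sample gradient descent."""
    W = [[0.0] * len(X[0]) for _ in range(k)]
    b = [0.0] * k
    for _ in range(epochs):
        for x, target in zip(X, y):
            s = scores(W, b, x)
            m = max(s)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in s]
            total = sum(exps)
            for c in range(k):
                grad = exps[c] / total - (1.0 if c == target else 0.0)
                b[c] -= lr * grad
                W[c] = [w - lr * grad * v for w, v in zip(W[c], x)]
    return W, b


def predict(W, b, x):
    """Return the class with the highest score."""
    s = scores(W, b, x)
    return s.index(max(s))


# Invented example sentences (NOT from the paper's datasets).
texts = [
    "insulin dose adjusted by doctor",
    "new insulin tablet dose works",
    "doctor changed my tablet dose",
    "daily walk and diet lower risk",
    "healthy diet and walk every day",
    "diet and daily exercise lower risk",
    "doctor adjusted insulin dose again",
    "walk and healthy diet every day",
]
X = vectorize(texts)
pseudo_labels = kmeans(X, k=2)  # stage 1: unsupervised labeling

# Stage 2: train on the first six rows, evaluate on the last two.
W, b = train_softmax(X[:6], pseudo_labels[:6], k=2)
predictions = [predict(W, b, x) for x in X[6:]]
accuracy = sum(p == t for p, t in zip(predictions, pseudo_labels[6:])) / len(predictions)
print("pseudo-labels:", pseudo_labels)
print("hold-out accuracy against pseudo-labels:", accuracy)
```

Note that accuracy measured this way only says how well the classifier reproduces the cluster assignments; it says nothing about whether the clusters match human-judged categories, which would need a separately annotated sample.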

Full Text