A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification

T.B Shahi, N Paudel, C Sitaula, Thippa Reddy G

doi:10.1155/2022/5681574

Abstract

COVID-19 is one of the deadliest viral diseases and has killed millions of people around the world to date. People's deaths are linked not only to the infection itself but also to the mental states and sentiments triggered by fear of the virus. People's sentiments, which are predominantly available as posts/tweets on social media, can be interpreted using two kinds of information: syntactic and semantic. Herein, we propose to analyze people's sentiment using both kinds of information (syntactic and semantic) on a COVID-19-related Twitter dataset in the Nepali language. First, we use two widely used text representation methods, TF-IDF and FastText, and then combine them into hybrid features that capture highly discriminating information. Second, we implement nine widely used machine learning classifiers (Logistic Regression, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, Decision Tree, Random Forest, Extra Tree classifier, AdaBoost, and Multilayer Perceptron) on the three feature representations: TF-IDF, FastText, and Hybrid. To evaluate our methods, we use a publicly available Nepali COVID-19 tweets dataset, NepCOV19Tweets, which consists of Nepali tweets categorized into three classes (Positive, Negative, and Neutral). The evaluation results on NepCOV19Tweets show that the hybrid feature extraction method not only outperforms the two individual feature extraction methods across the nine machine learning algorithms but also provides excellent performance compared with state-of-the-art methods.
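The idea of combining TF-IDF and word-embedding features can be illustrated with a minimal sketch. It assumes that the hybrid representation is a simple concatenation of a tweet's TF-IDF vector with the average of its word vectors; the abstract does not state the exact fusion rule, the IDF smoothing used here is only one common variant, and the function names, English placeholder tokens, and two-dimensional toy vectors are hypothetical stand-ins for real Nepali tokens and FastText vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; the paper's exact weighting is not given here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the word vectors of the tokens that have an embedding."""
    known = [embeddings[tok] for tok in doc if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def hybrid_features(docs, embeddings, dim):
    """Concatenate each document's TF-IDF vector with its mean embedding."""
    _, tfidf = tfidf_vectors(docs)
    return [t + mean_embedding(doc, embeddings, dim) for t, doc in zip(tfidf, docs)]


# Hypothetical toy data: placeholder tokens and made-up 2-d word vectors.
docs = [["vaccine", "good", "news"], ["lockdown", "bad", "news"]]
toy_embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "news": [0.0, 1.0]}
features = hybrid_features(docs, toy_embeddings, dim=2)
```

Each row of `features` holds the TF-IDF part (one entry per vocabulary token) followed by the embedding part, so a downstream classifier sees both kinds of information in a single vector.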

Highlights

Natural language processing (NLP) techniques have been developed to assess people’s sentiments on various topics.
We choose nine widely used machine learning classifiers: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbour (KNN), Decision Tree (DT), Extra Tree Classifier (ETC), Adaptive Boosting (AdaBoost), Multilayer Perceptron-Neural Network (MLP-NN), and Support Vector Machine (SVM). The classifiers were selected for the promising classification accuracy they have shown in the literature on both Nepali and non-Nepali document analysis [1,7,25]. A short description of each classifier is presented in the following paragraphs.
We have proposed to use hybrid features (FastText + Term Frequency-Inverse Document Frequency (TF-IDF)) to represent Nepali COVID-19-related tweets for sentiment classification.
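As an illustration of how one of the listed classifiers operates on such feature vectors, the following is a minimal sketch of K-Nearest Neighbour classification. The use of cosine similarity, the value of `k`, the function names, and the toy two-dimensional vectors are all assumptions made for the example; they are not taken from the paper's implementation.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k most similar training vectors."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: cosine_similarity(pair[0], query),
        reverse=True,
    )
    # On a tied vote, Counter returns the label that was counted first,
    # i.e. the one belonging to the more similar neighbour.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy feature vectors with the three sentiment classes.
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
train_y = ["Positive", "Positive", "Neutral", "Neutral", "Negative", "Negative"]
prediction = knn_predict(train_x, train_y, [0.8, 0.3], k=3)
```

The same `knn_predict` call works unchanged on TF-IDF, embedding, or hybrid vectors, which is what makes a like-for-like comparison across the three feature representations straightforward.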

Summary

Introduction

Natural language processing (NLP) techniques have been developed to assess people’s sentiments on various topics. Recent works [1–8] on sentiment analysis of COVID-19 tweets in English and other languages [8] underscore the efficacy of data-driven machine learning approaches; they employed several kinds of analysis, such as topic modeling, classification, and clustering. This calls for a thorough comparison of machine learning methods for sentiment analysis, together with a better representation of tweets for sentiment classification. These works used popular feature extraction methods such as TF-IDF (Term Frequency-Inverse Document Frequency) and word embedding methods such as word2vec [9], GloVe [10], and FastText [11]. From these existing works, we identified three main limitations in Nepali COVID-19-related tweet representation and classification. Among them, there is no study offering a detailed comparison of machine learning (ML) methods for sentiment classification on a COVID-19-related tweets dataset in the Nepali language.

Related Works

Proposed Approach

Preprocessing

TF-IDF Feature Extraction

Word Embedding Feature Extraction

Feature Fusion

Classification

K-Nearest Neighbor

Support Vector Machine

Experiment and Analysis

Evaluation Metrics

Implementation

Comparative Study of ML Classifiers on Three Different Features

Class-Wise Study of Classifiers’ Performance on Hybrid Features

Comparison of Our Method with the State-of-the-Art Methods

Conclusion and Future Works


Journal: Computational Intelligence and Neuroscience	Publication Date: Mar 9, 2022
Citations: 44	License type: CC BY 4.0
