N-Gram based language processing using Twitter dataset to identify COVID-19 patients

Nidal Nasser, Lutful Karim, Ahmed El Ouadrhiri, Asmaa Ali, Nargis Khan

doi:10.1016/j.scs.2021.103048

Open Access

https://doi.org/10.1016/j.scs.2021.103048

Abstract

Due to the rapid growth of electronic documents (e.g., tweets, blogs, Facebook posts, and snaps) written in different languages that share the same writing script, language categorization and processing have become very important. For instance, to identify COVID-19 positive patients, or people's emotions about the COVID-19 pandemic, quickly and accurately from tweets written in 35 different languages, categorizing the language of each tweet and then processing it is essential. Among the many language categorization and processing techniques, character and word n-gram based techniques are popular because they are simple yet efficient for categorizing and processing both short and long documents. A fundamental problem in language processing is using memory space efficiently when implementing a technique, so that a vast collection of documents can be categorized and processed easily. In this paper, we introduce a framework that categorizes the language of tweets using an n-gram based language categorization technique and then processes the tweets with a machine-learning approach, the Linear Support Vector Machine (LSVM), which may be able to identify COVID-19 positive patients. We evaluate and compare the performance of the proposed framework in terms of language categorization accuracy, precision, recall, and F-measure over n-gram length. The proposed framework is scalable: many other applications that involve extracting features from, and classifying the languages of, data collected from social media and other types of networks may use it. As a part of health monitoring and improvement, the proposed framework also aims to support the goal of a sustainable society.
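The abstract does not spell out how the n-gram language categorization scores a tweet. Purely as an illustration, the sketch below assumes the classic rank-order profile approach of Cavnar and Trenkle: each language is represented by its most frequent character n-grams (lengths 1 to `n_max`, with every word padded by `_`), and a tweet is assigned to the language whose profile has the smallest "out-of-place" distance to the tweet's own profile. The function names, the profile size `TOP_K = 300`, the fixed penalty for n-grams missing from a profile, and the two toy training strings are all hypothetical, not taken from the paper.

```python
from collections import Counter

TOP_K = 300  # hypothetical profile size, also used as the missing-n-gram penalty


def char_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count character n-grams of length 1..n_max; each word is padded with '_'."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


def profile(text: str, n_max: int = 3, top_k: int = TOP_K) -> list[str]:
    """A profile is the list of the top_k most frequent n-grams, most frequent first."""
    return [gram for gram, _ in char_ngrams(text, n_max).most_common(top_k)]


def out_of_place(doc_profile: list[str], lang_profile: list[str],
                 max_penalty: int = TOP_K) -> int:
    """Sum of rank differences; an n-gram absent from lang_profile costs max_penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(abs(rank[gram] - r) if gram in rank else max_penalty
               for r, gram in enumerate(doc_profile))


def classify(text: str, lang_profiles: dict[str, list[str]], n_max: int = 3) -> str:
    """Return the language whose profile is closest to the text's profile."""
    doc = profile(text, n_max)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy "training corpora", one per language (hypothetical).
training = {
    "en": "the patient has a fever and a cough and the test is positive",
    "es": "el paciente tiene fiebre y tos y la prueba es positiva",
}
profiles = {lang: profile(text) for lang, text in training.items()}
print(classify("the patient has a fever", profiles))   # en
print(classify("el paciente tiene fiebre", profiles))  # es
```

Keeping only the `TOP_K` most frequent n-grams per language caps the memory each profile needs, which is one simple response to the memory concern raised in the abstract; rerunning with different `n_max` values corresponds to varying the n-gram length.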
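For the second stage of the framework, the following is a minimal sketch of a linear SVM over word n-gram features, assuming Pegasos-style stochastic sub-gradient training of the L2-regularised hinge loss with no bias term. It is not the authors' implementation: the feature extractor, the hyperparameters `lam` and `epochs`, the fixed (non-random) visiting order chosen for reproducibility, and the toy tweets labelled +1 (COVID-19 positive) or -1 (unrelated) are all hypothetical.

```python
from collections import Counter


def word_ngrams(text: str, n_max: int = 2) -> Counter:
    """Sparse bag of word n-grams (lengths 1..n_max)."""
    words = text.lower().split()
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats


def dot(w: dict[str, float], x: Counter) -> float:
    """Sparse dot product between weights and a feature bag."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(docs: list[str], labels: list[int],
                     lam: float = 0.01, epochs: int = 10) -> dict[str, float]:
    """Pegasos-style linear SVM training (labels are +1 / -1, no bias term)."""
    feats = [word_ngrams(d) for d in docs]
    w: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(feats, labels):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            for f in w:  # regulariser step: w <- (1 - eta * lam) * w
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w: dict[str, float], text: str) -> int:
    """+1 if the decision value is positive, otherwise -1."""
    return 1 if dot(w, word_ngrams(text)) > 0 else -1


# Hypothetical toy tweets: +1 = self-reported COVID-19 positive, -1 = unrelated.
docs = ["tested positive fever cough", "sunny beach holiday trip"]
w = train_linear_svm(docs, [1, -1])
print(predict(w, "positive fever today"), predict(w, "beach trip"))
```

Because the two toy tweets share no n-grams, the learned weights stay positive on the first tweet's n-grams and negative on the second's, so the call prints `1 -1`. Real tweets overlap heavily in vocabulary, which is where the regularisation and hinge updates actually matter.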
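The evaluation measures named in the abstract can be computed as in the sketch below, assuming the balanced F-measure (F1) and a single positive class. The function name and the labels and predictions are made-up toy values, not results from the paper; running the same function once per n-gram length would give the kind of comparison over n-gram length the abstract describes.

```python
def evaluate(y_true: list[str], y_pred: list[str], positive: str) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall (balanced F-measure).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for five tweets.
y_true = ["covid", "covid", "other", "other", "covid"]
y_pred = ["covid", "other", "other", "covid", "covid"]
print(evaluate(y_true, y_pred, positive="covid"))
```

Here there are 2 true positives, 1 false positive and 1 false negative, so accuracy is 3/5 while precision, recall and F1 are all 2/3.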

Journal: Sustainable Cities and Society
Publication Date: May 25, 2021
Citations: 14
License type: NO-CC CODE

Similar Papers

A Review on Urdu Language Parsing
Arslan Ali ... Muhammad Javed
International Journal of Advanced Computer Science and Applications | VOL. 8 | 01 Jan 2017

Guest Editors Introduction: Machine Learning in Speech and Language Technologies
Pascale Fung ... Dan Roth
Machine Learning | VOL. 60 | 01 Sep 2005

Classification of sEMG Signal-Based Arm Action Using Convolutional Neural Network
C N Savithri ... E Priya
22 Sep 2020

Assessing kiwifruit quality in storage through machine learning
Mohsen Azadbakht ... Shaghayegh Hashemi Shabankareh
Journal of Food Process Engineering | VOL. 47 | 01 Jul 2024