Effective Preprocessing and Normalization Techniques for COVID-19 Twitter Streams with POS Tagging via Lightweight Hidden Markov Model

Senthil Kumar Narayanasamy, Kathiravan Srinivasan, Saeed Mian Qaisar, Yuh-Chung Hu

doi:10.1155/2022/1222692

Open Access

https://doi.org/10.1155/2022/1222692

Abstract

This work refines the basic preprocessing steps for unstructured text and retrieves candidate conceptual features for downstream processes such as semantic enrichment and named entity recognition. Although preprocessing techniques such as tokenization, normalization, and Part-of-Speech (POS) tagging work exceedingly well on formal text, they perform poorly when applied to informal text such as tweets and short messages. We therefore present enhanced text normalization techniques that reduce the complexity that persists in Twitter streams and eliminate overfitting issues such as text anomalies and irregular boundaries while fixing the grammar of the text. A hidden Markov model (HMM) is used throughout to extract the core lexical features from the Twitter dataset, and external documents are adapted to supplement the extraction techniques and complement the tweet context. In this Markov process, the POS tags are the hidden states and the words are the observed outputs of the model. Because this step is crucial for the next stage of entity extraction and classification, effective handling of informal text is important, and we propose a hybrid approach that addresses these issues.
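To make the HMM formulation in the abstract concrete, the sketch below decodes POS tags with the Viterbi algorithm, treating tags as hidden states and words as observations. It is a minimal illustration and not the authors' implementation: the two-tag tagset, every probability in `START`, `TRANS`, and `EMIT`, and the `UNSEEN` floor for out-of-vocabulary words are assumptions chosen by hand. A real tagger would estimate these values from an annotated tweet corpus.

```python
import math

# Toy, hand-picked probabilities for illustration only; not taken from the paper.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}            # P(first tag)
TRANS = {                                      # P(next tag | current tag)
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT = {                                       # P(word | tag)
    "NOUN": {"cases": 0.5, "rise": 0.1, "vaccines": 0.4},
    "VERB": {"cases": 0.05, "rise": 0.6, "help": 0.35},
}
UNSEEN = 1e-6  # assumed floor probability for a word the tag never emitted


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[t][tag] = (log-probability of the best path ending in `tag` at
    # position t, tag chosen at position t - 1)
    best = [{
        tag: (math.log(START[tag]) + math.log(EMIT[tag].get(words[0], UNSEEN)), None)
        for tag in TAGS
    }]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            emit = math.log(EMIT[tag].get(word, UNSEEN))
            # Pick the predecessor tag that maximizes the path score into `tag`.
            prev = max(TAGS, key=lambda p: best[-1][p][0] + math.log(TRANS[p][tag]))
            score = best[-1][prev][0] + math.log(TRANS[prev][tag]) + emit
            column[tag] = (score, prev)
        best.append(column)
    # Backtrack from the best final tag to recover the full sequence.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["cases", "rise"]))
```

Log-probabilities are used so that long token sequences do not underflow. The tokens passed to `viterbi` are assumed to be already normalized (lowercased, with tweet-specific noise removed), which is the role the abstract assigns to the preprocessing stage.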

Journal: Journal of Sensors
Publication Date: Aug 2, 2022
Citations: 2
License type: CC BY 4.0
