Enhancing Big Social Media Data Quality for Use in Short-Text Topic Modeling

Belal Abdullah Hezam Murshed, Sumaia Mohammed Al-Ghuribi, Suresha Mallappa, Mufeed Ahmed Naji Saif, Fahd A Ghanem, Jemal Abawajy

doi:10.1109/access.2022.3211396

Open Access

Abstract

With the emergence of microblogging platforms and social media applications, large amounts of user-generated data in the form of comments, reviews, and brief text messages are produced every day. Microblog data is typically of poor quality; hence, improving its quality is a significant scientific and practical challenge. Despite the relevance of the problem, little work has been done so far, especially on microblog data quality for Short-Text Topic Modelling (STTM). This paper addresses this problem and proposes an approach called the social media data cleansing model (SMDCM) to improve data quality for STTM. We evaluate SMDCM using six topic modelling methods: Latent Dirichlet Allocation (LDA), Word-Network Topic Model (WNTM), Pseudo-document-based Topic Modelling (PTM), Biterm Topic Model (BTM), Global and Local word embedding-based Topic Modeling (GLTM), and Fuzzy Topic Modelling (FTM). The evaluation uses the Real-world Cyberbullying Twitter (RW-CB-Twitter) and Cyberbullying Mendeley (CB-MNDLY) datasets. The results show that, when the SMDCM techniques are applied, GLTM and WNTM outperform the other STTM models, achieving optimum topic coherence and high accuracy values on both datasets.
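To make the idea of cleansing microblog text before topic modelling concrete, the following is a minimal Python sketch of a generic tweet-cleaning step. It is not the SMDCM pipeline: the abstract does not list its individual steps, so the function name `clean_tweet`, the regular expressions, and the tiny stop-word list are all assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical stop-word list for illustration only; not taken from the paper.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more repeats of one character


def clean_tweet(text: str) -> List[str]:
    """Return normalized tokens for one short text (assumed steps, not SMDCM)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)   # drop digits, punctuation, emoji
    text = REPEAT_RE.sub(r"\1\1", text)  # shorten elongations: "sooooo" -> "soo"
    # Remove stop words and single-character leftovers.
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


if __name__ == "__main__":
    print(clean_tweet("@bob You are sooooo MEAN!!! http://t.co/xyz #bullying"))
```

The resulting token lists could then be passed to any of the topic models named in the abstract. Which cleaning steps actually help is an empirical question that the paper evaluates through topic coherence and accuracy.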

Journal: IEEE Access	Publication Date: Jan 1, 2022
Citations: 10	License type: CC BY 4.0

Similar Papers

Evaluation of clustering and topic modeling methods over health-related tweets and emails
Juan Antonio Lossio-Ventura ... Jiang Bian
Artificial Intelligence in Medicine | VOL. 117
07 May 2021

Deconstructing Domain Names to Reveal Latent Topics
Cheryl J Flynn ... Wei Wang
01 Oct 2016

Contributions to Social Media Analysis Based on Topic Modelling
Giordanno Brunno Bergamini Gomes ... Romis Attux
26 Sep 2023

BTM: Topic Modeling over Short Texts
Xueqi Cheng ... Yanyan Lan
IEEE Transactions on Knowledge and Data Engineering | VOL. 26
01 Dec 2014
