Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis.

Rania Albalawi, Tet Hin Yeap, Morad Benyoucef

doi:10.3389/frai.2020.00042


Open Access

https://doi.org/10.3389/frai.2020.00042


Journal: Frontiers in Artificial Intelligence
Publication Date: Jul 14, 2020
Citations: 155
License type: CC BY 4.0

Affiliation: University of Ottawa

Abstract

With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information in such content or to learn more about the topic being discussed. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online; among them, topic modeling techniques have gained popularity in recent years. This paper surveys topic modeling and its common application areas, methods, and tools. We also examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their practical benefits in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of these methods based on topic quality and standard statistical evaluation metrics such as recall, precision, F-score, and topic coherence. Latent Dirichlet allocation and non-negative matrix factorization delivered more meaningful extracted topics and obtained good results. The paper sheds light on some common topic modeling methods in a short-text context and provides direction for researchers who seek to apply these methods.
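The abstract names recall, precision, and F-score as evaluation metrics but does not say how they were computed. As a minimal sketch only, the snippet below assumes one plausible setup: the top words extracted for a topic are compared against a hand-labelled reference set of relevant words. The function name `precision_recall_f1` and both word lists are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(extracted, reference):
    """Score extracted topic words against a reference set of relevant words.

    Assumption: both inputs are collections of already-normalized words.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Precision: share of extracted words that are relevant.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of relevant words that were extracted.
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, for illustration only.
extracted_words = ["battery", "screen", "price", "delivery", "camera"]
reference_words = ["battery", "screen", "camera", "charger"]

precision, recall, f1 = precision_recall_f1(extracted_words, reference_words)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints "precision=0.60 recall=0.75 f1=0.67"
```

Scoring sets of words rather than ranked lists keeps the sketch short; a study that cares about word order within a topic would need a rank-aware measure instead.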

Highlights

People nowadays tend to rely heavily on the internet in their daily social and commercial activities
We investigate select topic modeling (TM) methods that are commonly used in text mining, namely, latent Dirichlet allocation (LDA), latent semantic analysis (LSA), non-negative matrix factorization (NMF), principal component analysis (PCA), and random projection (RP)
Online social networks (OSNs) include a huge amount of user-generated content (UGC) with much irrelevant and noisy data, such as non-meaningful or inappropriate content and symbols, that needs to be filtered before applying any text analysis techniques
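The filtering step mentioned in the last highlight could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the regular expressions, the tiny stop-word set, the minimum token length, and the name `clean_post` are all hypothetical choices made for this example.

```python
import re

# Assumed noise patterns for social-media text (hypothetical choices).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Deliberately tiny stop-word set; a real study would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "in", "for", "this", "so"}


def clean_post(text):
    """Return a list of lowercase tokens with URLs, mentions, and symbols removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji, '#'
    # Keep tokens longer than two letters that are not stop words.
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 2]


tokens = clean_post("@shop The NEW phone is sooo good!!! https://t.co/abc #BatteryLife")
print(tokens)
# prints ['new', 'phone', 'sooo', 'good', 'batterylife']
```

Note that the hashtag text is kept while the `#` symbol is dropped, on the assumption that hashtags often carry topical words; a different study might remove them entirely.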

Summary

Introduction

People nowadays tend to rely heavily on the internet in their daily social and commercial activities. There is a need for more efficient methods and tools that can aid in detecting and analyzing content in online social networks (OSNs), particularly for those using user-generated content (UGC) as a source of data. There is also a need to extract more useful and hidden information from the numerous online sources that are stored as text and written in natural language within the social network landscape (e.g., Twitter, LinkedIn, and Facebook).



