Assessment of Twitter Data Clusters with Cosine-Based Validation Metrics Using Hybrid Topic Models

Noorullah R Mohammed, Moulana Mohammed

doi:10.18280/isi.250606

Abstract

Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters. Modeling text documents in terms of derived topics is a vital task in text data clustering. Each tweet is treated as a text document, and various topic models are used to model tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features. The cosine metric is better suited to an informative assessment, especially for text clustering. This paper therefore develops a novel cosine-based internal and external validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet clustering results are evaluated using cluster validity indices. The experiments show that the cosine-based internal and external validity metrics outperform the other metrics on benchmark and Twitter-based datasets.
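As a rough illustration of why a cosine measure can be more informative than a Euclidean one for text, the sketch below compares both on three toy term-count vectors. It is not the paper's method: the vocabulary, the counts, and the function names `cosine_dissimilarity` and `euclidean_distance` are all hypothetical and chosen only for this example.

```python
import math


def cosine_dissimilarity(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical term counts over the toy vocabulary ["goal", "match", "vote"].
short_sports = [1, 1, 0]    # short text about sports
long_sports = [4, 4, 0]     # same word mix, four times as many words
short_politics = [0, 1, 1]  # short text with a different word mix

# Same topic, different length.
cos_same = cosine_dissimilarity(short_sports, long_sports)
euc_same = euclidean_distance(short_sports, long_sports)

# Different topic, same length.
cos_diff = cosine_dissimilarity(short_sports, short_politics)
euc_diff = euclidean_distance(short_sports, short_politics)

print(f"same topic:      cosine={cos_same:.3f}  euclidean={euc_same:.3f}")
print(f"different topic: cosine={cos_diff:.3f}  euclidean={euc_diff:.3f}")
```

In this toy case the Euclidean distance places the two sports texts farther apart than the sports and politics texts, simply because one text is longer, while the cosine dissimilarity ignores length and ranks the same-topic pair as closer.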

More From: Ingénierie des systèmes d'information
