Abstract

Social media, and microblogs in particular, are becoming an important data source for disease surveillance, behavioral medicine, and public healthcare. Topic models are widely used in microblog analytics for analyzing and integrating the textual data within a corpus. This paper treats health tweets as microblogs and clusters the health data with topic models. Traditional topic models, such as Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and integer Joint NMF (intJNMF), are used for health data clustering; however, they offer no tractable way to assess the number of health topic clusters. Proper visualizations are essential for extracting information from the data and identifying its trends, as a collection may include thousands of documents and millions of words. To visualize topic clouds and health tendency in the document collection, we present hybrid topic models that integrate traditional topic models with visual assessment of tendency (VAT). The proposed hybrid topic models, namely Visual Non-negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual Probabilistic Latent Semantic Indexing (VPLSI), and Visual Latent Semantic Indexing (VLSI), are promising methods for assessing the health tendency and visualizing topic clusters in benchmark and Twitter datasets. The experimental section evaluates and compares the hybrid topic models to demonstrate their efficiency under different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.

Highlights

  • Twitter, Facebook, and microblogs [21], [22], [23] reveal public opinion, and assessing this social data [13], [14] is an emerging need in applications such as topic detection [1], [4], product promotion in business [4], political prediction [6], and health recommendation [2], [6]

  • We investigate which public health issues are discussed on social media, Twitter in particular. To overcome the problem of assessing health cluster tendency, the proposed hybrid framework combines visual assessment of tendency (VAT) with traditional topic models; the resulting hybrid topic models are Visual Non-negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual Latent Semantic Indexing (VLSI), and Visual Probabilistic Latent Semantic Indexing (VPLSI)

  • The visual results indicate that VNMF, VLSI, and VPLSI detect the cluster tendency of health topics efficiently in healthcare applications, whereas VLDA produces less clear visual results than the other models
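
The hybrid idea in the highlights above, a topic model followed by VAT, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a topic model has already produced a document-topic weight matrix (the `doc_topic` values are hypothetical toy data), computes cosine distances between documents, and applies the standard VAT reordering so that documents from the same topic cluster become adjacent. The helper names `cosine_distance`, `pairwise_distances`, `vat_order`, and `reorder` are chosen for this example only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: an all-zero vector is maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(rows, dist=cosine_distance):
    """Build the full n x n dissimilarity matrix, with zeros on the diagonal."""
    n = len(rows)
    return [[0.0 if i == j else dist(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


def vat_order(R):
    """Return the VAT ordering of the objects described by dissimilarity matrix R."""
    n = len(R)
    # Start from one endpoint of the largest dissimilarity in R.
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Prim-like step: take the unvisited object closest to any visited one.
        nxt = min(remaining, key=lambda j: min(R[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(R, order):
    """Permute rows and columns of R according to the VAT ordering."""
    return [[R[i][j] for j in order] for i in order]


# Hypothetical document-topic weights, standing in for the output of NMF, LDA, etc.
doc_topic = [
    [0.9, 0.1, 0.0],  # doc 0: mostly topic A
    [0.0, 0.1, 0.9],  # doc 1: mostly topic C
    [0.8, 0.2, 0.0],  # doc 2: mostly topic A
    [0.1, 0.9, 0.0],  # doc 3: mostly topic B
    [0.0, 0.2, 0.8],  # doc 4: mostly topic C
    [0.2, 0.8, 0.0],  # doc 5: mostly topic B
]

R = pairwise_distances(doc_topic)
order = vat_order(R)
R_star = reorder(R, order)  # reordered dissimilarity matrix to display as an image
```

Rendering `R_star` as a grayscale image would show dark square blocks along the diagonal, one per group of mutually similar documents; counting those blocks is how VAT suggests the number of clusters.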


Summary

INTRODUCTION

Twitter, Facebook, and microblogs [21], [22], [23] reveal public opinion, and assessing this social data [13], [14] is an emerging need in applications such as topic detection [1], [4], product promotion in business [4], political prediction [6], and health recommendation [2], [6]. We investigate which public health issues are discussed on social media, Twitter in particular. To overcome the problem of assessing health cluster tendency, the proposed hybrid framework combines VAT with traditional topic models; the resulting hybrid topic models are VNMF, VLDA, VLSI, and VPLSI. The multi-viewpoint cosine similarity metric uses many different viewpoints, namely objects assumed not to be in the same cluster as the pair being compared. Looking at a pair of points from many different viewpoints gives a more accurate assessment of how close or distant they are: the similarity is the average of the similarities measured relative to the views of all other documents outside that cluster.

Source: (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 10, No. 11, 2019, www.ijacsa.thesai.org
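
The averaging over outside viewpoints described above can be made concrete with a small sketch. It assumes the commonly cited dot-product form of multi-viewpoint similarity, in which the similarity of two documents is the mean of (d_i - d_h) · (d_j - d_h) over all viewpoint documents d_h outside their cluster; the paper may use a variant of this formula. The function name `multi_viewpoint_similarity` and the toy vectors are hypothetical.

```python
def multi_viewpoint_similarity(d_i, d_j, viewpoints):
    """Average of (d_i - d_h) . (d_j - d_h) over viewpoints d_h outside the cluster."""
    if not viewpoints:
        raise ValueError("need at least one viewpoint outside the cluster")
    total = 0.0
    for d_h in viewpoints:
        # Similarity of d_i and d_j as seen from viewpoint d_h.
        total += sum((a - h) * (b - h) for a, b, h in zip(d_i, d_j, d_h))
    return total / len(viewpoints)


# Hypothetical toy vectors: two documents assumed to share a cluster,
# and two documents outside that cluster acting as viewpoints.
d_i = [1.0, 0.0]
d_j = [0.9, 0.1]
outside = [[0.0, 1.0], [0.1, 0.9]]

sim_same = multi_viewpoint_similarity(d_i, d_j, outside)

# With the origin as the only viewpoint, the measure reduces to the plain
# dot product, which equals cosine similarity for unit-length vectors.
sim_origin = multi_viewpoint_similarity(d_i, d_j, [[0.0, 0.0]])
```

Under this formulation, ordinary cosine similarity is the special case of a single viewpoint at the origin, which is why the multi-viewpoint measure is described as giving a more informed assessment of closeness.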

RELATED WORK OF TOPIC MODELS
Similarity and Clustering Documents
PROPOSED HYBRID TOPIC MODELS
Apply convergence for finding topic-document matrix V using
EXPERIMENTAL STUDY
The Architecture of Hybrid Topic Models
Datasets Description
Features of Hybrid Topic Algorithms Comparison
Topics Clouds Description
Assessment of Health Tendency
Performance Measures Evaluation
Convergence Study
Computational Complexity Analysis
Findings
CONCLUSION AND FUTURE WORK
