Traditional Topic Models Research Articles

With the rapid proliferation of social networking sites (SNS), automatic topic extraction from the text messages posted on SNS is becoming an important source of information for understanding current social trends and needs. Latent Dirichlet Allocation (LDA), a probabilistic generative model, is one of the most popular topic models in Natural Language Processing (NLP) and has been widely used in information retrieval, topic extraction, and document analysis. Unlike long texts from formal documents, messages on SNS are generally short. Traditional topic models such as LDA or pLSA (probabilistic latent semantic analysis) suffer performance degradation in short-text analysis because each short text carries too little word co-occurrence information. To cope with this problem, various techniques for interpretable topic modeling of short texts are evolving; combining topic models with word embeddings pretrained on an external corpus is one of them. With recent developments in deep neural networks (DNN) and deep generative models, neural-topic models (NTM) are emerging that offer flexibility and high performance in topic modeling. However, there are very few research works on neural-topic models with pretrained word embedding for generating high-quality topics from short texts. In this work, in addition to pretrained word embedding, a fine-tuning stage with the original corpus is proposed for training neural-topic models in order to generate semantically coherent, corpus-specific topics. An extensive study with eight neural-topic models was conducted on several benchmark datasets to assess the effectiveness of additional fine-tuning and pretrained word embedding in generating interpretable topics. The extracted topics are evaluated with different metrics of topic coherence and topic diversity. We have also studied the performance of the models in classification and clustering tasks. Our study concludes that although auxiliary word embedding with a large external corpus improves the topic coherence of short texts, an additional fine-tuning stage is needed to generate more corpus-specific topics from short-text data.
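
The abstract above does not say which topic diversity metric was used. As a rough illustration only, the sketch below assumes one common definition: the fraction of unique words among the top-k words of all topics. The function name `topic_diversity`, the `top_k` parameter, and the example topics are hypothetical and are not taken from the article.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 25) -> float:
    """Return the fraction of unique words among the top_k words of all topics.

    Assumed definition (not confirmed by the abstract): 1.0 means no word is
    shared between topics, values near 0 mean the topics are largely redundant.
    """
    # Collect the top_k words of every topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, made up for illustration; each list is ordered by weight.
example_topics = [
    ["vaccine", "dose", "health", "virus"],
    ["game", "team", "score", "season"],
    ["health", "virus", "mask", "travel"],
]
diversity = topic_diversity(example_topics, top_k=4)
print(f"topic diversity: {diversity:.3f}")
```

A score of this kind says nothing about whether individual topics are meaningful, which is why it is usually reported alongside a coherence measure rather than alone.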

Background: COVID-19 is still rampant all over the world. To date, the COVID-19 vaccine is the most promising measure to subdue contagion and achieve herd immunity. However, public vaccination intention is suboptimal. A clear division lies between medical professionals and laypeople. While most professionals eagerly promote the vaccination campaign, some laypeople express suspicion, hesitancy, and even opposition toward COVID-19 vaccines.

Objective: This study aims to employ a text mining approach to examine expression differences and thematic disparities between professionals and laypeople within the COVID-19 vaccine context.

Methods: We collected 3196 answers under 65 filtered questions concerning the COVID-19 vaccine from the China-based question and answer forum Zhihu. The questions were classified into 5 categories depending on their contents and description: adverse reactions, vaccination, vaccine effectiveness, social implications of vaccine, and vaccine development. Respondents were also manually coded into 2 groups: professionals and laypeople. Automated text analysis was performed to calculate fundamental expression characteristics of the 2 groups, including answer length, attitude distribution, and high-frequency words. Furthermore, structural topic modeling (STM), a cutting-edge branch of the topic modeling family, was used to extract topics under each question category, and thematic disparities were evaluated between the 2 groups.

Results: Laypeople were more prevalent in the COVID-19 vaccine–related discussion. Regarding differences in expression characteristics, the professionals posted longer answers and showed a more conservative stance toward vaccine effectiveness than laypeople did. Laypeople mentioned countries more frequently, while professionals were inclined to use medical jargon. STM revealed prominent topics under each question category. Statistical analysis showed that laypeople preferred the “safety of Chinese-made vaccine” topic and vaccine-related issues in other countries. The professionals, however, paid more attention to the medical principles and professional standards underlying the COVID-19 vaccine. With respect to topics associated with the social implications of vaccines, the 2 groups showed no significant difference.

Conclusions: Our findings indicate that laypeople and professionals share some common ground but also hold divergent focuses regarding the COVID-19 vaccine issue. These incongruities can be summarized as “qualitatively different” in perspective rather than “quantitatively different” in scientific knowledge. Among the questions closely associated with medical expertise, the “qualitatively different” characteristic is quite conspicuous. This study advances the current understanding of how the public perceives the COVID-19 vaccine in a more nuanced way. Web-based question and answer forums are a rich resource for examining perception discrepancies among various identities. STM further exhibits unique strengths over traditional topic modeling methods in statistically testing the topic preferences of diverse groups. Public health practitioners should be keenly aware of the cognitive differences between professionals and laypeople, and pay special attention to the topics with significant inconsistency across groups in order to build consensus and promote vaccination effectively.

Articles published on Traditional Topic Models

Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts.

Knowledge Topic-Structure Exploration for Online Innovative Knowledge Acquisition

Acceptable set topic modeling

Multi-viewpoints visual models for efficient modeling and analysis of Twitter based health-care services

Neural labeled LDA: a topic model for semi-supervised document classification

BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation

Exploring the Expression Differences Between Professionals and Laypeople Toward the COVID-19 Vaccine: Text Mining Approach.

Topic Modeling in Embedding Spaces for Depression Assessment

A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling

SenU-PTM: a novel phrase-based topic model for short-text topic discovery by exploiting word embeddings

Multimodal Weibull Variational Autoencoder for Jointly Modeling Image-Text Data.

Semantic-based topic representation using frequent semantic patterns

A multi-grained aspect vector learning model for unsupervised aspect identification

Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling

A neural topic model with word vectors and entity vectors for short texts

Topic Modeling in Embedding Spaces

An event based topic learning pipeline for neuroimaging literature mining

An extractive text summarization approach using tagged-LDA based topic modeling

A Sparse Topic Model for Bursty Topic Discovery in Social Networks

Cluster analysis of urdu tweets
