Abstract

Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections; in addition, it allows constructing a hierarchy that represents levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three different hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model is significantly unstable and, moreover, that the numbers of topics it derives are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows us to determine only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics at two levels of the hierarchy.
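The abstract does not restate the metric itself. The sketch below shows one common formulation of Renyi entropy for a flat topic model, on which the hierarchical variant builds level by level; the function name, the thresholding of word probabilities at 1/W, and the identification of the deformation parameter q with 1/T are assumptions drawn from that line of work, not details given on this page.

    import numpy as np

    def renyi_entropy(phi):
        # phi: (W, T) matrix of p(w|t); each of the T topic columns sums to 1.
        W, T = phi.shape
        mask = phi > 1.0 / W          # keep only highly probable word-topic pairs
        N = mask.sum()                # number of such pairs
        if N == 0 or T <= 1:
            return np.inf             # degenerate model
        P = phi[mask].sum() / T       # normalized probability mass of those pairs
        rho = N / (W * T)             # density of those pairs among all W*T pairs
        energy = -np.log(P)           # "internal energy" of the model
        shannon = np.log(rho)         # Gibbs-Shannon entropy term
        free_energy = energy - T * shannon
        return free_energy / (T - 1)  # Renyi entropy with q = 1/T

In this formulation, the number of topics that minimizes the Renyi entropy is taken as an estimate of the "correct" topic number.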

Highlights

  • The large flow of news generated by TV channels, electronic news sources and social media is very often represented as a hierarchical system

  • We investigate the behavior of three hierarchical models, namely hierarchical latent Dirichlet allocation (Blei et al., 2003), hierarchical Pachinko allocation (Mimno, Li & McCallum, 2007), and hierarchical additive regularization of topic models (Chirkova & Vorontsov, 2016), in terms of two metrics: log-likelihood and Renyi entropy

  • The hLDA model is very unstable: different runs with the same parameters produce radically different topical structures for the same data

Introduction

The large flow of news generated by TV channels, electronic news sources, and social media is very often represented as a hierarchical system. In such a system, news items or messages are divided into a number of global topics, such as politics, sports, or health. Hierarchical topic models aim to recover such structures automatically. These models have a set of parameters that need to be tuned to obtain a topical solution of higher quality. Moreover, topic models are known to be unstable; an analysis and discussion of topic model instability can be found in (Koltsov et al., 2016). This instability complicates the search for optimal model hyperparameters on a given dataset. Investigating and assessing the ability to tune hierarchical topic models is therefore an important task.
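As an illustration of such tuning, the number of topics at one level can be swept and the value at the entropy minimum selected. This is a sketch only: train_topic_model and docs are hypothetical placeholders for fitting whichever model (hLDA, hPAM, hARTM) is being tuned, and renyi_entropy is the function sketched after the abstract.

    # Hypothetical tuning loop; train_topic_model(docs, n_topics) stands in for
    # fitting one level of the hierarchy and must return the (W, T) topic-word
    # matrix phi of that level.
    candidate_T = list(range(2, 51))
    entropies = [renyi_entropy(train_topic_model(docs, n_topics=T))
                 for T in candidate_T]
    best_T = candidate_T[entropies.index(min(entropies))]  # entropy minimum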

