CitationLDA++

Thuc Nguyen, Phuc Do

doi:10.1145/3287921.3287930

Abstract

With the rapid growth of electronic repositories of scientific publications, automatic topic identification has become a valuable aid to researchers. Latent Dirichlet Allocation (LDA) is the most popular method for discovering hidden topics in text based on the co-occurrence of words in a corpus. LDA achieves good results on long documents. However, article repositories usually store only the title and abstract, which are too short for LDA to work effectively. In this paper, we propose the CitationLDA++ model, which improves the performance of LDA in inferring the topics of papers from the title and/or abstract together with citation information. The proposed model is based on the assumption that the topics of the cited papers also reflect the topics of the paper that cites them. In this study, we divide the dataset into two sets. The first is used to build a prior knowledge source with the LDA algorithm. The second is the training dataset used by CitationLDA++. During inference with Gibbs sampling, CitationLDA++ uses the topic distributions of the prior knowledge source, together with citation information, to guide the assignment of topics to words in the text. Using the topics of cited papers helps to overcome the limited word co-occurrence found in linked short texts. In experiments on the AMiner dataset, which includes the titles and/or abstracts of papers and their citation information, CitationLDA++ achieves better perplexity than a model that uses no additional knowledge. The experimental results suggest that citation information can improve the performance of LDA in discovering the topics of papers when their full content is not available.
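The abstract does not give the sampling equations, so the following is only a minimal sketch of one way citation information could guide Gibbs sampling, not the authors' CitationLDA++ update rule. It assumes that each paper comes with a hypothetical vector `cited_theta[d]`, the average topic distribution of the papers it cites (taken from a separately trained prior LDA model), and that this vector is simply added to the symmetric document-topic prior `alpha` with a hypothetical mixing `weight`. The names `citation_guided_gibbs` and `perplexity`, and all parameter defaults, are illustrative assumptions.

```python
import numpy as np


def citation_guided_gibbs(docs, cited_theta, vocab_size, n_topics,
                          alpha=0.1, beta=0.01, weight=1.0,
                          n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a per-document topic prior.

    docs        : list of lists of word ids in [0, vocab_size)
    cited_theta : array (n_docs, n_topics); ASSUMED to hold the mean topic
                  distribution of each paper's cited papers (all zeros if
                  the paper has no citations)
    weight      : ASSUMED strength of the citation prior
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    doc_topic = np.zeros((n_docs, n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)

    # Random initial topic assignment for every word token.
    assignments = []
    for d, words in enumerate(docs):
        z = rng.integers(n_topics, size=len(words))
        assignments.append(z)
        for w, k in zip(words, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    # Document-specific prior: symmetric alpha plus the citation signal.
    prior = alpha + weight * np.asarray(cited_theta, dtype=float)

    for _ in range(n_iters):
        for d, words in enumerate(docs):
            z = assignments[d]
            for i, w in enumerate(words):
                # Remove the current assignment from the counts.
                k = z[i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token.
                p = ((doc_topic[d] + prior[d])
                     * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                p /= p.sum()
                # Sample a new topic and restore the counts.
                k = rng.choice(n_topics, p=p)
                z[i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1

    theta = doc_topic + prior
    theta /= theta.sum(axis=1, keepdims=True)
    phi = topic_word + beta
    phi /= phi.sum(axis=1, keepdims=True)
    return theta, phi


def perplexity(docs, theta, phi):
    """exp(-average log-likelihood per token); lower is better."""
    log_lik, n_tokens = 0.0, 0
    for d, words in enumerate(docs):
        for w in words:
            log_lik += np.log(theta[d] @ phi[:, w])
            n_tokens += 1
    return float(np.exp(-log_lik / n_tokens))
```

Setting `weight=0` reduces the sketch to plain LDA with a symmetric prior, which gives a simple baseline for comparing perplexity with and without the citation signal.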

Similar Papers

Comparative Study on Perceived Trust of Topic Modeling Based on Affective Level of Educational Text
Youngjae Im ... Kijung Park
Applied Sciences | VOL. 9
28 Oct 2019

A corporal and LDA analysis of abstracts of academic conference papers
Sebastien Louvigne ... Neil Rubens
01 Sep 2013

An effective hot topic detection method for microblog on spark
Wei Ai ... Keqin Li
Applied Soft Computing | VOL. 70
07 Oct 2017

Semantic clustering method using integration of advanced LDA algorithm and BERT algorithm
Volodymyr Narozhnyi ... Vyacheslav Kharchenko
INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES | VOL. -
02 Jul 2024
