Monolingual and multilingual topic analysis using LDA and BERT embeddings

Qing Xie, Xinyuan Zhang, Ying Ding, Min Song

doi:10.1016/j.joi.2020.101055

Abstract

Analyzing research topics offers potential insights into the direction of scientific development. In particular, analyzing multilingual research topics can help researchers grasp the global evolution of topics, revealing topic similarity among scientific publications written in different languages. Most studies to date on topic analysis have been based on English-language publications and have relied heavily on citation-based topic evolution analysis. However, it can be challenging for English-language publications to cite non-English sources, and many publications in other languages do not provide English translations of their abstracts. Citation-based methodologies are therefore not suitable for analyzing multilingual research topic relations. Because multilingual sentence embeddings can effectively preserve word semantics in multilingual translation tasks, a topic model based on such embeddings could potentially generate topic–word distributions for publications written in different languages. In this paper, which is situated in the field of library and information science, we use multilingual pretrained Bidirectional Encoder Representations from Transformers (BERT) embeddings and the Latent Dirichlet Allocation (LDA) topic model to analyze topic evolution in monolingual and multilingual topic similarity settings. For each topic, we multiply its LDA probability value by the averaged tensor similarity of BERT embeddings to explore the evolution of the topic in scientific publications. Because the proposed method relies on neither a machine translator nor the author's subjective translation, it avoids the confusion and misuse caused by machine errors or by authors' subjectively chosen English keywords. Our results show that the proposed approach is well suited to analyzing scientific evolution in both monolingual and multilingual topic similarity relations.
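The abstract says only that each topic's LDA probability is multiplied by an averaged similarity of BERT embeddings; it does not specify the similarity measure or which embeddings are averaged. The Python sketch below is one possible reading of that scoring step, not the authors' implementation. It assumes cosine similarity, averages it over every pair of word embeddings drawn from two topics, and uses tiny hand-written vectors as stand-ins for real multilingual BERT embeddings. The function names `cosine`, `averaged_similarity`, and `weighted_topic_similarity` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def averaged_similarity(emb_a: Sequence[Vector], emb_b: Sequence[Vector]) -> float:
    """Mean cosine similarity over all pairs (one embedding from each topic).

    Assumption: emb_a and emb_b hold embeddings of the top words of two topics,
    possibly from publications in different languages.
    """
    if not emb_a or not emb_b:
        raise ValueError("both embedding sets must be non-empty")
    total = sum(cosine(u, v) for u in emb_a for v in emb_b)
    return total / (len(emb_a) * len(emb_b))


def weighted_topic_similarity(
    lda_prob: float, emb_a: Sequence[Vector], emb_b: Sequence[Vector]
) -> float:
    """Hypothetical score: an LDA topic probability times the averaged embedding similarity."""
    if not 0.0 <= lda_prob <= 1.0:
        raise ValueError("lda_prob must be a probability in [0, 1]")
    return lda_prob * averaged_similarity(emb_a, emb_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for multilingual BERT embeddings.
    topic_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    topic_b = [[1.0, 0.0, 0.0]]
    print(weighted_topic_similarity(0.4, topic_a, topic_b))  # prints 0.2
```

In a real pipeline the vectors would come from a multilingual BERT encoder and the probability from a fitted LDA model. Averaging over all word pairs is a deliberately simple choice for illustration; a weighted or top-k average would be equally plausible readings of "averaged tensor similarity".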

Journal: Journal of Informetrics | Publication Date: Jun 25, 2020
Citations: 44

Similar Papers

An integrated clustering and BERT framework for improved topic modeling.
Lijimol George ... P Sumathy
International Journal of Information Technology | VOL. 15
01 Apr 2023

Semantic Topic Extraction from Bangla News Corpus Using LDA and BERT-LDA
Pintu Chandra Paul ... Md Tofael Ahmed
17 Dec 2022

Comparing human coding to two natural language processing algorithms in aspirations of people affected by Duchenne Muscular Dystrophy
Carolyn E Schwartz ... Elijah Biletch
Journal of Methods and Measurement in the Social Sciences | VOL. 13
01 Oct 2022

An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain
Renu Sabharwal ... Shah J Miah
Journal of Big Data | VOL. 9
28 Apr 2022
