Abstract
Among text mining techniques, topic modeling is an emerging tool for detecting the hidden themes in large collections of textual data, and Latent Dirichlet Allocation (LDA) is one of its most popular methods. This paper applies LDA to 9130 articles by Sri Lankan authors, each with a minimum of 5 citations, downloaded from the Web of Science (WoS) database. The ldatuning R package is used to compute several metrics for deciding the number of topics on statistical grounds. The top 10 latent topics were identified, and the unique terms associated with each topic are discussed. Health is the most frequently occurring latent topic, followed by forest and solar cells. Topic 1 (100%) contains water-related terms, which account for around 60%; irrigation- and soil-related terms account for 40% (1997). This first topic was prominent across the period, except in 1994 and 1996. Topic 3 has gradually decreased and Topic 9 has gradually increased during the last five decades. By comparing our results with traditional scholarship by Sri Lankan authors and with the evolution of scientific publication in the island nation, we show that topic models can emerge as a scientific alternative to conventional classification systems.