Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA

Akash Gupta, Shrey Aeron, Himanshu Gupta, Anjali Agrawal

doi:10.2139/ssrn.3708327

Abstract

Research publications related to the novel coronavirus disease COVID-19 are rapidly growing in number. However, current online literature hubs, even with artificial intelligence, are inadequate for identifying the relative strength of research topics. Hence, we aimed to develop a comprehensive Latent Dirichlet Allocation (LDA) topic model using natural language processing (NLP) techniques, provide visualisations for temporal trends, and apply our methodology to improve existing online literature hubs.

Using the search term “COVID”, abstracts were extracted from PubMed® from January to July 2020 (N=16346). An LDA topic model was trained on 81% of abstracts. Weekly temporal trends were visualised as a heatmap on all abstracts. Then, we tested our methodology on over 23,000 abstracts gathered from January 2020 to September 2020 from LitCovid, a literature hub from the National Center for Biotechnology Information. We used our topic model to subdivide LitCovid’s eight categories into corresponding LDA topics.

The optimised LDA topic model, created using PubMed® data, produced 25 comprehensive topics with no significant overlap. Topic prominence changed over time: “Mental Health” and “Socioeconomic Impact” increased, “Genome Sequence” decreased, and “Epidemiology” remained relatively constant. We identified inadequate representation of “Airborne Transmission Protection”. Importantly, research on masks and PPE is skewed towards clinical applications, with a lack of population-based epidemiological research. Our methodology, when applied to LitCovid, identified important topics within each LitCovid category. For example, “Case Report” was split into topics such as “Pulmonary” and “Oncology” as well as the under-represented topics “Haematology” and “Gastroenterology”.

Our work allows for comprehensive topic identification and intuitive visualisation of temporal trends in COVID-19 research. Implementation of the methodology complements existing online literature hubs and identifies under-represented topics, such as population-based studies on masks, that may be of significant public interest.

Funding Statement: None to declare.

Declaration of Interests: There are no conflicts of interest.

Highlights

The COVID-19 outbreak was officially declared a pandemic by the World Health Organization in March 2020 [1].
To evaluate the temporal trends, we propose a novel method, which is applied to both PubMed® and LitCovid abstracts to produce an intuitive visualisation of the weekly temporal evolution of topic proportions.
We provide a generalisable natural language processing (NLP) methodology to extract abstracts from PubMed®, create an optimised Latent Dirichlet Allocation (LDA) topic model, and visualise temporal trends.
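
The weekly evolution of topic proportions mentioned above can be illustrated with a minimal sketch. It assumes each abstract already has a publication date and a per-document topic distribution from a fitted topic model, and it averages those distributions within each ISO week. The function name `weekly_topic_proportions`, the record layout, and the choice of a simple mean are assumptions made for illustration; this page does not specify how the authors aggregated their data.

```python
from collections import defaultdict
from datetime import date


def weekly_topic_proportions(records):
    """Average per-document topic distributions within each ISO week.

    records: iterable of (publication_date, topic_probs) pairs, where
    topic_probs is a list of probabilities, one per topic (hypothetical layout).
    Returns a dict mapping (iso_year, iso_week) to the mean topic distribution,
    with keys in chronological order.
    """
    sums = {}
    counts = defaultdict(int)
    for pub_date, probs in records:
        iso = pub_date.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        if key not in sums:
            sums[key] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[key][k] += p
        counts[key] += 1
    return {
        key: [s / counts[key] for s in sums[key]]
        for key in sorted(sums)
    }


# Made-up example: three abstracts, two topics.
records = [
    (date(2020, 3, 2), [0.8, 0.2]),
    (date(2020, 3, 4), [0.4, 0.6]),
    (date(2020, 3, 9), [0.1, 0.9]),
]
weekly = weekly_topic_proportions(records)
```

Each value in `weekly` is one column of a topic-by-week matrix, which is the shape of data a heatmap of temporal trends would be drawn from.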

Summary

Introduction

The COVID-19 outbreak was officially declared a pandemic by the World Health Organization in March 2020 [1]. Latent Dirichlet Allocation (LDA) is an unsupervised topic modelling technique used to learn hidden topics within a corpus [2]. It assumes topics are a soft clustering of words and outputs two probability distributions: a distribution of topics in the corpus, and a distribution of words for each topic. Current online literature hubs, even with artificial intelligence, are limited in identifying the complexity of COVID-19 research topics. We developed a comprehensive LDA model with 25 topics using natural language processing (NLP) techniques on PubMed® research articles about “COVID”. We propose a novel methodology to develop and visualise temporal trends, and to improve existing online literature hubs. Our topic model demonstrates that research on “masks” and “Personal Protective Equipment (PPE)” is skewed toward clinical applications, with a lack of population-based epidemiological research.
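
To make the two outputs of LDA concrete, here is a toy sketch in pure Python. It fits LDA with a collapsed Gibbs sampler, which is one common inference method chosen here for brevity; this page does not say which inference procedure or library the authors used. The function `fit_lda`, its hyperparameter defaults, and the tiny made-up corpus are all assumptions for illustration only.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimised).

    docs: list of token lists.
    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] estimates
    P(topic k | document d) and topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    n_dk = [[0] * n_topics for _ in docs]             # topic counts per document
    n_kw = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                              # token totals per topic

    # Assign every token to a random topic to start.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for tok in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[tok]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                w, k = word_id[tok], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates of the two families of distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Made-up mini corpus standing in for pre-processed abstracts.
docs = [
    "mask ppe hospital staff mask".split(),
    "ppe mask surgery hospital".split(),
    "anxiety depression lockdown stress".split(),
    "lockdown stress anxiety students".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)

# A corpus-level topic distribution can be taken as the mean over documents.
corpus_topic = [sum(row[k] for row in doc_topic) / len(docs) for k in range(2)]
```

Every row of `doc_topic` and `topic_word` sums to 1, so each is a proper probability distribution. A real analysis of thousands of abstracts would more likely rely on an established library implementation than on a hand-written sampler like this one.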

Journal: SSRN Electronic Journal
Publication Date: Jan 1, 2020
Citations: 1
License type: cc-by

Similar Papers

An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain
Renu Sabharwal ... Shah J Miah
Journal of Big Data | VOL. 9
28 Apr 2022

Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA.
Akash Gupta ... Shrey Aeron
Frontiers in Digital Health | VOL. 3
06 Jul 2021

Sentiment Analysis of Consumer-Generated Online Reviews of Physical Bookstores Using Hybrid LSTM-CNN and LDA Topic Model
Yan Wang ... Xiaoyu Chang
01 Oct 2020

Evaluating the Coverage and Depth of Latent Dirichlet Allocation Topic Model in Comparison with Human Coding of Qualitative Data: The Case of Education Research
Gaurav Nanda ... Yuzhe Zhou
Machine Learning and Knowledge Extraction | VOL. 5
14 May 2023
