Abstract

The latest threat to global health is the COVID-19 outbreak. Although there exist large datasets of chest X-rays (CXR) and computed tomography (CT) scans, few COVID-19 image collections are currently available due to patient privacy. At the same time, there is a rapid growth of COVID-19-relevant articles in the biomedical literature, including those that report findings on radiographs. Here, we present COVID-19-CT-CXR, a public database of COVID-19 CXR and CT images, which are automatically extracted from COVID-19-relevant articles from the PubMed Central Open Access (PMC-OA) Subset. We extracted figures, associated captions, and relevant figure descriptions in the article and separated compound figures into subfigures. Because a large portion of figures in COVID-19 articles are not CXR or CT, we designed a deep-learning model to distinguish them from other figure types and to classify them accordingly. The final database includes 1,327 CT and 263 CXR images (as of May 9, 2020) with their relevant text. To demonstrate the utility of COVID-19-CT-CXR, we conducted four case studies. (1) We show that COVID-19-CT-CXR, when used as additional training data, is able to contribute to improved deep-learning (DL) performance for the classification of COVID-19 and non-COVID-19 CT. (2) We collected CT images of influenza, another common infectious respiratory illness that may present similarly to COVID-19, and fine-tuned a baseline deep neural network to distinguish a diagnosis of COVID-19, influenza, or normal or other types of diseases on CT. (3) We fine-tuned an unsupervised one-class classifier from non-COVID-19 CXR and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared 15 clinical symptoms and 20 clinical findings of COVID-19 versus those of influenza to demonstrate the disease differences in the scientific publications. 
Our database is unique, as the figures are retrieved along with relevant text with fine-grained descriptions, and it can be extended easily in the future. We believe that our work is complementary to existing resources and hope that it will contribute to medical image analysis of the COVID-19 pandemic. The dataset, code, and DL models are publicly available at https://github.com/ncbi-nlp/COVID-19-CT-CXR.
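
As a rough illustration of case study (4), the comparison of term mentions across the two diseases could be sketched as below. This is a minimal sketch under stated assumptions, not the authors' text-mining pipeline: the term list, the function names, and the toy captions are all hypothetical, and simple keyword matching stands in for whatever extraction method the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical term list; the paper compares 15 symptoms and 20 findings,
# which are not enumerated in this section.
TERMS = ["fever", "cough", "ground-glass opacity", "consolidation"]


def term_document_frequency(texts, terms=TERMS):
    """Return the fraction of texts that mention each term at least once."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in terms:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                counts[term] += 1
    total = len(texts)
    return {term: (counts[term] / total if total else 0.0) for term in terms}


def compare(covid_texts, flu_texts, terms=TERMS):
    """Pair up per-term frequencies as (COVID-19, influenza)."""
    covid = term_document_frequency(covid_texts, terms)
    flu = term_document_frequency(flu_texts, terms)
    return {term: (covid[term], flu[term]) for term in terms}


# Toy captions written for this example only, not taken from the database.
covid_captions = [
    "Chest CT shows bilateral ground-glass opacity.",
    "Patient with fever and cough; CT shows consolidation.",
]
flu_captions = [
    "Chest CT of influenza pneumonia with consolidation.",
    "Patient presented with fever.",
]

result = compare(covid_captions, flu_captions)
for term, (covid_freq, flu_freq) in result.items():
    print(f"{term}: COVID-19 {covid_freq:.2f} vs influenza {flu_freq:.2f}")
```

Counting each caption at most once per term keeps long captions from dominating the comparison. A real analysis would also need to handle synonyms and negated mentions, which this sketch ignores.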

Highlights

  • The latest threat to global health is the ongoing outbreak of COVID-19, caused by SARS-CoV-2 [1]

  • (1) We show that COVID-19-CT-CXR, when used as additional training data, contributes to improved deep-learning (DL) performance for the classification of COVID-19 and non-COVID-19 computed tomography (CT). (2) We collected CT images of influenza, another common infectious respiratory illness that may present similarly to COVID-19, and fine-tuned a baseline deep neural network to distinguish a diagnosis of COVID-19, influenza, or normal or other types of diseases on CT. (3) We fine-tuned an unsupervised one-class classifier from non-COVID-19 chest X-rays (CXR) and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared 15 clinical symptoms and 20 clinical findings of COVID-19 versus those of influenza to demonstrate the disease differences in the scientific publications

  • PubMed Central Open Access (PMC-OA) is a well-known portion of the PubMed Central (PMC) articles under a Creative Commons license that allows for text mining, secondary analysis, and other types of reuse

Summary

Introduction

The latest threat to global health is the ongoing outbreak of COVID-19, caused by SARS-CoV-2 [1]. Chest X-rays (CXR) and chest computed tomography (CT) scans are playing a major part in the detection and monitoring of the disease's respiratory manifestations. With a shortage of specialists experienced in COVID-19 diagnosis, there has been a concerted move toward the adoption of artificial intelligence (AI), particularly deep-learning-based methods, in COVID-19 diagnosis and prognosis, in which well-annotated data always play a critical role [8]. Although there exist large public datasets of CXR [9], [10], [11] and CT [12], there are few collections of COVID-19 images with which to effectively train a deep neural network [13], [14], [15]. A recent COVID-19 initiative expands access, via the PubMed Central Open Access (PMC-OA) Subset, to coronavirus-related publications and associated data.
