Abstract

Since the beginning of the COVID-19 pandemic almost two years ago, more than 700,000 scientific papers have been published on the subject. No individual researcher can become acquainted with such a large text corpus, so help from artificial intelligence (AI) is needed. We propose an AI-based tool that helps researchers navigate collections of medical papers in a meaningful way and extract knowledge from scientific COVID-19 papers. The main idea of our approach is to obtain as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, and then to store the data in a NoSQL database for fast further processing and insight generation. Additionally, the context in which each entity is used (neutral or negative) is determined. Applying NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights into important issues of diagnosis and treatment, such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus.

Highlights

  • Automatic scientific paper analysis is a fast-growing area of study

  • Since the beginning of the COVID-19 pandemic almost two years ago, more than 700,000 scientific papers have been published on the subject

  • At the very beginning of the COVID-19 pandemic, a research challenge was launched on Kaggle to analyze scientific papers on the subject

Summary

Introduction

Automatic scientific paper analysis is a fast-growing area of study. In recent years, there has been huge progress in the field of natural language processing (NLP), and very powerful neural network language models have been trained. At the very beginning of the COVID-19 pandemic, a research challenge was launched on Kaggle to analyze scientific papers on the subject. The dataset behind this competition is called CORD, and it contains a constantly updated corpus of everything that is published on topics related to COVID-19 [7]. Resources for text mining researchers and practitioners include pretrained COVID-19 domain language models, knowledge graphs, and embeddings [12]. The proposed architecture uses a NoSQL database to store entity-relation metadata, which allows us to use the DBMS's SQL-based querying to perform semantically rich queries over the text corpus. Another novelty of the proposed system is the application of text-based emotion detection (TBED) and knowledge graphs to show changes in medical treatment over time and joint treatment strategies using several medications.
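
The kind of query that such entity metadata makes possible can be illustrated with a minimal Python sketch. It is not the authors' implementation: it replaces the NoSQL store and its SQL-based querying with plain in-memory dictionaries, and the field names (published, entities, category, negated), the category labels, and the drug names are all hypothetical toy values chosen for illustration. It counts non-negated medication mentions per month and counts how often two medications appear together in one paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy records shaped like JSON documents in a document store.
# All field names and values are hypothetical, not taken from the paper.
papers = [
    {"id": "p1", "published": "2020-03", "entities": [
        {"text": "drug_a", "category": "MedicationName", "negated": False},
        {"text": "drug_b", "category": "MedicationName", "negated": False},
        {"text": "fever", "category": "SymptomOrSign", "negated": False},
    ]},
    {"id": "p2", "published": "2020-03", "entities": [
        {"text": "drug_a", "category": "MedicationName", "negated": False},
        {"text": "drug_c", "category": "MedicationName", "negated": True},
    ]},
    {"id": "p3", "published": "2020-04", "entities": [
        {"text": "drug_a", "category": "MedicationName", "negated": True},
        {"text": "drug_b", "category": "MedicationName", "negated": False},
        {"text": "drug_c", "category": "MedicationName", "negated": False},
    ]},
]


def positive_names(paper, category):
    """Distinct entity names of one category used in a non-negated context."""
    return {
        e["text"].lower()
        for e in paper["entities"]
        if e["category"] == category and not e["negated"]
    }


def mentions_by_month(papers, category="MedicationName"):
    """Number of papers per month that mention each entity (negated uses skipped)."""
    counts = defaultdict(Counter)
    for paper in papers:
        for name in positive_names(paper, category):
            counts[paper["published"]][name] += 1
    return counts


def co_occurrence(papers, category="MedicationName"):
    """Number of papers in which each pair of entities appears together."""
    pairs = Counter()
    for paper in papers:
        names = sorted(positive_names(paper, category))
        pairs.update(combinations(names, 2))
    return pairs


if __name__ == "__main__":
    for month, counter in sorted(mentions_by_month(papers).items()):
        print(month, dict(counter))
    print(dict(co_occurrence(papers)))
```

In a real deployment the same aggregations would presumably be expressed as queries against the database rather than computed in application code; the sketch only shows the shape of the data and of the questions being asked.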

Materials and Methods
COVID-19 Scientific Papers and CORD Dataset
Text Analytics
Treatment Strategy over Time
Findings
Terms Co-Occurrence