Abstract
We introduce a generic, human-out-of-the-loop pipeline, ERLKG, to perform rapid association analysis of any biomedical entity with other existing entities from a corpus of the same domain. Our pipeline consists of a Knowledge Graph (KG) created from the open-source CORD-19 dataset by fully automating the procedure of information extraction using SciBERT. The best latent entity representations are then found by benchmarking different KG embedding techniques on the task of link prediction using a Graph Convolution Network Auto Encoder (GCN-AE). We demonstrate the utility of ERLKG with respect to COVID-19 through multiple qualitative evaluations. Due to the lack of a gold standard, we propose a relatively large intrinsic evaluation dataset for COVID-19 and use it for validating the top two performing KG embedding techniques. We find TransD to be the best performing KG embedding technique, with Pearson and Spearman correlation scores of 0.4348 and 0.4570 respectively. We demonstrate that a considerable number of ERLKG's top protein, chemical and disease predictions are currently under consideration for COVID-19 related research.
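To make the intrinsic evaluation concrete, the following is a minimal sketch of how Pearson and Spearman correlations between embedding-based relatedness and reference scores could be computed. It assumes relatedness is measured by cosine similarity between entity vectors, which the abstract does not specify. The function names and any inputs passed to them are hypothetical and are not part of ERLKG.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))

def evaluate(embeddings, scored_pairs):
    """Correlate cosine similarities with reference relatedness scores.

    embeddings: dict mapping entity name -> vector (hypothetical input).
    scored_pairs: list of (entity_a, entity_b, reference_score) tuples.
    """
    predicted = [cosine(embeddings[a], embeddings[b]) for a, b, _ in scored_pairs]
    reference = [score for _, _, score in scored_pairs]
    return pearson(predicted, reference), spearman(predicted, reference)
```

Under these assumptions, `evaluate` would be called once per embedding technique on the same scored pairs, and the technique with the higher correlations would be preferred.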
Highlights
COVID-19 is a global epidemic with a considerable fatality rate and a high transmission rate, affecting millions of people world-wide since its outbreak (https://www.who.int/docs/defaultsource/coronaviruse/situation-reports/20200811-covid19-sitrep-204.pdf?sfvrsn=1f4383dd).
The search for treatments and possible cures for the novel Coronavirus (Wang et al, 2020b) has led to an exponential increase in scientific publications, but the challenge lies in effectively processing, integrating and leveraging related sources of information.
Rapid and effective utilization of literature during times of pandemic such as COVID-19 is of utmost importance in combating the disease
We introduce a fully automated generic pipeline consisting of an Information Extraction (IE) system followed by Knowledge Graph construction
Such entities are well explored in existing literature and an analysis of their relatedness to COVID-19 is provided by leveraging the CORD-19 Open Research
Summary
Rapid and effective utilization of literature during a pandemic such as COVID-19 is of utmost importance in combating the disease. We introduce a fully automated generic pipeline consisting of an Information Extraction (IE) system followed by Knowledge Graph construction. The IE module uses SciBERT (Beltagy et al, 2019) to perform Named Entity Recognition (NER) and Relationship Extraction (RE). The entire entity extraction procedure is fully automated and no human expertise is used. We focus on the task of association analysis of essential biomedical entities, namely proteins, diseases, and chemicals. Such entities are well explored in existing literature and an analysis of their relatedness to COVID-19 is provided by leveraging the CORD-19 Open Research
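The step from extraction output to a Knowledge Graph described in this summary can be pictured with a small sketch. It assumes the NER and RE stages emit (head, relation, tail) triples, which is a common convention but is not stated in the text. The function names, entity names and relation labels used here are hypothetical placeholders, not output of the actual pipeline.

```python
from collections import defaultdict

def build_kg(triples):
    """Build an adjacency map: head -> set of (relation, tail).

    Duplicate triples, e.g. the same fact extracted from several papers,
    collapse into a single edge because edges are stored in a set.
    """
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
        graph.setdefault(tail, set())  # register tail entities as nodes too
    return graph

def neighbors(graph, entity):
    """Sorted list of distinct entities directly linked from `entity`."""
    return sorted({tail for _, tail in graph.get(entity, ())})

# Placeholder triples standing in for automated extraction output.
toy_triples = [
    ("protein_A", "associated_with", "disease_X"),
    ("chemical_B", "associated_with", "disease_X"),
    ("protein_A", "interacts_with", "chemical_B"),
    ("protein_A", "associated_with", "disease_X"),  # repeated extraction
]

kg = build_kg(toy_triples)
print(neighbors(kg, "protein_A"))
```

A graph in this form is the kind of input that KG embedding techniques are trained on, each triple acting as one observed link.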