Abstract

BackgroundThe recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.MethodsIn this study, we propose to generate a more accurate diagnosis model of COVID-19 based on patient symptoms and routine test results by applying machine learning to reanalyzing COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone.ResultsWe discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model to achieve a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.ConclusionsWe demonstrated that computational methods trained on large clinical datasets could yield ever more accurate COVID-19 diagnostic models to mitigate the impact of lack of testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.

Highlights

  • The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests

  • Compilation of patient data and summary of clinical variables After compiling information from 151 published studies, we present a total of 42 different clinical variables, including 21 categorical and 21 continuous variables, that are reported in more than 1 study

  • Continuous variables include age, which has a mean of 38.91 years and variance 21.86 years, and serum neutrophil levels, which has a mean of 6.85 × 109 cells/L and a variance of 12.63 × 109 cells/L

Read more

Summary

Introduction

The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests. The majority of primary studies published on COVID-19 suffered from small sample sizes [1, 2]. There is an urgent need to collate all available published data on the clinical characteristics of COVID-19 from different studies to construct a comprehensive dataset for gaining insights into the pathogenesis and clinical characteristics of COVID-19. We aim to perform a large-scale meta-analysis to synthesize all published studies with COVID-19 patient clinical data, with the goal of uncovering novel correlations between clinical variables in COVID-19 patients. We will apply machine learning to reanalyze the data and construct a computational model for predicting whether someone has COVID-19 based on their clinical information alone

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.