Abstract

The combined impact of common and rare exonic variants in COVID-19 host genetics remains insufficiently understood. Here, common and rare variants from whole-exome sequencing data of about 4000 SARS-CoV-2-positive individuals were used to define an interpretable machine-learning model for predicting COVID-19 severity. First, variants were converted into separate sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the Boolean features most informative about the genetic bases of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score, a synthetic and interpretable index describing the contribution of host genetics to COVID-19 severity, as demonstrated through testing in several independent cohorts. Selected features belong to ultra-rare, rare, low-frequency, and common variants, including those in linkage disequilibrium with known GWAS loci. Notably, around one quarter of the selected genes are sex-specific. Pathway analysis of the selected genes associated with COVID-19 severity reflected the multi-organ nature of the disease. The proposed model might provide useful information for developing diagnostics and therapeutics, while also being able to guide bedside disease management.
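The first step described above, converting variants into Boolean features per gene, can be illustrated with a minimal sketch. This is not the authors' pipeline: the allele-frequency cut-offs, the gene names, and the function names are assumptions made for illustration, and the sketch simplifies by giving every gene one feature per frequency class.

```python
# Hypothetical minor-allele-frequency (MAF) cut-offs for the four classes
# named in the abstract. The abstract does not state the thresholds, so
# these values are assumptions.
FREQUENCY_CLASSES = [
    ("ultra_rare", 0.0, 0.001),
    ("rare", 0.001, 0.01),
    ("low_frequency", 0.01, 0.05),
    ("common", 0.05, 1.0),
]


def frequency_class(maf):
    """Return the name of the frequency class whose range contains maf."""
    for name, low, high in FREQUENCY_CLASSES:
        if low <= maf < high:
            return name
    return "common"  # maf == 1.0 is not covered by the half-open ranges


def boolean_features(variants):
    """Map (gene, class) -> 1 when the individual carries at least one variant."""
    features = {}
    for gene, maf in variants:
        features[(gene, frequency_class(maf))] = 1
    return features


def feature_matrix(cohort):
    """Return sorted feature columns and one 0/1 row per individual."""
    per_person = {pid: boolean_features(v) for pid, v in cohort.items()}
    columns = sorted({key for feats in per_person.values() for key in feats})
    rows = {
        pid: [feats.get(col, 0) for col in columns]
        for pid, feats in per_person.items()
    }
    return columns, rows


# Toy cohort: each individual is a list of (gene, MAF) pairs.
# GENE_A and GENE_B are placeholder names, not genes from the study.
cohort = {
    "patient_1": [("GENE_A", 0.0002), ("GENE_B", 0.12)],
    "patient_2": [("GENE_B", 0.12), ("GENE_B", 0.30)],
    "patient_3": [],
}
columns, rows = feature_matrix(cohort)
```

A 0/1 matrix of this kind is the sort of input an L1-penalised (LASSO) logistic regression could use for feature selection, since the penalty drives the coefficients of uninformative features to exactly zero.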

Highlights

  • For almost 2 years, COVID-19 has proven to be a disease with a broad spectrum of clinical presentations, from asymptomatic patients to those with severe symptoms leading to death or persistent disease (“long COVID”) (Livingston and Bucher 2020; Chen et al 2019; Zhang et al 2020a)

  • The aim of the present study was to develop an interpretable model that could be used to predict the severity of COVID-19 from host genetic data

  • Developing a robust predictive model that directly associates single variants with disease severity grading has proven too complex and unreliable, because the number of host genetic variants to analyse vastly exceeds the much smaller number of individual patients

Introduction

For almost 2 years, COVID-19 has proven to be a disease with a broad spectrum of clinical presentations, from asymptomatic patients to those with severe symptoms leading to death or persistent disease (“long COVID”) (Livingston and Bucher 2020; Chen et al 2019; Zhang et al 2020a). Advances in modelling the interplay between SARS-CoV-2 and host genetics hold significant promise for addressing other complex diseases. The still moderate variability of the viral genome has so far been shown to have a relatively low impact on disease severity (Islam et al 2020), whereas age, sex, and comorbidities are currently the major factors predicting disease susceptibility and outcome (Li et al 2021). While these factors certainly have significant value for prediction, they provide limited insight into disease pathophysiology and are of limited relevance for drug development.
