Abstract

Background
It is estimated that 10% of patients who suffered from severe acute SARS-CoV-2 infection experience symptoms beyond 3 months after disease onset. Long COVID is a multisystemic condition that comprises more than 200 symptoms. Common new-onset medical conditions include cardiovascular disease, type 2 diabetes and chronic fatigue syndrome. Whilst progress has been made in characterising the mechanisms underlying long COVID, supported by similarities with other viral infections, available diagnostics are still insufficient. Molecular phenotyping is an established systems medicine tool that provides an integrative profile of an individual’s biological status, which results from the cooperative genomic, transcriptomic and proteomic response to environmental stimuli. It exploits spectroscopic platforms based on nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) to generate precise information on a plethora of molecules that can be modelled to explain specific physiological and pathological conditions.

Methods
A well-characterised longitudinal cohort of >200 SARS-CoV-2-positive individuals was recruited at Cambridge Hospitals (2020, Wuhan strain). Each individual was classified according to the severity of their respiratory symptoms, ranging from A (asymptomatic) to E (external ventilation). For each sample, cellular and immunological parameters were measured alongside NMR lipoprotein, amino acid, tryptophan and lipidomics assays. Functional PCA models captured latent dynamics in individual trajectories for C-reactive protein (CRP), an established marker of inflammation, and for the metabolic parameters found to be associated with it. Based on these trends, patient heterogeneity was explored using Gaussian mixture modelling, and the resulting stratification was used to train personalised predictors of disease outcome using the first time point after onset as input. The addition of a penalty function during training ensured a parsimonious selection of parameters.
Patients were asked to complete surveys 2–6 months after infection. The abundance and co-occurrence of symptoms were assessed using latent factor analysis.

Results
Linear mixed model regression revealed strong associations between C-reactive protein abundance and molecular parameters. Quinolinic acid, VLDL triglycerides and phospholipids, VLDL and IDL cholesterol, and glycoprotein-related GlycA were positively correlated with it, while tryptophan, taurine, indole-3-acetic acid, HDL cholesterol and phospholipids, and the supramolecular phospholipid composite (SPC) peak were strongly anti-correlated. These findings are congruent with prior work on cohorts from Western Australia and Spain. In both cases, strong lipoprotein signatures were observed: GlycA was increased during the acute phase, while SPC and HDL subfraction 4 were strongly depleted. The analysis of individual trajectories measured by CRP and associated parameters revealed three groups: mildly affected, good recovery and poor prospect. Patients in the last category showed profoundly altered metabolic profiles even weeks after onset, and a higher co-occurrence of neurological symptoms. Furthermore, disease outcome scores were predicted for each patient in the early stage of infection. Training these predictors also distilled a panel of the most relevant parameters for that purpose. NMR lipoprotein parameters showed excellent predictive capabilities.

Conclusion
Broad phenotyping, combined with multi-view multivariate analysis, allowed for robust stratification of COVID-19 patients and accurate personalised prediction of disease outcome. It also confirms the critical role played by lipoprotein metabolism in the immune response, which is successfully captured by NMR-based lipoprotein parameters.
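
As an illustration of how a penalty function during training can yield a parsimonious parameter panel, the sketch below fits a linear predictor with an L1 (lasso) penalty by cyclic coordinate descent. This is a minimal sketch under stated assumptions, not the study's method: the abstract does not specify the penalty type or the optimiser, the data are synthetic, and the function names (`soft_threshold`, `lasso_coordinate_descent`) and the value of `lam` are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature scale, x_j'x_j / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back in
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            # The L1 penalty sets weakly associated coefficients exactly to zero
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Synthetic stand-in for baseline measurements (ASSUMPTION: not study data).
# Only features 0 and 3 truly drive the simulated outcome score.
rng = np.random.default_rng(0)
n_patients, n_features = 120, 10
X = rng.normal(size=(n_patients, n_features))
true_w = np.zeros(n_features)
true_w[0], true_w[3] = 2.0, -1.5
y = X @ true_w + 0.1 * rng.normal(size=n_patients)

# Standardise features and centre the outcome so one penalty suits all columns
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, lam=0.2)
selected = np.flatnonzero(w)  # indices of the retained parameters
print("selected feature indices:", selected)
```

Increasing `lam` removes more features from the panel, while setting it to zero recovers ordinary least squares; in practice the value would typically be chosen by cross-validation.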