Abstract

Summary

We introduce VERSO, a two-step framework for characterizing viral evolution from sequencing data of viral genomes, which improves on phylogenomic approaches based on consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, even under sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of the SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state as well as homoplastic variants, some of which fall on the spike gene. VERSO is available at: https://github.com/BIMIB-DISCo/VERSO.

Highlights

  • A growing plethora of methods for phylogenomic reconstruction is available, relying on different algorithmic frameworks (distance-matrix, maximum parsimony, maximum likelihood, or Bayesian inference), with various substitution models and distinct evolutionary assumptions.

  • Comparative assessment on simulations: To assess the performance of VERSO and compare it with competing approaches, we executed extensive tests on simulated datasets, generated with the coalescent model simulator msprime.[70]

  • Homoplasy detection: Five clonal variants included in our model show apparent violations of the accumulation hypothesis, namely g.11083G>T
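To give a rough idea of what a violation of the accumulation hypothesis looks like in practice, the Python sketch below applies a standard pairwise compatibility check for a rooted perfect phylogeny (the "three-gamete" test) to a binary matrix of clonal variant profiles. This is an illustrative stand-in, not the procedure implemented in VERSO. It assumes that 0 denotes the reference (ancestral) state, and the function name `conflicting_pairs`, the variant labels, and the toy profiles are all hypothetical.

```python
from itertools import combinations


def conflicting_pairs(profiles, variants):
    """Return pairs of variants that cannot both fit a rooted perfect phylogeny.

    profiles: list of rows, one per sample; each row holds 0/1 flags
              (1 = clonal variant present, 0 = reference state).
    variants: list of variant labels, one per column of `profiles`.
    """
    conflicts = []
    for i, j in combinations(range(len(variants)), 2):
        # Collect the (variant_i, variant_j) states observed across samples.
        seen = {(row[i], row[j]) for row in profiles}
        # With an all-zero (reference) root, observing 1/0, 0/1 and 1/1
        # together means one of the two variants was gained or lost
        # more than once, i.e. accumulation is violated for this pair.
        if {(1, 0), (0, 1), (1, 1)} <= seen:
            conflicts.append((variants[i], variants[j]))
    return conflicts


# Hypothetical toy data: four samples, three clonal variants.
variants = ["v1", "v2", "v3"]
profiles = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]

print(conflicting_pairs(profiles, variants))
# → [('v1', 'v3'), ('v2', 'v3')]
```

In this toy example `v3` appears in every conflicting pair, which is the kind of pattern that would flag a variant as a candidate homoplasy for further inspection.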

Introduction

A growing plethora of methods for phylogenomic reconstruction is available, relying on different algorithmic frameworks (distance-matrix, maximum parsimony, maximum likelihood, or Bayesian inference), with various substitution models and distinct evolutionary assumptions (see, e.g., Refs. [10,14,15,16,17,18,19,20,21,22]). While such methods have repeatedly proven effective in unraveling the main patterns of evolution of viral genomes for many different diseases, including SARS-CoV-2 [10,23–25], at least two issues can be raised.
