Abstract

Since 2020, the SARS-CoV-2 virus has infected billions of people and spread to 185 countries. The virus spreads by making new copies of its genome inside human cells, exploiting the cells’ machinery to synthesise the viral proteins it needs to infect further cells. Each time the virus copies its genetic material, there is a chance that the replication process introduces an error into the genetic sequence. Over time, these mutations accumulate, which can give rise to new variants with different properties. These new variants, originating from a common ancestor, may spread faster or be able to evade immune systems that have learnt to recognise previous variants.
To understand where new variants of SARS-CoV-2 come from and how related they are to each other, scientists build family trees called ‘phylogenetic trees’ based on similarities in the genetic sequences of different variants of the virus. By looking at these trees, researchers can track how a variant spreads geographically and attempt to identify worrying new variants that might lead to a new wave of infections. The scale of the COVID-19 pandemic, together with the global effort by clinicians and researchers to sequence SARS-CoV-2 genetic material, means that a library of over 13 million SARS-CoV-2 genomes now exists, making it the largest such collection for any organism.
Although phylogenetic trees of viruses have been studied for a long time, exploring the SARS-CoV-2 library presents technical and practical challenges due to its sheer size. Sanderson has developed an open-source web tool called Taxonium that allows users to explore phylogenetic trees with millions of sequences. With help from collaborators at the University of California, Santa Cruz, Sanderson built a website called Cov2Tree that uses the Taxonium platform to allow immediate access to an expansive tree of all publicly available SARS-CoV-2 sequences.
Cov2Tree enables users to visualise all SARS-CoV-2 genomes in a bird’s-eye view akin to a ‘Google Earth for virus sequences’, where anyone can zoom in on a related family of viruses down to the level of individual sequences. This can be used to compare variants and follow geographic spread. Using Taxonium, scientists can explore how virus sequences are related to each other. They can also see the individual mutations that have occurred at each branch of the tree, and can search for sequences based on mutation, geographical location, or other factors.
Interestingly, a trend appearing in the SARS-CoV-2 phylogenetic tree is the emergence of identical mutations at different branches of the tree without a common origin. These mutations may be a result of convergent evolution, a phenomenon that occurs when a mutation appears independently in different variants because it confers an advantage on the virus, making such mutations more likely to persist. This means that if a mutation has already appeared independently in several different variants, scientists may be able to anticipate that it will also appear in more distantly related variants.
Overall, Taxonium is an important tool for monitoring SARS-CoV-2 genomes, but it also has broader applications. The tool can be used to browse phylogenetic trees of other viruses and organisms. Furthermore, the Taxonium website offers a way to browse a tree of life, with images and links to Wikipedia. The SARS-CoV-2 library might be the largest now, but in the future even bigger datasets will likely be available, highlighting the importance of tools like Taxonium.
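To make the idea of identical mutations arising on separate branches more concrete, here is a minimal Python sketch. It is not Taxonium's code or data format: the toy tree, the node names, the mutation labels (`mutA` to `mutD`) and the helper functions are all invented for illustration. It assumes each branch of the tree is annotated with the mutations that arose on it, and simply counts on how many branches each mutation appears.

```python
from collections import Counter

# Hypothetical toy tree: node -> (parent, mutations that arose on the branch
# leading to this node). Names and mutation labels are made up.
TREE = {
    "root": (None, []),
    "A": ("root", ["mutA"]),
    "B": ("root", ["mutB"]),
    "A1": ("A", ["mutC"]),
    "B1": ("B", ["mutA"]),
    "B2": ("B", ["mutC", "mutD"]),
}


def branch_counts(tree):
    """Count the number of branches on which each mutation arises."""
    counts = Counter()
    for _parent, mutations in tree.values():
        counts.update(set(mutations))
    return counts


def recurrent_mutations(tree, min_branches=2):
    """Return mutations that arise on at least `min_branches` separate branches."""
    counts = branch_counts(tree)
    return sorted(m for m, n in counts.items() if n >= min_branches)


def mutations_of(tree, node):
    """List every mutation carried by `node`, ordered from the root downwards."""
    per_branch = []
    while node is not None:
        parent, mutations = tree[node]
        per_branch.append(mutations)
        node = parent
    return [m for mutations in reversed(per_branch) for m in mutations]


if __name__ == "__main__":
    print(recurrent_mutations(TREE))
    print(mutations_of(TREE, "B1"))
```

In this toy tree, `mutA` and `mutC` each arise on two unrelated branches, so they are the ones flagged as recurrent. A real analysis would also need to rule out sequencing errors and mistakes in the tree itself before treating a recurrent mutation as evidence of convergent evolution.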

