Abstract

Natural selection is a fundamental force shaping organismal evolution, as it both maintains function and enables adaptation and innovation. Viruses, with their typically short and largely coding genomes, experience strong and diverse selective forces, sometimes acting on timescales that can be directly measured. These selection pressures emerge from an antagonistic interplay between rapidly changing fitness requirements (immune and antiviral responses from hosts, transmission between hosts, or colonization of new host species) and functional imperatives (the ability to infect hosts or host cells and replicate within hosts). Indeed, computational methods to quantify these evolutionary forces from molecular sequence data were first applied, as early as the 1980s, to the study of viral pathogens. This preference emerged largely because strong selective forces are easiest to detect in viruses and because viruses have clear biomedical relevance. The recent commoditization of affordable high-throughput sequencing has made it possible to generate truly massive genomic data sets, on which powerful and accurate methods can yield a very detailed depiction of when, where, and (sometimes) how viral pathogens respond to various selective forces. Here, we present recent statistical developments and state-of-the-art methods to identify and characterize these selection pressures from protein-coding sequence alignments and phylogenies. The methods described here can reveal critical information about various evolutionary regimes, including whole-gene selection, lineage-specific selection, and site-specific selection acting upon viral genomes, while accounting for confounding biological processes, such as recombination and variation in mutation rates.

Highlights

  • Natural selection is a powerful evolutionary force that shapes genomes of all living organisms

  • The synonymous evolutionary rate is used to provide a baseline rate of neutral evolution because the average selective effect of a synonymous substitution is assumed to be negligible compared to the effect of a non-synonymous substitution

  • Because we performed the original BUSTED analysis on the entire tree, we do not know from this result along which lineages KSR2 was subject to positive selection
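The distinction between synonymous and non-synonymous substitutions that underlies the baseline described above can be illustrated with a minimal sketch. It assumes the standard genetic code and two aligned, in-frame coding sequences; the function names `classify_substitution` and `count_codon_differences` are hypothetical and are not part of any tool discussed here. The raw counts it returns are only an illustration of the two classes of change, not an estimate of evolutionary rates, which requires a codon substitution model fitted to an alignment and phylogeny.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_from, codon_to):
    """Label a codon change as identical, synonymous, or non-synonymous."""
    a, b = codon_from.upper(), codon_to.upper()
    if a == b:
        return "identical"
    if CODON_TABLE[a] == CODON_TABLE[b]:
        return "synonymous"
    return "non-synonymous"


def count_codon_differences(seq_a, seq_b):
    """Count (synonymous, non-synonymous) codon differences between two
    aligned in-frame sequences. Codons containing gaps or ambiguous
    characters are skipped. This is a naive tally, not a rate estimate."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # gap or ambiguity code: ignore this codon
        label = classify_substitution(a, b)
        if label == "synonymous":
            syn += 1
        elif label == "non-synonymous":
            nonsyn += 1
    return syn, nonsyn


if __name__ == "__main__":
    print(classify_substitution("CTG", "CTA"))  # leucine to leucine
    print(classify_substitution("CTG", "CGG"))  # leucine to arginine
    print(count_codon_differences("ATGCTGAAA", "ATGCGGAAG"))
```

In this sketch a leucine-to-arginine change such as CTG to CGG is labeled non-synonymous, while CTG to CTA leaves the encoded leucine unchanged and is labeled synonymous.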


Summary

Introduction

Natural selection is a powerful evolutionary force that shapes genomes of all living organisms. On the other extreme is the question that has an immediate biological significance: “Is changing a leucine to an arginine at position 209 in gene X along a specific branch in the phylogeny adaptive?” Without additional information, such as a careful experimental measurement of the fitness impact of introducing that substitution, current comparative sequence approaches cannot answer this question. We present a collection of statistical methods, each of which is designed to address a biological question somewhere on the spectrum between the two extremes: sufficiently specific to be interesting, yet general enough to be answerable based only on the evolutionary history of homologous sequences. It is critical to model confounding biological processes, such as recombination and variation in mutation rates, both in their own right and because ignoring their effects could bias selection inference tools and yield misleading results.

Materials
How to Run a Selection Analysis
BUSTED
Annotating a collection of alignments with a binary attribute
Site-Level Selection
Screening Sequences for Recombination
Accounting for Synonymous Rate Variation
Tips
Exercises
