Detecting Individual Sites Subject to Episodic Diversifying Selection

Ben Murrell, Sasha Moola, Thomas Weighill, Sergei L Kosakovsky Pond, Joel O Wertheim, Konrad Scheffler, Harmit S Malik

doi:10.1371/journal.pgen.1002764


Open Access



Abstract

The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.

Highlights

Following the introduction of computationally tractable codon-substitution models [1,2] nearly two decades ago, there has been sustained interest in using these models to study the past action of natural selection on protein coding genes
We analyzed simulations based on seven large (N = 517–640) phylogenies downloaded from TreeBase
We have presented a mixed effects model of evolution, MEME, and a statistical test for detecting the signal of past episodic positive selection from molecular sequence data

Summary

Introduction

Following the introduction of computationally tractable codon-substitution models [1,2] nearly two decades ago, there has been sustained interest in using these models to study the past action of natural selection on protein coding genes. Random effects codon-substitution models [10] permitted ω, the ratio of nonsynonymous to synonymous substitution rates, to vary from site to site, which made it possible to identify instances when positive selection had acted only upon a small proportion of sites. Such site-level models can detect which positions in a sequence alignment may have been influenced by diversifying positive selection. It has been noted that positive selection is more readily identified in smaller alignments: counterintuitively, including additional sequences may cause sites to no longer be detected [18,19]. This phenomenon could be readily explained by purifying selection on some lineages masking the signal of positive selection on others.
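The masking effect described above can be illustrated with a small arithmetic sketch. It is a deliberate caricature, not the likelihood machinery used by any codon model: it assumes that a model with a single nonsynonymous-to-synonymous rate ratio per site (written `omega` in the code) effectively reports something like a branch-length-weighted average over lineages. The function name and all numeric values are hypothetical, chosen only for illustration.

```python
def lineage_averaged_omega(omegas, weights):
    """Weighted mean of per-lineage omega values.

    `weights` stands in for the share of total branch length that
    evolves under each omega (an illustrative assumption).
    """
    total = sum(weights)
    return sum(o * w for o, w in zip(omegas, weights)) / total


# Hypothetical site: 10% of the tree under strong positive selection
# (omega = 4.0), the remaining 90% under purifying selection (omega = 0.2).
omegas = [4.0, 0.2]
weights = [0.1, 0.9]

avg = lineage_averaged_omega(omegas, weights)
print(f"lineage-averaged omega = {avg:.2f}")  # 4.0*0.1 + 0.2*0.9 = 0.58
```

Under these assumed values the averaged ratio falls below 1 even though one lineage class has a ratio well above 1, so a site-level summary would point to purifying selection. Adding more sequences that evolve under constraint shifts weight toward the low-ratio class, which is consistent with the observation that larger alignments can lose previously detected sites.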

Methods

Results

Conclusion