The Impact of the Nucleosome Code on Protein-Coding Sequence Evolution in Yeast

Tobias Warnecke, Nizar N Batada, Laurence D Hurst, Dmitri A Petrov

doi:10.1371/journal.pgen.1000250

Open Access

https://doi.org/10.1371/journal.pgen.1000250

Journal: PLoS Genetics
Publication Date: Nov 7, 2008
Citations: 145
License type: CC BY 4.0

Affiliation: University of Bath

Abstract

Coding sequence evolution was once thought to be the result of selection on optimal protein function alone. Selection can, however, also act at the RNA level, for example, to facilitate rapid translation or ensure correct splicing. Here, we ask whether the way DNA works also imposes constraints on coding sequence evolution. We identify nucleosome positioning as a likely candidate to set up such a DNA-level selective regime and use high-resolution microarray data in yeast to compare the evolution of coding sequence bound to or free from nucleosomes. Controlling for gene expression and intra-gene location, we find a nucleosome-free “linker” sequence to evolve on average 5–6% slower at synonymous sites. A reduced rate of evolution in linker is especially evident at the 5′ end of genes, where the effect extends to non-synonymous substitution rates. This is consistent with regular nucleosome architecture in this region being important in the context of gene expression control. As predicted, codons likely to generate a sequence unfavourable to nucleosome formation are enriched in linker sequence. Amino acid content is likewise skewed as a function of nucleosome occupancy. We conclude that selection operating on DNA to maintain correct positioning of nucleosomes impacts codon choice, amino acid choice, and synonymous and non-synonymous rates of evolution in coding sequence. The results support the exclusion model for nucleosome positioning and provide an alternative interpretation for runs of rare codons. As the intimate association of histones and DNA is a universal characteristic of genic sequence in eukaryotes, selection on coding sequence composition imposed by nucleosome positioning should be phylogenetically widespread.

Highlights

In simple models of molecular evolution, selection on protein coding sequence (CDS) is exclusively devoted to optimizing protein function.
Why do some parts of genes evolve slower than others? How can we account for the amino acid make-up of different parts of a protein? Answers to these questions are usually framed by reference to what the protein does and how it does it.
Looking at genes in baker’s yeast, we find that sequence between nucleosomes, linker sequence, is slow evolving.

Summary

Introduction

In simple models of molecular evolution, selection on protein coding sequence (CDS) is exclusively devoted to optimizing protein function. We expect amino acid choice to be dictated by protein function alone and synonymous mutations to be neutrally evolving. However, many stages of the protein production chain are subject to their own particular regimes of selective constraint. Is this also the case when protein-coding information is still stored as DNA in its chromosomal context? In other words, does the way DNA is organized come with its own important requirements on sequence composition, requirements that potentially conflict with optimization of protein function, translation rate, or any of the other forces?

Methods

Results

Discussion

Conclusion