Abstract
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to studying regulatory elements focus primarily on highly conserved sites and rely heavily on multiple sequence alignments. Such conservation-based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that uses a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. We applied our model to ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) to analyze the evolution of their binding sites along the lineage leading to human after the human-mouse common ancestor. We estimate that a substantial fraction of human binding sites (∼58–79% for each factor) originated after the divergence from mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help to understand their roles in shaping variation in gene regulation across species.
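To give a feel for what a birth-death view of binding sites implies, here is a minimal sketch. It is not the model described in the paper: it assumes, purely for illustration, that new sites arise at a constant rate and that each existing site is lost independently at a constant per-site rate. The function names and parameters (`expected_counts`, `fraction_lineage_specific`, `birth_rate`, `death_rate`) are hypothetical.

```python
import math


def expected_counts(n0, birth_rate, death_rate, t):
    """Expected numbers of surviving ancestral sites and newly born sites.

    Illustrative assumptions (not taken from the paper):
      - n0 sites exist in the ancestor at time 0
      - new sites are born at a constant rate `birth_rate` per unit time
      - each site is lost independently at rate `death_rate` (> 0)
    """
    survive = math.exp(-death_rate * t)  # chance an ancestral site persists to time t
    old = n0 * survive
    # A site born at time s survives to t with probability exp(-death_rate * (t - s));
    # integrating birth_rate * that probability over s in [0, t] gives:
    new = birth_rate / death_rate * (1.0 - survive)
    return old, new


def fraction_lineage_specific(n0, birth_rate, death_rate, t):
    """Expected share of present-day sites that arose after the ancestor."""
    old, new = expected_counts(n0, birth_rate, death_rate, t)
    return new / (old + new)


# Example: a stationary population (n0 == birth_rate / death_rate) observed
# after one "half-life" of a site; about half of the sites are then new.
mu = math.log(2)
print(fraction_lineage_specific(n0=100, birth_rate=100 * mu, death_rate=mu, t=1.0))
```

Under these assumptions, when the site count is at equilibrium the lineage-specific fraction depends only on the product of the loss rate and the elapsed time, so a high turnover rate alone is enough to produce a large share of young sites.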
Highlights
Changes in gene regulation play a key role in the evolution of morphological traits [1,2,3]
Cross-species comparisons of non-coding sequences are limited in their ability to study regulatory sequence evolution in cases where the elements are newly derived or selected for novelty
Recent experimental studies showed that the evolution of transcription factor binding sites (TFBS) is highly dynamic, with sites differing a great deal even between closely related mammalian species
Summary
Changes in gene regulation play a key role in the evolution of morphological traits [1,2,3]. Previous computational studies have inferred the evolution of regulatory elements using, for example, the emergence of new conserved elements specific to a particular clade in the phylogeny [16] or lineage-specific alterations leading to a loss-of-function phenotype [17,18]. Although such approaches have been helpful in understanding lineage-specific regulatory element evolution, they all inherently rely upon fixed cross-species alignments, which are frequently of low quality within non-coding regions of the genome [19,20,21]. Systematic identification of binding sites for specific TFs, and assessment of their conservation and prevalence using cross-species comparisons, remains a challenging problem.