Abstract
Transcriptional regulation critically depends on proper interactions between transcription factors (TFs) and their cognate DNA binding sites. The most widely used model of TF-DNA binding, the Positional Weight Matrix (PWM), presumes that positions within the binding site are independent. However, there is evidence that this independence assumption does not always hold, and the extent of interposition dependence is not fully known. We hypothesize that interposition dependence should be partly manifested as correlated evolution at the dependent positions. We report a Maximum-Likelihood (ML) approach to infer correlated evolution at any two positions within a PWM, based on a multiple alignment of 5 mammalian genomes. Application to a genome-wide set of putative cis elements in human promoters reveals that correlated evolution is prevalent within cis elements. We found that the interdependence between two positions decreases as the distance between them increases. Interdependent positions tend to be more evolutionarily constrained, and the dependence patterns are relatively similar across structurally related transcription factors. Although some of the detected mutational dependencies may be due to context-dependent genomic hypermutation, notably CG to TG, the majority are likely due to context-dependent preferences for specific nucleotide combinations within the cis elements. Patterns of evolution at individual nucleotide positions within mammalian TF binding sites are often significantly correlated, suggesting interposition dependence. The proposed methodology is also applicable to other classes of non-coding functional elements. A detailed investigation of mutational dependencies within specific motifs could reveal preferred nucleotide combinations that may help refine DNA binding models.
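The ML model itself is not spelled out here, so the following Python sketch illustrates only the underlying intuition with a deliberately simpler, swapped-in technique: a 2x2 likelihood-ratio (G) test asking whether substitutions at two PWM positions co-occur across aligned binding sites more often than independence predicts. This is not the authors' method; the boolean substitution encoding and the name `substitution_g_test` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def substitution_g_test(changed_i: Sequence[bool],
                        changed_j: Sequence[bool]) -> Tuple[float, float]:
    """Return (G, p) for independence of substitutions at positions i and j.

    Hypothetical encoding: changed_i[k] is True if position i of aligned
    binding site k shows a substitution between the compared species;
    changed_j[k] is the same for position j.
    """
    if len(changed_i) != len(changed_j):
        raise ValueError("both positions must be scored on the same sites")
    n = len(changed_i)
    # 2x2 contingency table: counts[a][b] = sites with state a at i and b at j
    counts = [[0, 0], [0, 0]]
    for a, b in zip(changed_i, changed_j):
        counts[int(a)][int(b)] += 1
    row = [counts[a][0] + counts[a][1] for a in range(2)]
    col = [counts[0][b] + counts[1][b] for b in range(2)]
    g = 0.0
    for a in range(2):
        for b in range(2):
            observed = counts[a][b]
            if observed > 0:  # empty cells contribute 0 to the G statistic
                expected = row[a] * col[b] / n
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # G is non-negative; clamp tiny negative rounding error
    # Under independence G is approximately chi-square with 1 degree of
    # freedom, whose upper-tail probability equals erfc(sqrt(G / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Toy data: substitutions at i and j always co-occur across four sites
g, p = substitution_g_test([True, True, False, False],
                           [True, True, False, False])
print(g, p)  # g equals 8 * ln 2 (about 5.55); p is below 0.05
```

The G statistic compares an independence model against a saturated table by likelihood ratio, which is why it makes a convenient toy stand-in. It ignores the shared ancestry of the 5 species, and with only four sites the chi-square approximation is unreliable; the example only shows the mechanics.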
Highlights
Our analysis is based on a genome-wide set of putative TF binding sites in human proximal promoters, identified using 64 vertebrate Positional Weight Matrices (PWMs) from JASPAR [22]
We consider only binding sites contained within gapless regions of the multiple alignment of 5 species (human, chimpanzee, mouse, rat, and dog) obtained from the UCSC database [23]
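The gapless-region filter can be sketched in Python as follows, assuming the alignment is held in memory as one gapped string per species over a shared column index, with "-" as the gap character. The dictionary layout and the name `is_gapless` are hypothetical; UCSC multiple alignments are distributed as MAF blocks that would first need to be parsed into such rows.

```python
from typing import Dict


def is_gapless(alignment: Dict[str, str], start: int, width: int,
               gap: str = "-") -> bool:
    """True if alignment columns [start, start + width) are gap-free in every species."""
    if start < 0 or width <= 0:
        raise ValueError("start must be >= 0 and width must be positive")
    for row in alignment.values():
        window = row[start:start + width]
        # A window running past the end of a row is rejected, not truncated
        if len(window) != width or gap in window:
            return False
    return True


# Toy five-species alignment (hypothetical sequences)
alignment = {
    "human": "ACGTACGT",
    "chimp": "ACGTACGT",
    "mouse": "ACG-ACGT",
    "rat": "ACGTACGT",
    "dog": "ACGTACGT",
}
print(is_gapless(alignment, 4, 4))  # True: columns 4-7 have no gaps
print(is_gapless(alignment, 0, 4))  # False: mouse has a gap at column 3
```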
Summary
Eukaryotic gene transcription is tightly regulated, in large part, by transcription factor proteins (TFs) that bind to DNA, often in a sequence-specific fashion [1,2]. The most widely used model of TF-DNA binding is the Positional Weight Matrix (PWM), a 4-by-n matrix whose rows correspond to the 4 bases and whose columns correspond to the n positions in the binding site. Although the PWM is currently the de facto model of TF-DNA interaction, a major shortcoming of this model is its assumption that the nucleotide preferences at individual positions within the binding site are independent of each other. There is both direct experimental evidence [5,6] and indirect evidence from computational modeling [7,8] suggesting that this independence assumption does not hold universally. Our focus here is on detecting specific instances of interpositional dependence, not on the extent to which these dependencies affect the overall accuracy of binding site prediction.
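To make the independence assumption concrete, the following Python sketch builds a log-odds PWM from a toy set of aligned sites and scores a candidate site as a sum of per-position terms. The pseudocount, the uniform 0.25 background, the toy sites, and the function names are illustrative assumptions, not JASPAR's or the authors' conventions.

```python
import math
from typing import List

BASES = "ACGT"  # row order of the 4-by-n matrix


def count_matrix(sites: List[str]) -> List[List[int]]:
    """Count each base at each position of equal-length aligned sites."""
    n = len(sites[0])
    counts = [[0] * n for _ in range(4)]
    for site in sites:
        if len(site) != n:
            raise ValueError("all sites must have the same length")
        for j, base in enumerate(site):
            counts[BASES.index(base)][j] += 1
    return counts


def log_odds_pwm(counts: List[List[int]], pseudocount: float = 1.0,
                 background: float = 0.25) -> List[List[float]]:
    """Convert a 4-by-n count matrix into log2-odds scores against a uniform background."""
    n = len(counts[0])
    pwm = [[0.0] * n for _ in range(4)]
    for j in range(n):
        total = sum(counts[b][j] for b in range(4)) + 4 * pseudocount
        for b in range(4):
            p = (counts[b][j] + pseudocount) / total
            pwm[b][j] = math.log2(p / background)
    return pwm


def score(pwm: List[List[float]], site: str) -> float:
    """Score a site as a sum of per-position terms: the independence assumption."""
    if len(site) != len(pwm[0]):
        raise ValueError("site length must equal PWM width")
    return sum(pwm[BASES.index(base)][j] for j, base in enumerate(site))


# Toy sites with a perfect pairing: A goes with G, and C goes with T
sites = ["AG"] * 4 + ["CT"] * 4
pwm = log_odds_pwm(count_matrix(sites))
print(score(pwm, "AG"), score(pwm, "AT"))  # the two scores are equal
```

Because the score is additive over columns, the never-observed combinations AT and CG score exactly as high as the observed AG and CT. Capturing the AG/CT pairing would require a term for the pair of positions, which is the kind of interposition dependence at issue here.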