Abstract
The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
Highlights
Specific binding of transcription factors to DNA is a major mechanism of regulation of gene expression, which has driven interest in the problem of the protein-DNA recognition code
Where $f_{i,j}(a, n)$ is the observed weighted frequency of a pair and $f^{\mathrm{exp}}_{i,j}(a, n) = f_i(a) \times f_j(n)$ is the expected weighted frequency of this pair, computed as the product of $f_i(a)$, the weighted frequency of amino acid $a$ in column $i$, and $f_j(n)$, the weighted frequency of nucleotide $n$ in column $j$
To estimate the statistical significance of the observed mutual information values, one needs the distribution of the mutual information $I_{i,j}$ computed for a random pair of columns
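To make the quantities in the highlights concrete, here is a minimal Python sketch. It assumes the standard form of mutual information, $I_{i,j} = \sum_{a,n} f_{i,j}(a,n) \log_2 \left[ f_{i,j}(a,n) / (f_i(a) f_j(n)) \right]$, which is consistent with the observed and expected frequencies defined above but is not quoted from the paper. The function names, the `weights` argument, and the shuffling-based null distribution are illustrative assumptions; they are not the Prot-DNA-Korr implementation, and the paper's own weighting scheme and random model may differ.

```python
import math
import random
from collections import defaultdict


def weighted_mutual_information(aa_col, nt_col, weights=None):
    """Mutual information (bits) between a protein column and a DNA column.

    aa_col, nt_col: equal-length sequences, one symbol per factor/site pair.
    weights: hypothetical per-pair weights; uniform if omitted.
    """
    if len(aa_col) != len(nt_col):
        raise ValueError("columns must have the same length")
    if weights is None:
        weights = [1.0] * len(aa_col)
    total = float(sum(weights))

    f_pair = defaultdict(float)  # observed weighted frequency f_ij(a, n)
    f_aa = defaultdict(float)    # weighted frequency f_i(a)
    f_nt = defaultdict(float)    # weighted frequency f_j(n)
    for a, n, w in zip(aa_col, nt_col, weights):
        f_pair[(a, n)] += w / total
        f_aa[a] += w / total
        f_nt[n] += w / total

    mi = 0.0
    for (a, n), f_obs in f_pair.items():
        f_exp = f_aa[a] * f_nt[n]  # expected frequency under independence
        mi += f_obs * math.log2(f_obs / f_exp)
    return mi


def shuffled_null(aa_col, nt_col, weights=None, n_shuffles=1000, seed=0):
    """Assumed null model: permute the DNA column and recompute the MI.

    Simplification: weights stay attached to the protein column, so with
    non-uniform weights the nucleotide marginals change between shuffles.
    """
    rng = random.Random(seed)
    shuffled = list(nt_col)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(weighted_mutual_information(aa_col, shuffled, weights))
    return null


def empirical_p_value(observed, null):
    """Fraction of null values at least as large as the observed one,
    with a +1 correction so the estimate is never exactly zero."""
    hits = sum(1 for value in null if value >= observed)
    return (hits + 1) / (len(null) + 1)


if __name__ == "__main__":
    aa = "RRRRKKKK"  # toy protein alignment column
    nt = "GGGGAAAA"  # toy binding-site alignment column
    observed = weighted_mutual_information(aa, nt)
    null = shuffled_null(aa, nt, n_shuffles=200)
    print(observed, empirical_p_value(observed, null))
```

In this toy input each amino acid always co-occurs with the same nucleotide, so the observed value is the maximum possible for two equally frequent symbols (1 bit). Real alignments would additionally need the down-weighting of closely related sequences that the abstract describes.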
Summary
Specific binding of transcription factors to DNA is a major mechanism of regulation of gene expression, which has driven interest in the problem of the protein-DNA recognition code. Initial hopes stemmed from the observations that single amino acid substitutions can drastically change the affinity of a protein for its DNA sites. The structure of the DNA double helix is relatively rigid. An early (mid-70s) paper suggested that specific recognition depends on hydrogen bonds between side chains of amino acid residues and nucleotide bases.
Correlations between Transcription Factors and Their Binding Sites Illustrated by the MerR Family. PLOS ONE | DOI:10.1371/journal.pone.0162681, September 30, 2016