De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins

Anton V Persikov, Mona Singh

doi:10.1093/nar/gkt890

Abstract

Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
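The per-finger prediction and merging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the energy values are hypothetical placeholders, the Boltzmann conversion with `kT = 1.0` is an assumed way of turning energies into probabilities, and the function names (`energies_to_column`, `finger_pwm`, `merge_fingers`) are invented for illustration. It also assumes the canonical arrangement in which each finger contacts three non-overlapping bases and the N-terminal finger contacts the 3'-most triplet.

```python
import math

NUCLEOTIDES = "ACGT"


def energies_to_column(energies, kT=1.0):
    """Convert four interaction energies (lower = more favorable) into
    a probability column. Boltzmann weighting is an assumption here."""
    weights = [math.exp(-e / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def finger_pwm(finger_energies, kT=1.0):
    """Build the three PWM columns contributed by a single finger."""
    return [energies_to_column(col, kT) for col in finger_energies]


def merge_fingers(per_finger_pwms):
    """Concatenate per-finger columns into one PWM read 5'->3'.

    Assumption: fingers are listed N- to C-terminal and the N-terminal
    finger contacts the 3'-most triplet, so finger order is reversed.
    """
    merged = []
    for pwm in reversed(per_finger_pwms):
        merged.extend(pwm)
    return merged


# Hypothetical energies in arbitrary units: one row per DNA position,
# values ordered A, C, G, T. These are NOT taken from the paper.
finger1 = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, -1.0, 1.0], [0.5, -0.5, 0.5, 0.5]]
finger2 = [[-2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0], [0.0, 0.0, 0.0, 0.0]]

array_pwm = merge_fingers([finger_pwm(finger1), finger_pwm(finger2)])
for column in array_pwm:
    print(" ".join(f"{n}={p:.2f}" for n, p in zip(NUCLEOTIDES, column)))
```

Each printed row is one PWM column. Real predictions would replace the placeholder energies with values inferred from data, and may treat contacts between neighboring fingers differently from this simple concatenation.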

Highlights

The ability of proteins to recognize and bind specific DNA regions is critical in a range of key biological processes, including transcription, replication, packaging, repair and recombination.
We have previously shown that inferring pairwise amino acid–nucleotide contact energies via support vector machines (SVMs) yields accurate predictions of whether a Cys2His2 zinc finger (C2H2-ZF) protein can bind a specific DNA site and outperforms previously described approaches [12].
Our combined test set contains ~1400 columns in its position weight matrices (PWMs), and ~55% of these columns have information content (IC)-weighted Pearson correlation coefficient (PCC) scores of at least 0.25 using any of the canonical, expanded or polynomial SVMs.
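To make the column-level score in the last highlight concrete, here is a minimal sketch of comparing a predicted PWM column with an experimental one. The information content and Pearson correlation formulas are standard. The way they are combined in `ic_weighted_pcc` (scaling the PCC by the experimental column's IC divided by its 2-bit maximum) is an assumption made for illustration and may differ from the definition used in the paper. All function names are hypothetical.

```python
import math


def information_content(column):
    """Information content in bits of a 4-element probability column,
    relative to a uniform background: 2 + sum(p * log2(p))."""
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors.
    Returns 0.0 when either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def ic_weighted_pcc(predicted, experimental):
    """ASSUMED weighting: PCC scaled by the experimental column's IC
    as a fraction of the 2-bit maximum."""
    return pearson(predicted, experimental) * information_content(experimental) / 2.0


def fraction_at_or_above(predicted_pwm, experimental_pwm, threshold=0.25):
    """Fraction of aligned columns whose score reaches the threshold."""
    scores = [ic_weighted_pcc(p, e) for p, e in zip(predicted_pwm, experimental_pwm)]
    return sum(s >= threshold for s in scores) / len(scores)
```

Under this assumed weighting, a column that is uninformative in the experimental PWM contributes a score near zero regardless of the prediction, so only columns with real specificity can pass the threshold.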

Summary

Introduction

The ability of proteins to recognize and bind specific DNA regions is critical in a range of key biological processes, including transcription, replication, packaging, repair and recombination. Sequence-specific DNA recognition by transcription factors is of particular interest due to its role in dictating when and where proteins are expressed. C2H2-ZF proteins have been intensely studied: thousands of protein–DNA pairs, largely based on the Zif268 model system, have been experimentally determined either to bind or not to bind. However, the binding specificities of most C2H2-ZFs within genomes are not known. For example, of the ~675 proteins in the human genome annotated with C2H2-ZF domains [7], specificities have been determined for fewer than a hundred [8].

Methods

Results

Conclusion

Journal: Nucleic Acids Research
Publication Date: Oct 3, 2013
Citations: 192
License type: CC BY 3.0
