Abstract

Protein-DNA interactions play a critical role in many biological processes, but the structural mechanisms underlying them are not fully understood. DNA-binding proteins can be classified into double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs), and understanding the binding specificity of a DNA-binding protein helps in studying its function. Although SSBs and DSBs have each been studied separately [1], little attention has been paid to what gives the two classes such different binding specificity. With the development of biotechnology, a large number of proteins have been sequenced. However, SSBs have been shown to have little sequence conservation [2], and although DSBs involved in similar functions may share conserved subsequences, DSBs with different functions seem to share few. It is therefore hard to distinguish SSBs from DSBs by sequence alone. In fact, as of Jan. 25, 2013, the Protein Data Bank (PDB) [3] contained 3391 structures of DNA-binding proteins, of which only about 30% and 5% were annotated as DSBs and SSBs, respectively; whether the remainder are DSBs or SSBs is still unclear. A computational method is therefore needed to annotate a DNA-binding protein as a DSB or an SSB automatically.
The surface of a protein is generally irregular, containing many clefts and grooves of varying shapes and sizes. Previous studies have shown that a large cleft gives the protein more opportunity to interact with other molecules, particularly small ligands [4], and some studies have therefore used a particularly large and deep cleft to characterize a protein's binding active site [5]. We hypothesize that, for DNA-binding proteins, the properties of surface clefts may also play an important role in dsDNA/ssDNA binding specificity.
In this work, we applied the CAVER 3.0 package [6] to detect clefts on protein surfaces and to compute indexes of the largest clefts, in order to investigate whether these can be used to distinguish the potential binding interfaces of SSBs from those of DSBs. Concretely, we obtained three main indexes of the detected tunnels: length, curvature, and bottleneck radius. Previous results have shown that although the sequences of different SSBs vary widely, their structures contain well-conserved elements: most SSBs contain one or more OB (oligonucleotide/oligosaccharide binding)-fold domains [2]. A typical OB-fold has a five-stranded beta-sheet coiled to form a closed beta-barrel, which is capped by an alpha-helix located between the third and fourth strands. The OB-fold plays a critical role in binding ssDNA. Although the OB-fold may not be unique to SSBs, we believe it should be used as an important descriptor for distinguishing SSBs from DSBs. We therefore use the protein structure alignment package TM-align [7] to compare each protein's structure with each of the six OB-fold protein templates, and take the maximal alignment score (TM-score) as the OB-fold feature of the protein. We aim to investigate the structural differences between the collected SSBs and DSBs and to extract structure-based features related to surface clefts and OB-folds. Based on these features, we construct a computational model, using the widely used support vector machine (SVM), that automatically classifies a DNA-binding protein as a DSB or an SSB, with prediction accuracies of 0.87, 0.83, and 0.83 on the HOLO-set, APO-set, and mixed-set, respectively. This promising performance suggests that our method will be useful for protein function annotation and refinement.
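As an illustration of how the structure-based features described above might be assembled for one protein, the following minimal Python sketch combines the three cleft indexes with the OB-fold feature, taken as the maximum TM-score over the templates. The names `CleftDescriptors`, `ob_fold_feature`, and `build_feature_vector`, the template labels, and all numeric values are hypothetical and chosen only for illustration. The sketch does not call CAVER or TM-align, and the feature set and ordering used in the actual model may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CleftDescriptors:
    """Indexes of the largest detected cleft (hypothetical container)."""
    length: float
    curvature: float
    bottleneck_radius: float


def ob_fold_feature(tm_scores_by_template: Dict[str, float]) -> float:
    """Return the maximal TM-score over all OB-fold templates."""
    if not tm_scores_by_template:
        raise ValueError("at least one template TM-score is required")
    return max(tm_scores_by_template.values())


def build_feature_vector(cleft: CleftDescriptors,
                         tm_scores_by_template: Dict[str, float]) -> List[float]:
    """Concatenate cleft indexes and the OB-fold feature (assumed ordering)."""
    return [
        cleft.length,
        cleft.curvature,
        cleft.bottleneck_radius,
        ob_fold_feature(tm_scores_by_template),
    ]


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper.
    cleft = CleftDescriptors(length=12.5, curvature=1.3, bottleneck_radius=1.8)
    scores = {"template_1": 0.31, "template_2": 0.52, "template_3": 0.44}
    print(build_feature_vector(cleft, scores))
```

A vector of this kind, one per protein, is the sort of input an SVM classifier would be trained on to separate SSBs from DSBs.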
This work is supported by grants from the National Science Foundation of China (61272274), the Program for New Century Excellent Talents in Universities (NCET-10-0644), the Open Research Fund of the State Key Laboratory of Hybrid Rice (Wuhan University) (KF201301), and the Fundamental Research Funds for the Central Universities (No. 2012211020204).
