Abstract

Predicting DNA-binding residues from a protein's three-dimensional structure is a key task in computational structural proteomics. In the present study, we use machine learning to explore a reduced set of weighted average features for improving the prediction of DNA-binding residues on protein surfaces. By constructing the spatial environment around a DNA-binding residue, we first propose a novel weighting factor that quantifies the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. We then introduce a weighted average scheme to represent the surface patch of the residue under consideration. Finally, the classifier is trained on the reduced set of these weighted average features, consisting of evolutionary profile, interface propensity, betweenness centrality and solvent surface area of the side chain. Experimental results from 5-fold cross-validation and independent tests indicate that the new feature set is effective for describing DNA-binding residues and that our approach performs significantly better than two previous methods. Furthermore, a brief case study suggests that the weighted average features are powerful for identifying DNA-binding residues and are promising for further study of protein structure-function relationships. The source code and datasets are available upon request.
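To make the weighted average scheme concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the form of the weighting factor, so the linear decay in `distance_weight`, the 10 Å `cutoff`, and the function names are all assumptions chosen for illustration.

```python
import math

def distance_weight(d, cutoff=10.0):
    # ASSUMPTION: linear decay from 1 at d = 0 to 0 at the cutoff.
    # The paper's actual weighting factor is not given in the abstract.
    return max(0.0, 1.0 - d / cutoff)

def weighted_average_features(coords, features, target, cutoff=10.0):
    """Weighted average of per-residue feature vectors around one residue.

    coords   : list of (x, y, z) tuples, one representative point per residue
    features : list of equal-length feature vectors, one per residue
    target   : index of the residue whose surface patch is summarized
    """
    n_feat = len(features[target])
    totals = [0.0] * n_feat
    weight_sum = 0.0
    for xyz, feats in zip(coords, features):
        w = distance_weight(math.dist(coords[target], xyz), cutoff)
        if w == 0.0:
            continue  # residue lies outside the patch
        weight_sum += w
        for k in range(n_feat):
            totals[k] += w * feats[k]
    # The target itself has d = 0 and weight 1, so weight_sum is never zero.
    return [t / weight_sum for t in totals]

# Toy example: three residues on a line, one scalar feature each.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
features = [[1.0], [3.0], [100.0]]
patch = weighted_average_features(coords, features, target=0)
```

In this toy case the residue 20 Å away falls outside the patch, and the neighbor at 5 Å contributes with half the weight of the target residue itself.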

Highlights

  • Protein-DNA interactions play a central role in various biological processes such as gene regulation and transcription [1]

  • To overcome three limitations of previous machine learning-based methods for DNA-binding site identification, we first validated a solution to each limitation in our experiments: the RW-position specific scoring matrix (PSSM) profile compares favorably with the conventional concatenated PSSM (C-PSSM) profile; several topological and structural features, especially betweenness centrality, are again shown to describe DNA-binding residues on proteins satisfactorily; and for the highly predictive features, we carefully ranked their importance and the contribution of their combinations to improving DNA-binding site prediction

  • Our study indicates that betweenness centrality, a global topological centrality measure, can be used to discriminate DNA-binding residues from the rest of the protein surface
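As an illustration of the betweenness centrality feature named in the highlights, here is a minimal sketch that builds a residue contact graph and scores each residue with Brandes' algorithm. The graph construction (one point per residue, an edge when two residues lie within a hypothetical `cutoff` distance) is an assumption; the text above does not specify how the authors define residue contacts.

```python
import math
from collections import deque

def build_contact_graph(coords, cutoff=6.0):
    # ASSUMPTION: residues i and j are in contact when their representative
    # points are closer than `cutoff`; the paper's definition may differ.
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                    # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}

# Toy example: three residues in a row; only the middle one lies on a
# shortest path between the other two.
graph = build_contact_graph([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
scores = betweenness_centrality(graph)
```

A residue with a high score sits on many shortest paths between other residues, which is the sense in which the measure is global rather than local.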



Introduction

Protein-DNA interactions play a central role in various biological processes such as gene regulation and transcription [1]. Due to the success of structural genomics initiatives, an increasing proportion of solved protein structures are functionally unannotated [2]; understanding the relationship between protein structure and function and extrapolating the binding mechanism remains a challenging task. Identifying DNA-binding residues in newly solved protein structures is therefore highly desirable in structural proteomics: it can advance our understanding of the binding mechanism and is useful in functional annotation and site-directed mutagenesis. Another potential application of DNA-binding residue prediction is protein-DNA docking, which can in turn be used to generate models of protein-DNA complexes and to study the effects of mutations or different operator sequences on complex formation [3,4].
