Abstract

Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a range of cellular functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, effectively classifying DNA-binding proteins as DSBs or SSBs is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the auto-cross covariance (ACC) transformation to convert feature matrices, whose size depends on sequence length, into fixed-length vectors. We then select an optimal feature subset with the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm and use it to train a gradient tree boosting (GTB) classifier. In 10-fold cross-validation on a benchmark dataset, PredPSD achieves an AUC of 0.956 and an accuracy of 0.912, outperforming existing methods. Moreover, our method significantly improves prediction accuracy in independent testing. The experimental results show that PredPSD can effectively recognize binding specificity and distinguish DSBs from SSBs.
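The abstract does not spell out the ACC formula. The following is a minimal sketch assuming the formulation commonly applied to per-residue matrices such as an L × 20 PSSM: each column is centred on its mean, auto-covariance (AC) terms pair a column with itself at lag g, cross-covariance (CC) terms pair two different columns at lag g, and every term is averaged over the L - g valid position pairs. The function name `acc_transform` and the `max_lag` parameter are hypothetical; the lag range and normalization PredPSD actually uses are not stated in this summary.

```python
from typing import List

Matrix = List[List[float]]  # L rows (residues) x D columns (e.g. 20 PSSM scores)


def acc_transform(matrix: Matrix, max_lag: int) -> List[float]:
    """Map an L x D per-residue matrix to a fixed-length vector of D * D * max_lag values.

    Assumed formulation (not confirmed by the source): for each lag g in 1..max_lag,
      AC(i, g)      = mean over j of (P[j][i]  - m_i)  * (P[j+g][i]  - m_i)
      CC(i1, i2, g) = mean over j of (P[j][i1] - m_i1) * (P[j+g][i2] - m_i2), i1 != i2
    where m_i is the mean of column i and j runs over the L - g valid positions.
    """
    length = len(matrix)
    if length == 0:
        raise ValueError("matrix must have at least one row")
    dim = len(matrix[0])
    if max_lag < 1 or max_lag >= length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Column means over the whole sequence.
    means = [sum(row[c] for row in matrix) / length for c in range(dim)]
    # Centre every value once so the sums below are plain products.
    centred = [[row[c] - means[c] for c in range(dim)] for row in matrix]

    ac: List[float] = []
    cc: List[float] = []
    for g in range(1, max_lag + 1):
        pairs = length - g
        for i1 in range(dim):
            for i2 in range(dim):
                cov = sum(centred[j][i1] * centred[j + g][i2] for j in range(pairs)) / pairs
                # Same column -> auto-covariance, different columns -> cross-covariance.
                (ac if i1 == i2 else cc).append(cov)
    # AC terms first, then CC terms.
    return ac + cc
```

For a 20-column PSSM this yields 400 × `max_lag` values whatever the protein length, which is what lets proteins of different lengths share one feature space before selection and classification.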

Highlights

  • Protein–DNA interactions are a crucial prerequisite for cellular functions such as DNA replication, transcription, and translation [1,2,3,4]

  • We calculated a total of 1510 sequence features for each protein, including local structural entropy (LSE), features predicted by NetSurfP and DisEMBL, overall amino acid composition (OAAC), dipeptide composition, and the position-specific scoring matrix (PSSM)

  • We have proposed a novel method, PredPSD, for classifying single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs)
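With 1510 features per protein, the abstract states that mRMR is used to pick an optimal subset before the GTB classifier is trained. Neither the mRMR variant nor any discretization step is given here, so the following is only a minimal sketch assuming the difference (MID) form of the criterion and features already binned into integers; `mutual_information` and `mrmr_select` are hypothetical helper names. Each step adds the candidate that maximizes its mutual information with the class label minus its average mutual information with the features already chosen.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete variables of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features: List[List[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedy mRMR in difference (MID) form; returns indices of up to k feature columns.

    features[s][f] is the already-discretized value of feature f for sample s
    (discretization is an assumption; continuous PSSM-derived values would need binning).
    """
    if not features:
        raise ValueError("need at least one sample")
    n_features = len(features[0])
    columns = [[row[f] for row in features] for f in range(n_features)]
    # Relevance: I(feature; class label) for every candidate.
    relevance = [mutual_information(col, labels) for col in columns]

    selected: List[int] = []
    remaining = set(range(n_features))
    # Running sum of I(candidate; s) over the already-selected features s.
    redundancy_sum = [0.0] * n_features
    while remaining and len(selected) < k:
        denom = len(selected)
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(remaining),
            key=lambda f: relevance[f] - (redundancy_sum[f] / denom if denom else 0.0),
        )
        selected.append(best)
        remaining.discard(best)
        # Update redundancy of every remaining candidate against the new pick.
        for f in remaining:
            redundancy_sum[f] += mutual_information(columns[f], columns[best])
    return selected
```

Because the redundancy term averages over the features already selected, an exact copy of a chosen feature scores poorly and is passed over in favour of a less relevant but independent one. How PredPSD chooses the subset size k is not stated in this summary.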

Introduction

Protein–DNA interactions are a crucial prerequisite for cellular functions such as DNA replication, transcription, and translation [1,2,3,4]. Double-stranded DNA-binding proteins (DSBs) bind to dsDNA, whereas single-stranded DNA-binding proteins (SSBs) bind to ssDNA [5,6]. The availability of binding-specificity information has encouraged researchers to study the binding sites of DSBs [11,12,13,14,15], the classification of DNA-binding proteins [16,17,18], the prediction of DNA-binding protein function [19,20,21,22], and the DNA-binding specificity of proteins [23,24]. The few existing methods for the large-scale identification of DSBs and SSBs still need further improvement.
