Abstract

DNA-binding proteins play crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, and they are essential ingredients of both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called “iDNA-Prot|dis”, was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins so as to enhance the prediction quality, while the latter can reduce the dimension of the PseAAC vector so as to speed up the prediction process. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the integrity of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
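To make the idea of distance-pair coupling concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the feature vector is the residue composition followed by the normalized frequencies of residue pairs separated by 1 to `max_distance` positions, and that an optional reduced alphabet is applied by mapping residues to group labels before counting. The function name `distance_pair_features`, the parameter `max_distance`, the normalization, and the three-group mapping `HYPOTHETICAL_REDUCTION` are all illustrative assumptions that do not come from the paper.

```python
from collections import Counter
from itertools import product

STANDARD_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 3-group reduced alphabet, for illustration only.
# It is NOT the reduced alphabet profile used by iDNA-Prot|dis.
HYPOTHETICAL_REDUCTION = {
    **dict.fromkeys("AVLIMFWCGP", "h"),  # loosely hydrophobic
    **dict.fromkeys("STNQYH", "p"),      # loosely polar
    **dict.fromkeys("DEKR", "c"),        # charged
}


def distance_pair_features(sequence, max_distance=3, reduction=None):
    """Return composition plus distance-pair frequencies as a flat list.

    Assumed layout: [composition | pairs at distance 1 | ... | pairs at
    distance max_distance], each block normalized by its own count.
    """
    seq = sequence.upper()
    if reduction is None:
        alphabet = STANDARD_ALPHABET
        # Drop non-standard residues such as X or B.
        residues = [r for r in seq if r in alphabet]
    else:
        alphabet = "".join(sorted(set(reduction.values())))
        # Map each residue to its group label; skip unmapped residues.
        residues = [reduction[r] for r in seq if r in reduction]

    features = []

    # Distance 0: plain composition over the (possibly reduced) alphabet.
    total = len(residues)
    composition = Counter(residues)
    features.extend(composition[a] / total if total else 0.0 for a in alphabet)

    # Distances 1..max_distance: frequency of each ordered residue pair
    # (x, y) where y occurs exactly d positions after x.
    all_pairs = list(product(alphabet, repeat=2))
    for d in range(1, max_distance + 1):
        pair_counts = Counter(zip(residues, residues[d:]))
        n_pairs = max(len(residues) - d, 0)
        features.extend(
            pair_counts[p] / n_pairs if n_pairs else 0.0 for p in all_pairs
        )
    return features


full = distance_pair_features("MKRAVLSTDE", max_distance=3)
reduced = distance_pair_features(
    "MKRAVLSTDE", max_distance=3, reduction=HYPOTHETICAL_REDUCTION
)
print(len(full), len(reduced))
```

Under these assumptions the vector length is n + d·n² for an alphabet of n symbols and a maximum distance d, so moving from 20 residues to a handful of groups shrinks the vector sharply, which is the kind of dimension reduction the abstract attributes to the reduced alphabet profile.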

Highlights

  • DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes

  • The first type of method uses both the structural and the sequence information of proteins to identify DNA-binding proteins. Although these methods did play an important role in stimulating the development of this area, the structural information of proteins is not always available, particularly for the huge number of uncharacterized protein sequences generated in the postgenomic age

  • The overall accuracy (Acc) values obtained with different values of d are shown in Fig. 2, from which we can see that iDNA-Prot|dis achieves the best performance when d = 3


Summary

Introduction

DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes. Traditionally, DNA-binding proteins have been identified by experimental techniques, including filter binding assays, genetic analysis, chromatin immunoprecipitation on microarrays, and X-ray crystallography. It is both time-consuming and expensive to identify DNA-binding proteins based on biochemical experiments alone. The first type of computational method uses both the structural and the sequence information of proteins to identify DNA-binding proteins (see, e.g., [2,3,4,5]). Although these methods did play an important role in stimulating the development of this area, the structural information of proteins is not always available, particularly for the huge number of uncharacterized protein sequences generated in the postgenomic age. Methods based on sequence information alone did stimulate the development of the area by extending the identification power to cover those proteins without any structural information at all, and by using various modes of pseudo amino acid composition [16] or Chou’s PseAAC [17] to take into account some sequence-order effects for enhancing the prediction quality.

