Abstract

Background

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural-network-based algorithm that uses the evolutionary information of amino acid sequences, in the form of position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.

Results

Using PSSMs, the average of sensitivity and specificity is up to 8.7% better than that of prediction from sequence information alone. Much smaller data sets can be used to generate PSSMs with minimal loss of prediction accuracy.

Conclusion

One problem with PSSM-derived prediction is that it requires lengthy, time-consuming alignments against large sequence databases. To speed up PSSM generation, we tested different reference data sets (sequence spaces) against which a target protein is scanned during PSI-BLAST iterations. We find that a very small set of proteins can serve as such a reference data set without losing much of the prediction value. This makes PSSM generation very rapid and even amenable to use at the genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.

Availability

Online predictions based on this method are available at
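The Results report improvement as an average of sensitivity and specificity. A minimal sketch of that metric follows, assuming the standard residue-level definitions (a residue is a true positive if it binds DNA and is predicted to bind). The function names and the counts in the usage line are hypothetical and are not results from this work.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual DNA-binding residues predicted as binding: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual non-binding residues predicted as non-binding: TN / (TN + FP)."""
    return tn / (tn + fp)


def mean_sens_spec(tp: int, fn: int, tn: int, fp: int) -> float:
    """Arithmetic mean of sensitivity and specificity.

    Unlike plain accuracy, this is not dominated by the majority class
    (non-binding residues), which matters when binding residues are a
    small minority of all residues.
    """
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical residue-level counts, for illustration only.
print(mean_sens_spec(tp=50, fn=50, tn=750, fp=250))  # 0.625
```

Averaging the two rates means a predictor cannot score well simply by labelling every residue as non-binding: that strategy yields a specificity of 1 but a sensitivity of 0, for an average of only 0.5.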

Highlights

  • Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation

  • We report that evolutionary profiles, or position-specific scoring matrices (PSSMs), computed against much smaller representative reference data sets can achieve almost the same level of prediction as alignments against large sequence data sets representing the entire available sequence space

  • When a residue remains conserved through cycles of PSI-BLAST, this is likely to serve a purpose, i.e., a biological function
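To illustrate how conservation across aligned homologs becomes a high PSSM score, here is a deliberately simplified sketch of log-odds scoring for a single alignment column. It is not PSI-BLAST's actual procedure: PSI-BLAST also applies sequence weighting and substitution-matrix-based pseudocounts. The flat pseudocount, the uniform 1/20 background, the function name `column_log_odds`, and the example columns are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(column: str, pseudocount: float = 1.0) -> Dict[str, float]:
    """Base-2 log-odds score of each amino acid at one alignment column.

    Simplified on purpose: flat pseudocounts and a uniform 1/20 background.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    return {
        aa: math.log2((counts[aa] + pseudocount) / total / background)
        for aa in AMINO_ACIDS
    }


# Hypothetical columns from 10 aligned homologs of a query with Arg (R) here.
conserved = "RRRRRRRRRR"
variable = "RKDEAGSTNQ"
print(round(column_log_odds(conserved)["R"], 2))  # 2.87
print(round(column_log_odds(variable)["R"], 2))   # 0.42
```

The conserved arginine scores far above background, while the same residue in a variable column scores close to zero. That contrast is the signal a PSSM-based predictor can exploit.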


Summary

Introduction

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously developed a method of predicting DNA-binding sites of proteins from sequence information alone [5], showing that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. We also reported a neural network and a corresponding web server that predict the amino acid residues likely to bind DNA. In addition, we developed a method to identify DNA-binding proteins using electric moments derived from protein structures [6]. Here we implement a neural-network-based algorithm that uses the evolutionary information of amino acid sequences, in the form of position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites. Several investigators have reported that the use of evolutionary information in sequence-based predictions of secondary structure and
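One common way to give a neural network "a residue and its sequence neighbor information" from a PSSM is a sliding window: the PSSM rows of the target residue and its neighbors are concatenated into one fixed-length input vector. The sketch below shows that encoding under stated assumptions. The window size, the zero-padding at the termini, and the names `window_features` and `encode_protein` are hypothetical, and this section does not give the window or score scaling the authors actually used.

```python
from typing import List

Row = List[float]


def window_features(pssm: List[Row], center: int, half_window: int) -> Row:
    """Concatenate the PSSM rows of residues center-half_window .. center+half_window.

    Positions past either terminus are zero-padded, so every residue yields a
    vector of length (2 * half_window + 1) * len(pssm[0]).
    """
    row_len = len(pssm[0])
    features: Row = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * row_len)  # padding beyond the sequence ends
    return features


def encode_protein(pssm: List[Row], half_window: int) -> List[Row]:
    """One fixed-length input vector per residue, ready for a classifier."""
    return [window_features(pssm, i, half_window) for i in range(len(pssm))]


# Toy 3-residue "PSSM" with 2 columns instead of the usual 20.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(window_features(toy, 0, 1))  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

With a real 20-column PSSM and `half_window=2`, each residue would become a 100-value vector, and the network would output a binding or non-binding score for the central residue.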

