Abstract

DNA-binding proteins play an important role in the structural composition of DNA. They also regulate and affect cellular processes such as transcription, DNA replication, DNA recombination, repair, and modification. Experimental methods for identifying DNA-binding proteins are expensive and time-consuming, which has drawn computational researchers to the problem. In this paper, we present iDNAProt-ES, a DNA-binding protein prediction method that uses both sequence-based evolutionary features and structure-based features of proteins to identify their DNA-binding functionality. We used recursive feature elimination to extract an optimal set of features and trained a Support Vector Machine (SVM) with a linear kernel on them to select the final model. Our proposed method significantly outperforms existing state-of-the-art predictors on a standard benchmark dataset. The accuracy of the predictor is 90.18% using the jackknife test and 88.87% using 10-fold cross-validation on the benchmark dataset. On the independent dataset the accuracy is 80.64%, which is also significantly better than the state-of-the-art methods. iDNAProt-ES is a novel prediction method that uses evolutionary and structure-based features. We believe its superior performance will motivate researchers to use it to identify DNA-binding proteins. iDNAProt-ES is publicly available as a web server at http://brl.uiu.ac.bd/iDNAProt-ES/.
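
The abstract reports accuracy under two evaluation protocols, the jackknife test (leave-one-out) and 10-fold cross-validation. The sketch below illustrates how the two relate: the jackknife is k-fold cross-validation with k equal to the number of samples. It is a minimal illustration only. The nearest-centroid classifier is a stand-in chosen to keep the example dependency-free, not the paper's SVM, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the paper's SVM): predict the label whose
    class mean is closest to x in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def k_fold_accuracy(X, y, k):
    """Accuracy under k-fold cross-validation.

    Setting k == len(X) holds out one sample at a time, which is the
    jackknife (leave-one-out) test. Folds are formed by taking every
    k-th sample; no shuffling or stratification is done in this sketch.
    """
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
                correct += 1
    return correct / n


if __name__ == "__main__":
    # Hypothetical toy feature vectors: two well-separated classes.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print("jackknife accuracy:", k_fold_accuracy(X, y, len(X)))
    print("4-fold accuracy:   ", k_fold_accuracy(X, y, 4))
```

The jackknife trains on more data per round but needs one training run per sample, which is one reason results are often reported under both protocols.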

Highlights

  • Computational methods that have been used to predict DNA-binding proteins can be broadly categorized into two groups: structure-based methods[11,12] and sequence-based methods[13,14,15,16,17,18,19]

  • We compare the results achieved by iDNAProt-ES with previous state-of-the-art methods found in the literature, including DNABinder[28], DNA-Prot[25], iDNA-Prot[26], iDNA-Prot|dis[13], DBPPred[15], iDNAPro-PseAAC[14], PseDNA-Pro[29], Kmer1 + ACC[30] and Local-DPP[16]

  • We present iDNAProt-ES, a novel prediction method for identification of DNA-binding proteins


Summary

Introduction

Computational methods that have been used to predict DNA-binding proteins can be broadly categorized into two groups: structure-based methods[11,12] and sequence-based methods[13,14,15,16,17,18,19]. DNA-Prot is another software tool, proposed in[25]. Its authors used amino acid composition, physicochemical properties and secondary structure information as features and trained their model using a Random Forest classifier. Liu et al.[13] incorporated amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general form of pseudo amino acid composition[31]. They offered a freely available web server called iDNA-Prot|dis. Other authors used a wrapper-based best-first feature selection technique to select an optimal set of features. They used features based on amino acid composition, PSSM scores, secondary structures and relative solvent accessibility and trained their model using Random Forest and Gaussian Naive Bayes classifiers. A further method used profile-based protein representation and selected a set of 23 optimal features using Linear Discriminant Analysis (LDA); its model was trained using a Support Vector Machine (SVM) classifier. Other recent works include SVM-PSSM-DT[32], PNImodeler[33], CNNsite[34] and BindUP[35].
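
Several of the predictors above use amino acid composition as a sequence-based feature. As a rough illustration of what such a feature looks like, the sketch below maps a protein sequence to a 20-dimensional vector of residue frequencies. This is a generic textbook formulation, not the exact feature extraction of any cited method; the function name, the handling of non-standard residues and the toy sequence are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every protein
# maps to a vector of the same length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. 'X') are ignored. That choice is an
    assumption of this sketch, not a rule taken from a cited predictor.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    toy_sequence = "MKKRAAKK"  # hypothetical toy sequence, not real data
    features = amino_acid_composition(toy_sequence)
    for aa, freq in zip(AMINO_ACIDS, features):
        if freq > 0:
            print(aa, freq)
```

A vector like this discards residue order entirely, which is why the methods surveyed here combine it with richer descriptors such as PSSM scores or secondary structure information.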

