iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

Wei-Zhong Lin, Xuan Xiao, Kuo-Chen Chou, Jian-An Fang

doi:10.1371/journal.pone.0024756

Abstract

DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power.

By incorporating features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins.

As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results.
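To make the idea of a sequence-only feature vector concrete, the following is a minimal sketch of the classical 20-dimensional amino acid composition, which the pseudo amino acid composition generalizes. It is not the iDNA-Prot feature set: the grey-model features described in the abstract are not reproduced here, and the function name `amino_acid_composition` is a hypothetical choice made for illustration.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the normalized frequency of each standard residue.

    Illustrative only: this is the plain amino acid composition,
    not the grey-model features used by iDNA-Prot.
    """
    seq = sequence.upper()
    # Count only standard residues; ambiguous codes such as X are skipped.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each protein, whatever its length, is mapped to a vector of 20 frequencies that sum to 1, which is the kind of fixed-length input a classifier such as a random forest expects.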

Highlights

DNA-binding proteins play a vitally important role in many biological processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression.
Facing the avalanche of new protein sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying and characterizing DNA-binding proteins based on the protein sequence information alone.
According to a recent comprehensive review [28], to establish a truly useful statistical predictor for a protein system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the protein samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the attribute to be predicted; (iii) introduce or develop a powerful algorithm to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web-server for the predictor that is accessible to the public.

Summary

Introduction

DNA-binding proteins play a vitally important role in many biological processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression. According to a recent comprehensive review [28], to establish a truly useful statistical predictor for a protein system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the protein samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the attribute to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web-server for the predictor that is accessible to the public.
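Step (iv) can be illustrated with a minimal sketch of a jackknife (leave-one-out) test: each sample is held out in turn, the predictor is trained on the remaining samples, and the success rate is the fraction of held-out samples predicted correctly. The classifier below is a simple nearest-centroid rule used purely as a stand-in; it is an assumption for illustration and is not the random forest engine of iDNA-Prot. The names `nearest_centroid_predict` and `jackknife_success_rate` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def nearest_centroid_predict(train: List[Sample], query: Sequence[float]) -> int:
    """Stand-in classifier (not a random forest): pick the label whose
    mean training vector is closest to the query in Euclidean distance."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = -1, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_success_rate(
    samples: List[Sample],
    predict: Callable[[List[Sample], Sequence[float]], int] = nearest_centroid_predict,
) -> float:
    """Hold out each sample once, train on the rest, and return the
    fraction of held-out samples whose label is predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        if predict(train, features) == label:
            correct += 1
    return correct / len(samples)
```

Because every sample is tested exactly once against a model that never saw it, the jackknife gives a single, reproducible success rate for a given benchmark dataset, at the cost of training the predictor once per sample.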

Journal: PLoS ONE
Publication Date: Sep 15, 2011
Citations: 344
License type: CC BY 4.0

Similar Papers

DNA-Prot: Identification of DNA Binding Proteins from Protein Sequence Information using Random Forest
K Krishna Kumar ... P N Suganthan
Journal of Biomolecular Structure and Dynamics | VOL. 26
01 Jun 2009

Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix
Muhammad Waris ... Maqsood Hayat
Neurocomputing | VOL. 199
06 Apr 2016

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences
Wei Wang ... Jinling Shi
BMC Bioinformatics | VOL. 18
12 Jun 2017

Identification of DNA-Binding Proteins via Hypergraph Based Laplacian Support Vector Machine
Yuqing Qian ... Hao Meng
Current Bioinformatics | VOL. 17
01 Jan 2021
