Abstract

DNA-binding proteins are fundamentally important for understanding cellular processes, so identifying them has important practical applications in fields such as drug design. We propose a novel method for predicting DNA-binding proteins using only sequence information. The prediction model is constructed with the support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature. The hybrid feature incorporates an evolutionary information feature, a physicochemical property feature, and two novel attributes. These two attributes use the DNA-binding residues and nonbinding residues in a query protein to obtain its DNA-binding propensity and nonbinding propensity. The results demonstrate that our SVM-SMO model achieves a Matthews correlation coefficient (MCC) of 0.67 and an overall accuracy of 89.6%, with 88.4% sensitivity and 90.8% specificity. Performance comparisons across feature combinations indicate that the two novel attributes contribute to the performance improvement. In addition, our SVM-SMO model outperforms state-of-the-art methods on an independent test dataset.
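The four metrics reported above follow from the counts in a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and MCC from confusion-matrix counts.

    tp/fn: DNA-binding proteins predicted as binding / nonbinding.
    tn/fp: nonbinding proteins predicted as nonbinding / binding.
    """
    sensitivity = tp / (tp + fn)          # fraction of binders recovered
    specificity = tn / (tn + fp)          # fraction of non-binders recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=8, fn=2, tn=9, fp=1)
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which is why it is commonly reported alongside sensitivity and specificity when class sizes differ.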

Highlights

  • DNA-protein interaction has diverse functions in the cell, and it plays an important role in a variety of biological processes, such as gene regulation, DNA replication, and repair

  • We propose a novel method for predicting DNA-binding proteins using a support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature

  • To assess the importance of the two novel attributes, binding propensity (BP) and nonbinding propensity (NB), each was combined with the evolutionary information feature (EI) and the physicochemical property feature (PP) to construct a DNA-binding protein prediction model using the SVM-SMO algorithm

Summary

Introduction

DNA-protein interaction has diverse functions in the cell, and it plays an important role in a variety of biological processes, such as gene regulation, DNA replication, and repair. It is important to develop computational methods for identifying DNA-binding proteins directly from amino acid sequence rather than from structural information. Cai and Lin used a support vector machine (SVM) with pseudo-amino acid composition, a collection of nonlinear features extractable from protein sequence, to predict DNA-binding proteins [10]. A web server, DNAbinder (http://www.imtech.res.in/raghava/dnabinder/), has been developed for identifying DNA-binding proteins and domains from query amino acid sequences. It was built with an SVM using amino acid composition and PSSM profiles [12]. We propose a novel method for predicting DNA-binding proteins using a support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature. The results demonstrate that the two novel attributes we propose can discriminate DNA-binding proteins from nonbinding proteins.
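As a small illustration of a sequence-only feature, the sketch below computes amino acid composition, the kind of feature the text attributes to DNAbinder. It is not the hybrid feature proposed in this study. The function name `amino_acid_composition` and the example sequence are hypothetical, and skipping non-standard residue codes is an assumption made here for simplicity.

```python
# The 20 standard amino acids, in a fixed order so every protein
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies for one protein."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        # Assumption: ambiguous or non-standard codes (e.g. X, B, Z) are ignored.
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical query sequence, for illustration only.
features = amino_acid_composition("MKRLAAKRGGS")
```

A fixed-length vector like this can be passed to an SVM regardless of protein length, which is what makes composition-style features convenient for sequence-based prediction.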

Materials and Methods
Feature Vector
Result and Discussion
Findings
Importance of Novel Attributes
Conclusions