Abstract

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science due to its crucial role in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been proved to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.

Highlights

  • DNA-binding proteins (DBPs), which can bind to and interact with DNA, play prominent roles in the structural composition of DNA and the regulation of genes

  • We propose a novel method, called HMMPred, which utilizes features extracted solely from the hidden Markov model (HMM) profile to further improve the prediction accuracy of DBPs

  • Features are extracted from the HMM profiles by fusing three techniques, i.e., amino acid composition (AAC), UniProt database

Read more

Summary

Introduction

DNA-binding proteins (DBPs), which can bind to and interact with DNA, play prominent roles in the structural composition of DNA and the regulation of genes. These proteins have a variety of biochemical functions in the cell and molecular biology, including the participation and regulation of various cellular processes, such as transcription, DNA replication, recombination, modification, and repair [1, 2]. DBPs were normally identified by experimental techniques, such as filter binding assays, genetic analysis, Xray crystallography, ChIP-chip analysis, and nuclear magnetic resonance (NMR) [4]. With the rapid increase of protein sequence data, there is a great need to develop efficient computational methods to identify DBPs solely based on their primary sequences

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call