Abstract
Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science because these proteins play crucial roles in a wide range of biological activities. Although considerable effort has been devoted to developing powerful computational methods for this problem, it remains a challenging task in bioinformatics. Hidden Markov model (HMM) profiles have been shown to provide important clues for improving the prediction of DBPs. In this paper, we propose a method, called HMMPred, which extracts amino acid composition and auto- and cross-covariance transformation features from the HMM profiles to train a machine learning model for identifying DBPs. Feature selection is then performed using the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. Experimental results on two benchmark datasets show that the proposed method outperforms most existing methods and could serve as an alternative tool for identifying DBPs.
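The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the HMM profile is already available as an L x 20 matrix (one row per residue, one column per amino acid type) and uses the commonly cited definitions of amino acid composition (column means) and auto- and cross-covariance (lagged covariances within and between columns). The function names, the `max_lag` parameter, and the normalization are illustrative assumptions and may differ from what HMMPred actually uses; the XGBoost feature selection and SVM classification steps are not shown.

```python
from typing import List

# Assumed input: an HMM profile as L rows (residues) x N columns
# (N = 20 amino acid types in practice). Names here are hypothetical.
Profile = List[List[float]]


def column_means(profile: Profile) -> List[float]:
    """Average each profile column over all residue positions."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def aac_features(profile: Profile) -> List[float]:
    """Amino acid composition: one column mean per amino acid type."""
    return column_means(profile)


def acc_features(profile: Profile, max_lag: int) -> List[float]:
    """Auto- and cross-covariance features for lags 1..max_lag.

    Auto-covariance correlates a column with itself at a given lag;
    cross-covariance correlates two different columns at that lag.
    Returns all auto-covariance values followed by all cross-covariance values.
    """
    length = len(profile)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and L - 1")
    means = column_means(profile)
    n_cols = len(means)
    # Center every column so that products below are covariance terms.
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in profile]
    auto: List[float] = []
    cross: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of residue pairs separated by `lag`
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                cov = sum(
                    centered[i][j1] * centered[i + lag][j2] for i in range(span)
                ) / span
                (auto if j1 == j2 else cross).append(cov)
    return auto + cross


def extract_features(profile: Profile, max_lag: int = 2) -> List[float]:
    """Concatenate AAC and ACC features into one vector."""
    return aac_features(profile) + acc_features(profile, max_lag)
```

With 20 columns and a maximum lag of g, this sketch yields 20 + 400g features per protein (20 composition values, 20g auto-covariance values, and 380g cross-covariance values), which is the kind of high-dimensional vector that a feature selection step would then prune.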
Highlights
DNA-binding proteins (DBPs), which can bind to and interact with DNA, play prominent roles in the structural composition of DNA and the regulation of genes
We propose a novel method, called HMMPred, which utilizes features extracted solely from the hidden Markov model (HMM) profile to further improve the prediction accuracy of DBPs
Features are extracted from the HMM profiles by fusing three techniques, i.e., amino acid composition (AAC), auto-covariance transformation, and cross-covariance transformation
Summary
DNA-binding proteins (DBPs), which can bind to and interact with DNA, play prominent roles in the structural composition of DNA and the regulation of genes. These proteins serve a variety of biochemical functions in the cell, participating in and regulating various cellular processes such as transcription, DNA replication, recombination, modification, and repair [1, 2]. DBPs have traditionally been identified by experimental techniques, such as filter binding assays, genetic analysis, X-ray crystallography, ChIP-chip analysis, and nuclear magnetic resonance (NMR) [4]. With the rapid increase of protein sequence data, there is a great need to develop efficient computational methods that identify DBPs based solely on their primary sequences.
More From: Computational and Mathematical Methods in Medicine