Abstract

DNA-binding proteins (DBPs) are proteins with a specialized DNA-binding domain and are involved in many important molecular biological mechanisms. The rapid development of computational methods has made large-scale DBP prediction possible; however, existing methods do not fully integrate DBP-related features, which leads to imprecise predictions. In this article, we develop a DBP identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. Experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, the highest among existing methods.

Highlights

  • Proteins are spatially structured substances formed when amino acids link into polypeptide chains through dehydration condensation and the chains then fold into complex shapes

  • We selected four performance measures, accuracy (ACC), specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC), to evaluate the predictive ability of the model used in this study (Wei et al., 2014; Wei et al., 2017b; Manavalan et al., 2019a; Manavalan et al., 2019b; Jin et al., 2019; Su et al., 2019; Li et al., 2020a; Liu et al., 2020a; Ao et al., 2020; Li et al., 2020b; Zhang et al., 2020b; Yu et al., 2020; Zhao et al., 2020; Wang et al., 2021c; Zhu et al., 2021)

  • Performance of different features on the training set PDB1075: evolutionary features based on the position-specific scoring matrix (PSSM) contain a large amount of information on homologous proteins
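The four measures named in the highlights (ACC, SP, SN, MCC) are usually computed from the counts of a binary confusion matrix. The following is a minimal sketch under that assumption; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are illustrative choices, not taken from the article.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    tp/fn: DNA-binding proteins predicted correctly/incorrectly.
    tn/fp: non-binding proteins predicted correctly/incorrectly.
    ASSUMPTION: a zero denominator yields 0.0 (illustrative convention).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

MCC is often reported alongside ACC because it uses all four counts and so stays informative when the positive and negative classes are imbalanced.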


Summary

INTRODUCTION

Proteins are spatially structured substances formed when amino acids link into polypeptide chains through dehydration condensation and the chains then fold into complex shapes. iDNAPro-PseAAC (Liu et al., 2015), which uses a similar feature extraction method, adopts a prediction model based on a support vector machine to predict DBPs. A number of DNA-binding protein prediction methods based on different strategies exist. Most of these methods do not extract features based on evolutionary information, so their robustness and prediction accuracy leave considerable room for improvement. Given a protein sequence, BLAST can capture the protein's evolutionary information by aligning the sequence against a specific database and extracting a position-specific scoring matrix (PSSM). Because each protein sequence in the dataset is represented by the pseudo-composition of all of its dipeptides, we can generate a 110-dimensional RPSSM feature vector.
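As a rough illustration of how a 110-dimensional vector of this kind can arise, the sketch below reduces an L x 20 PSSM to L x 10 by averaging grouped columns, then computes 10 per-column variance terms plus 10 x 10 dipeptide pseudo-composition terms (10 + 100 = 110). This is a sketch under stated assumptions, not the article's exact definition: the column order `AA_ORDER`, the grouping `GROUPS`, the normalization, and the function names are all assumptions made for illustration.

```python
# ASSUMPTION: PSSM columns follow this amino-acid order.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
# ASSUMPTION: illustrative grouping of 20 amino acids into 10 reduced columns.
GROUPS = ["FYW", "ML", "IV", "ATS", "NH", "QED", "RK", "C", "G", "P"]


def reduce_pssm(pssm):
    """Average the PSSM columns within each group: L x 20 -> L x 10."""
    index = {aa: i for i, aa in enumerate(AA_ORDER)}
    return [
        [sum(row[index[aa]] for aa in group) / len(group) for group in GROUPS]
        for row in pssm
    ]


def rpssm_features(pssm):
    """Return a 110-dimensional feature list from an L x 20 PSSM (L >= 2)."""
    reduced = reduce_pssm(pssm)
    length = len(reduced)
    if length < 2:
        raise ValueError("need at least two residues to form dipeptides")
    k = len(GROUPS)

    # 10 terms: variance of each reduced column along the sequence.
    means = [sum(row[s] for row in reduced) / length for s in range(k)]
    single = [
        sum((row[s] - means[s]) ** 2 for row in reduced) / length
        for s in range(k)
    ]

    # 100 terms: for each column pair (s, t), the mean halved squared
    # difference between position i in column s and position i+1 in column t.
    pair = [
        sum((reduced[i][s] - reduced[i + 1][t]) ** 2 / 2
            for i in range(length - 1)) / (length - 1)
        for s in range(k)
        for t in range(k)
    ]
    return single + pair


if __name__ == "__main__":
    # Toy 3-residue PSSM with made-up scores, for illustration only.
    toy = [[(i + j) % 5 - 2 for j in range(20)] for i in range(3)]
    print(len(rpssm_features(toy)))
```

Because every term is averaged over the sequence length, proteins of different lengths map to vectors of the same size, which is what allows a fixed-input classifier to be trained on them.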

RESULTS
Experimental Results and Analysis
Methods
DISCUSSION AND CONCLUSION
DATA AVAILABILITY STATEMENT
