KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest.

Yuran Jia, Shan Huang, Tianjiao Zhang

doi:10.3389/fgene.2021.811158

Abstract

A DNA-binding protein (DBP) is a protein with a special DNA-binding domain that is associated with many important molecular biological mechanisms. The rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, which leads to imprecise predictions. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, which is the highest among existing methods.

Highlights

Proteins are spatially structured substances formed by the complex folding of amino acids into polypeptide chains through dehydration and condensation
We selected four performance measures, accuracy (ACC), specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC), to evaluate the predictive ability of the model used in this study (Wei et al., 2014; Wei et al., 2017b; Manavalan et al., 2019a; Manavalan et al., 2019b; Jin et al., 2019; Su et al., 2019; Li et al., 2020a; Liu et al., 2020a; Ao et al., 2020; Li et al., 2020b; Zhang et al., 2020b; Yu et al., 2020; Zhao et al., 2020; Wang et al., 2021c; Zhu et al., 2021)
Performance of Different Features on Training Set PDB1075: evolutionary features based on the position-specific scoring matrix (PSSM) contain a large amount of information on homologous proteins
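
The four measures named in the highlights (ACC, SP, SN, MCC) can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    tp/fn: DNA-binding proteins predicted correctly/incorrectly.
    tn/fp: non-binding proteins predicted correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of true positives that were recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recovered.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC stays informative even when the two classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Illustrative counts only (not results from the paper).
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting MCC alongside ACC is useful because accuracy alone can look high when one class dominates the dataset.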

Summary

INTRODUCTION

Proteins are spatially structured substances formed by the complex folding of amino acids into polypeptide chains through dehydration and condensation. iDNAPro-PseAAC (Liu et al., 2015), which uses a similar feature extraction method, adopts a prediction model based on a support vector machine to predict DBPs. A number of DNA-binding protein prediction methods based on different strategies exist. Most of these DBP prediction methods fail to extract features based on evolutionary information, so their robustness and prediction accuracy have much room for improvement. Given a protein sequence, BLAST can represent the evolutionary information of the protein by aligning it against a specific database and extracting a position-specific scoring matrix (PSSM). Because each protein sequence in the dataset is represented by the pseudo-composition of all of its dipeptides, we can generate a 110-dimensional RPSSM feature vector.
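
As a rough illustration of how a 110-dimensional vector can arise from a PSSM, the sketch below assumes the matrix has already been reduced to 10 columns and computes 10 per-column terms plus 10 × 10 terms over adjacent rows (10 + 100 = 110). The column grouping, the exact formulas, the function name `rpssm_like_features`, and the toy matrix are all assumptions made for illustration; this summary does not give the authors' definition, and the code should not be read as their method.

```python
def rpssm_like_features(reduced):
    """Build a 110-dimensional vector from an L x 10 reduced PSSM.

    reduced: list of L rows (L >= 2), each holding 10 scores.
    The formulas below are an assumed, illustrative form.
    """
    n_rows = len(reduced)
    n_cols = len(reduced[0])
    if n_rows < 2:
        raise ValueError("need at least two rows to form adjacent pairs")

    # Mean score of each column over the whole sequence.
    means = [sum(row[s] for row in reduced) / n_rows for s in range(n_cols)]

    # 10 single-column terms: mean squared deviation from the column mean.
    single = [
        sum((row[s] - means[s]) ** 2 for row in reduced) / n_rows
        for s in range(n_cols)
    ]

    # 100 pair terms: for every column pair (s, t), compare column s at
    # position i with column t at position i + 1, averaged over the sequence.
    pair = []
    for s in range(n_cols):
        for t in range(n_cols):
            total = sum(
                (reduced[i][s] - reduced[i + 1][t]) ** 2 / 2
                for i in range(n_rows - 1)
            )
            pair.append(total / (n_rows - 1))

    return single + pair


# Toy 3 x 10 matrix standing in for a reduced PSSM (hypothetical values).
toy = [[float(i + j) for j in range(10)] for i in range(3)]
features = rpssm_like_features(toy)
print(len(features))
```

Whatever the exact formulas, the key property is that the output length is fixed at 110 regardless of sequence length, which is what allows proteins of different lengths to be fed to the same classifier.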

RESULTS

Experimental Results and Analysis

Methods

DISCUSSION AND CONCLUSION

DATA AVAILABILITY STATEMENT

Journal: Frontiers in Genetics	Publication Date: Nov 29, 2021
Citations: 13	License type: CC BY 4.0

Similar Papers

DNA-Prot: Identification of DNA Binding Proteins from Protein Sequence Information using Random Forest
K Krishna Kumar ... P N Suganthan
Journal of Biomolecular Structure and Dynamics | VOL. 26
01 Jun 2009

Identification of DNA-binding proteins using multi-features fusion and binary firefly optimization algorithm.
Jian Zhang ... Guifu Yang
BMC Bioinformatics | VOL. 17
26 Aug 2016

Rapid detection and purification of sequence specific DNA binding proteins using magnetic separation
Marija Mojsin ... Danijela Drakulic
Journal of the Serbian Chemical Society | VOL. 71
01 Jan 2006

Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix
Muhammad Waris ... Maqsood Hayat
Neurocomputing | VOL. 199
06 Apr 2016
