Abstract

DNA-binding proteins play an important role in cell metabolism. In biological laboratories, the detection methods for DNA-binding proteins include yeast one-hybrid methods, bacterial singles, X-ray crystallography and others, but these methods require a lot of labor, material and time. In recent years, many computation-based approaches have been proposed to detect DNA-binding proteins. In this paper, a machine learning-based method, called the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF), is proposed to identify DNA-binding proteins. First, multi-view sequence features are extracted from protein sequences. Next, a Multiple Kernel Learning (MKL) algorithm is employed to combine the multiple features. Finally, a Fuzzy Kernel Ridge Regression (FKRR) model is built to detect DNA-binding proteins. Compared with other methods, our model performs well, obtaining accuracies of 83.26% and 81.72% on two benchmark datasets (PDB1075 and PDB186), respectively.
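
The three steps named in the abstract (per-view features, kernel combination, weighted kernel ridge regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic two-view data, the RBF kernels with `gamma=0.1`, the uniform kernel weights standing in for a learned MKL weighting, and the distance-to-class-centre membership function are all assumptions made for illustration. The only part that follows from standard theory is the closed-form solution of ridge regression with per-sample weights.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combine_kernels(kernels, weights=None):
    """Weighted sum of kernel matrices.
    Assumption: uniform weights replace a learned MKL weighting."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

def fuzzy_membership(X, y, eps=1e-3):
    """Hypothetical membership: samples near their class centre get values near 1."""
    s = np.empty(len(y))
    for label in np.unique(y):
        idx = y == label
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        s[idx] = 1.0 - d / (d.max() + eps)
    return s

def fit_weighted_krr(K, y, membership, lam):
    """Minimise sum_i s_i (y_i - f(x_i))^2 + lam * ||f||^2.
    Closed form: (S K + lam I) alpha = S y, with S = diag(membership)."""
    n = len(y)
    return np.linalg.solve(membership[:, None] * K + lam * np.eye(n), membership * y)

def predict(K_test_train, alpha):
    """Sign of the regression output gives the predicted class (+1 or -1)."""
    return np.sign(K_test_train @ alpha)

# Synthetic stand-in for two feature views of 40 proteins (not real data).
rng = np.random.default_rng(0)
y = np.concatenate([np.ones(20), -np.ones(20)])
view_a = 2.0 * y[:, None] + 0.5 * rng.normal(size=(40, 3))
view_b = 2.0 * y[:, None] + 0.5 * rng.normal(size=(40, 5))

K = combine_kernels([rbf_kernel(view_a, view_a, 0.1), rbf_kernel(view_b, view_b, 0.1)])
s = fuzzy_membership(np.hstack([view_a, view_b]), y)
alpha = fit_weighted_krr(K, y, s, lam=0.1)
train_accuracy = float((predict(K, alpha) == y).mean())
```

Low-membership samples contribute less to the squared-error term, so outliers pull the fit less than they would in plain kernel ridge regression. Setting every membership to 1 recovers the ordinary solution `(K + lam I) alpha = y`.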

Highlights

  • The interaction between DNA and protein exists in various tissues of the living body

  • We propose a novel model, the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF), to predict DNA-binding proteins

  • To evaluate our proposed method (FKRR-MVSF), two benchmark datasets of DNA-binding proteins are employed in our study


Summary

Introduction

The interaction between DNA and protein exists in various tissues of the living body. The study of DNA-binding residues in DNA–protein interactions facilitates a comprehensive understanding of the mechanisms of chromatin recombination and gene-regulated expression. Wet experiment-based methods are both time-consuming and costly. Information on the 3D structures of proteins and their complexes is important for drug design. Many kinds of sequence-based information, such as PTM (posttranslational modification) sites in proteins [4,5,6,7,8,9], DNA-methylation sites [10], protein–drug interaction in cellular networking [11], protein–protein interactions [12] and recombination spots [13], have been predicted with sequence-based tools such as Pseudo Amino Acid Composition (PseAAC) [14] and the Pseudo K-tuple Nucleotide Composition (PseKNC) approach [15]. Bioinformatics has played important roles in the development of novel drugs.
