Abstract

Background
Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM models for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.

Results
SVM models were developed on DNAaset, which consists of 1153 DNA-binding proteins and an equal number of non-DNA-binding proteins. They achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models were developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset.

Conclusion
A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences.
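Amino acid and dipeptide compositions turn a variable-length sequence into fixed-length vectors of 20 and 400 values that an SVM can take as input. The sketch below shows one common way to compute them, as percentages of residues and of overlapping residue pairs. The function names, the residue ordering and the choice to skip non-standard residues are assumptions for illustration, not DNAbinder's published code.

```python
from itertools import product

# Fixed ordering of the 20 standard amino acids (assumed; any fixed order works
# as long as training and prediction use the same one).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Return a 20-dim vector: percentage of each standard residue in seq."""
    # Hypothetical choice: non-standard residues (X, B, Z, ...) are ignored.
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [100.0 * residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Return a 400-dim vector: percentage of each overlapping residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Drop pairs that contain a non-standard residue.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("sequence has no valid dipeptides")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    total = len(pairs)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]
```

For example, `dipeptide_composition("AAAC")` counts the pairs AA, AA and AC, so the AA entry is about 66.7 and the AC entry about 33.3, with the other 398 entries at zero.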

Highlights

  • Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation.

  • Prediction of DNA-binding domains/chains: support vector machine (SVM) models have been developed on DNAset (the main dataset), which contains DNA-binding and non-binding chains obtained from the PDB.

  • It has been well documented that evolutionary information in the form of position-specific scoring matrix (PSSM) profiles provides additional information and has significantly improved prediction accuracy in several studies, such as those on RNA-binding sites, subcellular localization and β-turns [9,10,11,12,13].


Summary

Introduction

Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. DNA-binding proteins (DNA-BPs) are important constituents of both eukaryotic and prokaryotic proteomes. It has been reported that approximately 2–3% of prokaryotic and 6–7% of eukaryotic proteins can bind to DNA [1,2]. These proteins play important roles in DNA packaging, replication, transcription regulation and other DNA-associated activities. In the form of restriction enzymes, DNA-BPs also play a crucial role in prokaryotic host defence. Identification of DNA-BPs can therefore play a vital role in proteome annotation and in understanding an important class of proteins.
