Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach

Yuliang Pan,Jihong Guan,Shuigeng Zhou

doi:10.1186/s12859-020-03675-3


Open Access

https://doi.org/10.1186/s12859-020-03675-3


Journal: BMC Bioinformatics	Publication Date: Sep 1, 2020
Citations: 19	License type: open-access

Affiliation: Tongji University, Fudan University

Abstract

Background: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical for understanding the principles of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. However, the scarcity of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods.

Results: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (an abbreviation of Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones.

Conclusions: PreHots, which is based on a stacked ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots achieves better prediction performance.
Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/.
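
The pipeline described in the abstract (sequential backward feature selection, a stacked ensemble of boosting classifiers, and 10-fold cross-validation scored by sensitivity and AUC) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data, the choice of base learners (gradient boosting and AdaBoost), the logistic-regression meta-learner, the logistic-regression estimator used during feature selection, and all sizes and hyperparameters are assumptions made for the example. The paper itself selects 19 features from a real benchmark of protein-DNA interface residues.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Assumption: synthetic stand-in data. Label 1 plays the role of "hot spot",
# label 0 the role of "non-hot spot"; the real features are not reproduced here.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=5, random_state=0
)

# Sequential backward selection: start from all features and repeatedly drop
# the one whose removal hurts the cross-validated score the least.
# Assumption: a cheap linear model scores the candidate subsets, and 6 features
# are kept (illustrative only).
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=6,
    direction="backward",
    cv=3,
)

# Stacking: the base boosting models produce out-of-fold predictions that a
# meta-learner combines. Assumption: these particular learners are placeholders.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Feature selection sits inside the pipeline so that it is refit on each
# training fold and never sees the held-out fold.
model = make_pipeline(selector, stack)

# 10-fold stratified cross-validation; collect out-of-fold probabilities.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

# Sensitivity is the recall of the positive class: TP / (TP + FN).
sensitivity = recall_score(y, pred)
auc = roc_auc_score(y, proba)
print(f"sensitivity={sensitivity:.3f}  AUC={auc:.3f}")
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: selecting features on the full dataset before cross-validation would leak information from the held-out folds and inflate the reported scores.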

Highlights

Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy
DNA-protein binding interfaces contain a large number of residues, but the associations between DNA and proteins are governed by a small fraction of residues with high binding affinity, which are called hot spots
We develop a novel computational approach, PreHots (an abbreviation of Predicting Hotspots), which is based on a stacked ensemble of boosting algorithms, for effectively predicting hot spots in protein-DNA binding interfaces

Summary

Introduction

Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical for understanding the principles of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. The interactions of proteins and DNA are essential for many crucial cellular processes, including gene expression and regulation, and DNA replication and repair. DNA-protein binding interfaces contain a large number of residues, but the associations between DNA and proteins are governed by a small fraction of residues with high binding affinity, which are called hot spots. Accurate identification of hot spots is important for understanding molecular regulation mechanisms and for providing solutions to disease diagnosis and treatment [4].



