PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine

Lei Deng, Juan Pan, Wenyi Yang, Hui Liu, Xiaojie Xu, Chuyao Liu

doi:10.1186/s12859-018-2527-1


Open Access

https://doi.org/10.1186/s12859-018-2527-1


Journal: BMC Bioinformatics	Publication Date: Dec 1, 2018
Citations: 41	License type: open-access

Affiliation: Central South University, Changzhou University

Abstract

Background: Identifying the specific residues involved in protein-DNA interactions is of considerable importance for understanding the binding mechanism of protein-DNA complexes. Although many computational DNA-binding residue prediction approaches have been developed, there is still significant room for improvement in overall performance and availability.

Results: Here, we present an efficient approach termed PDRLGB that uses a light gradient boosting machine (LightGBM) to predict binding residues in protein-DNA complexes. We first extract 913 sequence and structure features using a sliding window of 11. We then apply the random forest algorithm to sort the features in descending order of importance and obtain the optimal feature subset using incremental feature selection. Based on the selected feature set, we use a light gradient boosting machine to build the prediction model for DNA-binding residues. PDRLGB shows better overall predictive accuracy and shorter training time than other widely used machine learning (ML) methods such as random forest (RF), Adaboost and support vector machine (SVM). We further compare PDRLGB with various existing approaches on independent test datasets and show that it improves on the existing state-of-the-art approaches.

Conclusions: PDRLGB is an efficient approach to predict specific residues for protein-DNA interactions.
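
The two preprocessing ideas named in the abstract, a sliding window of 11 and incremental feature selection over a ranked feature list, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The zero padding at the chain termini, the function names, and the `evaluate` callback are assumptions made for illustration; in practice `evaluate` might wrap a cross-validated LightGBM model and the ranking would come from random forest importances, as the abstract describes. Note that 913 = 83 × 11, which would be consistent with 83 features per residue, although the text above does not state that number.

```python
from typing import Callable, List, Sequence

WINDOW = 11  # window size stated in the abstract


def window_features(per_residue: Sequence[Sequence[float]],
                    window: int = WINDOW) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbours.

    Positions that fall outside the chain are zero-padded (an assumption;
    the abstract does not say how termini are handled).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    if not per_residue:
        return []
    half = window // 2
    pad = [0.0] * len(per_residue[0])
    rows = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        rows.append(row)
    return rows


def incremental_feature_selection(ranked: Sequence[int],
                                  evaluate: Callable[[List[int]], float]) -> List[int]:
    """Add ranked features one at a time and keep the best-scoring prefix.

    `ranked` holds feature indices sorted by descending importance;
    `evaluate` is a hypothetical callback returning a score for a subset.
    """
    best_score = float("-inf")
    best_subset: List[int] = []
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset


# Toy protein: 5 residues with 2 made-up features each.
protein = [[float(i + 1), (i + 1) * 0.5] for i in range(5)]
X = window_features(protein)  # 5 rows of 11 * 2 = 22 values

# Toy ranking and scores, keyed by subset size, purely for illustration.
ranked = [3, 0, 7, 1]
toy_scores = {1: 0.60, 2: 0.72, 3: 0.70, 4: 0.71}
selected = incremental_feature_selection(ranked, lambda s: toy_scores[len(s)])
```

With the toy scores above, the two-feature prefix scores highest, so `selected` holds the first two ranked indices. Keeping the best prefix rather than stopping at the first drop in score is one common reading of incremental feature selection; the stopping rule used by the authors is not given here.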

Highlights

The protein-DNA interaction is one of the central issues in molecular biology and is found in many biological activities in living organisms, such as DNA replication, repair, and modification processes.
A number of computational approaches have focused on applying machine learning algorithms to build prediction models based on sequence and structural information.
Our experiments show that PDRLGB significantly outperforms other state-of-the-art DNA-binding residue prediction approaches.

Summary

Introduction

The protein-DNA interaction is one of the central issues in molecular biology and is found in many biological activities in living organisms, such as DNA replication, repair, and modification processes. To understand the recognition mechanism of protein-DNA complexes, researchers often focus on protein-DNA binding sites, especially the interface residues that bind DNA. Experimental approaches such as electrophoretic mobility shift assays (EMSAs) [1, 2], conventional chromatin immunoprecipitation (ChIP) [3], X-ray crystallography [4], PNA (peptide nucleic acid)-assisted identification of RNA binding proteins (PAIR) [5], and NMR spectroscopy [6] have been applied to identify DNA-binding amino acids. These laboratory methods are expensive and time-consuming. Although many computational DNA-binding residue prediction approaches have been developed, there is still significant room for improvement in overall performance and availability.

