Identification of DNA-protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information.

Cong Shen, Yijie Ding, Fei Guo, Jijun Tang, Jian Song

doi:10.3390/molecules22122079

Abstract

DNA–protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art methods have been proposed to improve the accuracy of DNA–protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive performance can still be improved. In this paper, we present a competitive method, the Multi-scale Local Average Blocks (MLAB) algorithm, to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictor of DNA–protein binding sites is built on an ensemble weighted sparse representation model with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA–protein binding site prediction. MLAB gives of , , and on PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other outstanding methods. for our method is increased by at least , and on PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.
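The random under-sampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the sampling details, so everything here is an assumption: the function name `undersampled_subsets`, the 0/1 label encoding (1 for a binding residue), the number of subsets, and the choice to keep every positive while drawing an equal number of negatives for each ensemble member. It shows only how class-balanced training subsets might be formed, not the weighted sparse representation model itself.

```python
import random
from typing import List, Sequence


def undersampled_subsets(labels: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Return index lists, one per ensemble member (hypothetical helper).

    Assumption: label 1 marks a binding residue (minority class) and
    label 0 a non-binding residue (majority class).
    """
    rng = random.Random(seed)  # seeded so the subsets are reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Draw as many negatives as there are positives (or all of them if fewer).
    k = min(len(pos), len(neg))
    subsets = []
    for _ in range(n_subsets):
        sampled_neg = rng.sample(neg, k)  # sampling without replacement
        subsets.append(sorted(pos + sampled_neg))
    return subsets


if __name__ == "__main__":
    toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    for subset in undersampled_subsets(toy_labels, n_subsets=3):
        print(subset)
```

Each subset could then train one base classifier, with the ensemble combining their outputs; how the outputs are weighted is not specified in the abstract.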

Highlights

DNA–protein interactions exert a crucial influence on diverse biological processes and are essential for cell metabolism
We evaluate our proposed approach on several DNA–protein binding site datasets, including PDNA-543, PDNA-41, PDNA-335, PDNA-52 and PDNA-316
We independently analyze the performance of binding site representations, such as Position Specific Scoring Matrix (PSSM), PSSM-Multi-scale Local Average Blocks (MLAB) and Predicted Solvent Accessibility (PSA)
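As a rough illustration of what a PSSM-MLAB representation might look like, the sketch below averages the rows of a PSSM window over contiguous blocks at several scales. The block scheme is not given in this text, so the scales `(1, 2, 4)`, the function name `local_average_blocks`, and the window layout (one row per residue, one column per PSSM score) are all assumptions made for illustration only.

```python
from typing import List, Sequence


def local_average_blocks(
    window: Sequence[Sequence[float]],
    scales: Sequence[int] = (1, 2, 4),
) -> List[float]:
    """Concatenate per-block column averages of a PSSM window (hypothetical helper).

    window: rows are residues in a sliding window, columns are PSSM scores.
    scales: assumed setting; a scale s splits the window into s contiguous blocks.
    """
    n_rows = len(window)
    if n_rows == 0 or n_rows < max(scales):
        raise ValueError("window must have at least max(scales) rows")
    n_cols = len(window[0])
    features: List[float] = []
    for s in scales:
        for b in range(s):
            # Integer boundaries so the blocks tile the window without overlap.
            start = b * n_rows // s
            end = (b + 1) * n_rows // s
            block = window[start:end]
            for c in range(n_cols):
                features.append(sum(row[c] for row in block) / len(block))
    return features


if __name__ == "__main__":
    toy_window = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(local_average_blocks(toy_window, scales=(1, 2)))
```

With a 20-column PSSM and scales `(1, 2, 4)`, this would yield 7 blocks of 20 averages, or 140 features per residue; coarse scales summarize the whole neighborhood, while finer scales keep some positional detail.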

Summary

Introduction

DNA–protein interactions exert a crucial influence on diverse biological processes and are essential for cell metabolism. There is no shortage of time-consuming in silico methods. Experimental determination of binding sites is difficult and not always feasible. Prediction by statistical learning, which has attracted many researchers studying DNA–protein binding sites, is now well established in computational and molecular biology. Several computational methods have been developed to identify DNA-binding sites in proteins, generally based on protein sequence, protein structure, or a combination of the two. Most of these methods rely on machine learning techniques.

Methods

Results

Conclusion