Abstract

We present a method and web server for predicting DNA structural features in a high-throughput (HT) manner for massive sequence data. This approach provides the framework for the integration of DNA sequence and shape analyses in genome-wide studies. The HT methodology uses a sliding-window approach to mine DNA structural information obtained from Monte Carlo simulations. It requires only nucleotide sequence as input and instantly predicts multiple structural features of DNA (minor groove width, roll, propeller twist and helix twist). The results of rigorous validations of the HT predictions based on DNA structures solved by X-ray crystallography and NMR spectroscopy, hydroxyl radical cleavage data, statistical analysis and cross-validation, and molecular dynamics simulations provide strong confidence in this approach. The DNAshape web server is freely available at http://rohslab.cmb.usc.edu/DNAshape/.
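The sliding-window idea described above can be illustrated with a minimal sketch. It is not the DNAshape implementation: the window length `k=5`, the function name `predict_feature`, and the lookup table are all assumptions made for illustration, and the numbers in the usage example are placeholders rather than simulation-derived values. The sketch only shows how a precomputed table of per-window values could be scanned along a sequence so that each window's value is assigned to its central nucleotide.

```python
from typing import Dict, List, Optional


def predict_feature(
    seq: str,
    table: Dict[str, float],
    k: int = 5,  # assumed window length; not stated in the text above
) -> List[Optional[float]]:
    """Slide a k-mer window along `seq` and assign each window's table
    value to the window's central position.

    Positions without a full window (the sequence ends) and windows
    missing from the table (for example, those containing 'N') get None.
    """
    seq = seq.upper()
    half = k // 2
    values: List[Optional[float]] = [None] * len(seq)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Hypothetical lookup: in a real pipeline this table would be
        # mined from simulation data for one structural feature.
        values[start + half] = table.get(window)
    return values


# Placeholder table with made-up values, for demonstration only.
toy_table = {"AAAAA": 1.0, "AAAAC": 2.0}
print(predict_feature("AAAAAC", toy_table))
# [None, None, 1.0, 2.0, None, None]
```

Because each position needs only one dictionary lookup, the cost of such a scan grows linearly with sequence length, which is consistent with the speed the abstract reports for genome-scale input.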

Highlights

  • An increasing number of structural biology and genomics studies associate protein–DNA binding with the recognition of the three-dimensional DNA structure, or ‘DNA shape’ [1]

  • We successfully applied our Monte Carlo (MC) approach in various studies of protein–DNA recognition [3,7,11,18,19]. To bring this to the genomic scale, we have recently developed a methodology for using MC data in high-throughput (HT) studies of DNA shape [4,8,18]

  • We focused on the minor groove width (MGW) of the DNA binding sites of six proteins for which we previously established the importance of minor groove shape readout [1]


Summary

INTRODUCTION

An increasing number of structural biology and genomics studies associate protein–DNA binding with the recognition of the three-dimensional DNA structure, or ‘DNA shape’ [1]. Based on the many more high-resolution structures that have been solved and analysed in recent years, it is apparent that longer DNA segments must be characterized to capture the sequence–structure degeneracy of DNA [1]. Such structural information, which can be retrieved from X-ray crystallography or NMR spectroscopy data, ideally describes the three-dimensional structure of a DNA binding site prior to and after protein binding. Recent efforts to characterize the structures of all 136 unique tetranucleotides have used all-atom molecular dynamics (MD) simulations of either 136 dodecamers [12] or 39 duplexes of 18 base pairs (bp) in length [13]. In both designs, most tetranucleotides occur only in the context of a single sequence, which limits how robustly the simulation results can be compared statistically with experimental data. Our HT method underlying the DNAshape web server can be used to predict DNA structural features of the entire yeast genome at nucleotide resolution in less than 1 min on a single processor.
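The figure of 136 unique tetranucleotides follows from treating a sequence and its reverse complement as the same duplex: of the 4^4 = 256 tetramers, 16 are their own reverse complement, giving (256 + 16) / 2 = 136. A short sketch that checks this by enumeration is given below; the helper names `reverse_complement` and `unique_kmers` are illustrative and not taken from the paper.

```python
from itertools import product
from typing import List

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_kmers(k: int) -> List[str]:
    """List k-mers that are distinct up to reverse complementation.

    Each k-mer and its reverse complement describe the same duplex,
    so only the lexicographically smaller of the two is kept.
    """
    seen = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        seen.add(min(kmer, reverse_complement(kmer)))
    return sorted(seen)


print(len(unique_kmers(4)))  # 136
```

The same enumeration gives 10 unique dinucleotides and 512 unique pentamers, since no odd-length sequence can equal its own reverse complement.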
