Abstract

A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.
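The abstract uses the Kullback-Leibler divergence to compare probability density functions. For two Gaussian densities this divergence has a standard closed form. The sketch below writes each density in terms of a ground state w and a stiffness (inverse covariance) matrix K. The function name `gaussian_kl` and this parameterization are illustrative assumptions, not the paper's code. Whether the paper uses this one-sided form or a symmetrized variant is not stated in this summary.

```python
import numpy as np


def gaussian_kl(w0, K0, w1, K1):
    """Kullback-Leibler divergence D(p0 || p1) between two Gaussian densities.

    Each density is specified by its mean (ground state) w and its stiffness
    matrix K (the inverse covariance), i.e. p(x) ~ exp(-0.5 (x - w)^T K (x - w)).
    """
    w0, w1 = np.asarray(w0, dtype=float), np.asarray(w1, dtype=float)
    K0, K1 = np.atleast_2d(K0).astype(float), np.atleast_2d(K1).astype(float)
    n = w0.size
    dw = w1 - w0
    # tr(Sigma1^{-1} Sigma0) = tr(K0^{-1} K1), computed with a solve, not an inverse
    trace_term = np.trace(np.linalg.solve(K0, K1))
    # Mahalanobis distance of the mean shift, measured with the stiffness of p1
    mean_term = dw @ K1 @ dw
    # ln(det Sigma1 / det Sigma0) = ln det K0 - ln det K1
    _, logdet0 = np.linalg.slogdet(K0)
    _, logdet1 = np.linalg.slogdet(K1)
    return float(0.5 * (trace_term + mean_term - n + logdet0 - logdet1))
```

For example, `gaussian_kl([0.0], [[1.0]], [1.0], [[1.0]])` evaluates to 0.5, the divergence between two unit-variance Gaussians whose means differ by one.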

Highlights

  • The sequence-dependent curvature and flexibility of DNA in solution are of both biological and technological importance

  • We introduce a new hierarchy of models for predicting the relative position, orientation, and energetic coupling between every base in an oligomer of double-helical, B-form DNA with arbitrary sequence, in solution under prescribed, standard, environmental conditions

  • We develop and implement a method for estimating a complete parameter set for our nearest-neighbor, dimer-dependent model from atomic-resolution molecular dynamics (MD) data
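The parameter estimation starts from MD time series. The following is a minimal sketch of only the first, oligomer-level step, assuming each MD frame is stored as one row of configuration coordinates: the Gaussian fit takes the sample mean as the ground state and the inverse of the sample covariance as the stiffness. The function `fit_oligomer_gaussian` and the array layout are hypothetical, independent synthetic samples stand in for (correlated) MD frames, and the dimer-based fitting step described in the paper is not reproduced.

```python
import numpy as np


def fit_oligomer_gaussian(samples):
    """Estimate a Gaussian (ground state, stiffness) from a configuration time series.

    samples: array of shape (n_frames, n_coords); each row is one snapshot of
    an oligomer's configuration coordinates (hypothetical layout).
    Returns (w_hat, K): the sample mean and the inverse sample covariance.
    """
    samples = np.asarray(samples, dtype=float)
    w_hat = samples.mean(axis=0)
    # rowvar=False: columns are coordinates, rows are frames
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    K = np.linalg.inv(cov)
    # Symmetrize to remove round-off asymmetry introduced by the inversion
    K = 0.5 * (K + K.T)
    return w_hat, K


# Usage on synthetic data drawn from a known two-coordinate Gaussian
rng = np.random.default_rng(0)
true_mean = np.array([0.5, -0.2])
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
samples = rng.multivariate_normal(true_mean, true_cov, size=200_000)
w_hat, K = fit_oligomer_gaussian(samples)
```

With enough frames, `w_hat` and `K` approach the generating mean and inverse covariance. For real MD data, autocorrelation between frames reduces the effective sample size, so convergence is slower than in this idealized example.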


Summary

INTRODUCTION

Basepairs is exploited to build pre-specified three-dimensional structures,[13–15] some of which can perform elementary operations and functions.[16–18] The sequence-dependent mechanical properties of the resulting structures offer a rich design landscape that has yet to be fully exploited. Our locally parameterized model predicts that the intrinsic, or ground-state, curvature of an oligomer depends nonlocally on its sequence, as has been observed in various detailed MD simulations.[20,21,64,65] The description of such nonlocal behavior using only local parameters is a feature unique to our rigid-base model. We explicitly implement the proposed parameter estimation method on an extensive database of MD time series produced by a consortium of groups,[19–21] complemented with additional, compatible time series that we simulated so that the training set contains oligomers with sufficiently diverse sequences at their leading and trailing ends. Both data sets comprise all-atom simulations, with explicit solvent and ions, of over 50 different oligomers in total; each oligomer was either 12 or 18 basepairs long and was simulated for 50–200 ns. The supplementary material[66] provides an extensive discussion of the background material exploited in the main text, along with further comparisons of predicted and observed quantities for various oligomers.
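To illustrate how purely local parameters can produce a nonlocal ground state, here is a toy nearest-neighbor model with one scalar coordinate per base. Everything in it is hypothetical: the random `DIMER_PARAMS` table, the scalar coordinate, and the energy E(x) = 0.5 x^T K x - sigma^T x stand in for the paper's rigid-base coordinates and fitted dimer parameters. Overlapping dimer blocks are summed into a banded stiffness matrix K, and the ground state solves K x = sigma. Each interior base is shared by two dimers whose individual preferences generally disagree (frustration), and K^{-1} is generally dense, so a single point mutation can shift the ground state at every base.

```python
import numpy as np

# Hypothetical toy parameters: for each dimer step, a 2x2 stiffness block and a
# 2-vector of linear coefficients acting on the two scalar base coordinates.
_rng = np.random.default_rng(1)
DIMER_PARAMS = {}
for a in "ACGT":
    for b in "ACGT":
        u, v = _rng.uniform(0.0, 1.0, size=2)
        block = np.array([[2.0 + u, -1.0], [-1.0, 2.0 + v]])  # positive definite
        sigma = _rng.normal(size=2)
        DIMER_PARAMS[a + b] = (block, sigma)


def assemble(seq):
    """Sum overlapping dimer blocks into an oligomer stiffness K and vector sigma."""
    n = len(seq)
    K = np.zeros((n, n))
    sigma = np.zeros(n)
    for i in range(n - 1):
        block, s = DIMER_PARAMS[seq[i:i + 2]]
        K[i:i + 2, i:i + 2] += block  # neighboring blocks overlap on base i
        sigma[i:i + 2] += s
    return K, sigma


def ground_state(seq):
    """Minimizer of E(x) = 0.5 x^T K x - sigma^T x, i.e. the solution of K x = sigma."""
    K, sigma = assemble(seq)
    return np.linalg.solve(K, sigma)


# A single point mutation at index 5 (C -> T) changes only two dimer blocks...
wt = ground_state("ACGTACGTAC")
mut = ground_state("ACGTATGTAC")
# ...but the resulting shift in the ground state is spread over the whole oligomer
diff = mut - wt
```

Inspecting `diff` shows that bases far from the mutated site, such as index 0, generically move as well, even though the model contains only nearest-neighbor couplings.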

Configuration coordinates
Free energy
Configuration density
Nondimensionalization
Gaussian approximation
Probability density comparisons
Nearest-neighbor assumption
Sequence-dependence assumptions
Oligomer-based nearest-neighbor model
Dimer-based nearest-neighbor model
THE TRAINING DATA SET
Basic assumptions
The ABC data set
MD simulation protocol
Observed training set data
Kullback-Leibler scale
PARAMETER ESTIMATION
Oligomer-based fitting
Dimer-based fitting
Dimensionless
EXAMPLE OLIGOMERS FROM THE TRAINING SET
AN EXAMPLE OLIGOMER NOT FROM THE TRAINING SET
Findings
VIII. SUMMARY AND CONCLUSIONS