Abstract

In this paper we explore the free energy distribution in the helical form of DNA, using the genome of the bacterium Rickettsia prowazekii Madrid E as an example. The genome of this organism was determined by Andersson et al. (Nature 396 (1998) 133) and is available on the World Wide Web (www.tigr.org). Using the helix statistical weights of SantaLucia (Proc. Natl. Acad. Sci. USA 95 (1998) 1460), which are based on nearest-neighbor base pairs, we calculate the free energy in consecutive blocks of m base pairs in the DNA sequence and then construct the distribution of these free energy values. Using the maximum-entropy method, we fit the distribution curves with a function based on the moments of the distribution. For blocks containing 10–20 base pairs the distribution is slightly skewed, and four moments are required to fit it accurately. For blocks containing 100 base pairs or more, the distribution is well approximated by a Gaussian function based on the first two moments. We find that the free energy distribution for m=20 can be reproduced using random sequences that have the local (singlet, doublet or triplet) statistics of Rickettsia. For much larger blocks, however, for example m=500, the free energy distribution based on the actual Rickettsia genome is broader by almost a factor of 3 than the distributions based on random local statistics. The distribution functions for the C or G content in blocks of m base pairs show almost the same dependence on block size as the free energy distributions. To reproduce the width of the distribution functions based on the actual Rickettsia sequence, we need to introduce tables (matrices) that correlate the states of consecutive blocks hundreds of base pairs long. This indicates that correlations on the order of the number of base pairs in the average gene are required to give the actual widths of both the C or G content and the helix free energy distributions. Above a certain m value, the distributions for larger m can be accurately expressed in terms of the distribution functions for smaller m. Thus, for example, the distribution for m=5000 can be expressed in terms of the generating function for m=1000.
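
The block-size effect described above can be illustrated with a minimal sketch. It is not the authors' code: it uses the C or G count per block rather than nearest-neighbor free energies, and the input is a synthetic sequence made of alternating 1000-base domains with assumed C or G fractions of 0.2 and 0.4, not the Rickettsia genome. The function names and all parameter values are hypothetical. The sketch compares the width (standard deviation) of the block distribution for the domain-structured sequence with that of a shuffled copy, which keeps only the singlet statistics.

```python
import random
from statistics import pstdev


def block_gc_counts(seq, m):
    """Count C or G bases in consecutive, non-overlapping blocks of m bases."""
    n_blocks = len(seq) // m  # a trailing partial block is dropped
    return [
        sum(base in "GC" for base in seq[i * m:(i + 1) * m])
        for i in range(n_blocks)
    ]


def distribution_width(seq, m):
    """Standard deviation of the per-block C or G counts."""
    return pstdev(block_gc_counts(seq, m))


def synthetic_sequence(rng, n_domains=200, domain_len=1000):
    """Toy sequence (an assumption, not real data): domains that alternate
    between a C or G fraction of 0.2 and 0.4."""
    parts = []
    for k in range(n_domains):
        p_gc = 0.2 if k % 2 == 0 else 0.4
        parts.append("".join(
            rng.choice("GC") if rng.random() < p_gc else rng.choice("AT")
            for _ in range(domain_len)
        ))
    return "".join(parts)


rng = random.Random(0)  # fixed seed so the run is repeatable
seq = synthetic_sequence(rng)

# Shuffling keeps the base composition but destroys long-range correlations.
bases = list(seq)
rng.shuffle(bases)
shuffled = "".join(bases)

for m in (20, 500):
    print(m, distribution_width(seq, m), distribution_width(shuffled, m))
```

In this toy setting the two widths are close for m=20 and differ by a large factor for m=500, which mirrors qualitatively the behavior the abstract reports. The size of the factor here depends entirely on the assumed domain parameters.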
