Abstract

Previous searches for long-range correlations in DNA sequences were carried out using statistical tools designed for stationary signals. However, genomic signals are non-stationary, as standard statistical tests for stationarity confirm. In this paper, we use non-stationary time-series analysis to address two questions: (i) whether long-range correlations exist in DNA sequences and (ii) whether they are present in both coding and non-coding segments or only in the latter. It turns out that the statistical differences between coding and non-coding segments are more subtle than previous stationary analyses claimed. Both coding and non-coding sequences exhibit long-range correlations, as indicated by an evolutionary 1/f spectrum (i.e., one with a time-dependent spectral exponent). Moreover, the average spectral exponent of non-coding segments is higher than that of coding segments. To show that this observation is not an artifact of the evolutionary 1/f spectrum, we derive an index of randomness from the frequency-time distribution of the genomic signals and use it to show that coding sequences are "more random" (i.e., whiter) than non-coding sequences. We believe this result is the likely source of the confusion and controversy in previous work, which relied on stationary analysis of DNA correlations.
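
As a rough illustration of what a time-dependent spectral exponent and a whiteness measure look like in practice, the Python sketch below maps a sequence to a +/-1 purine/pyrimidine signal, computes a Hann-tapered periodogram in sliding windows, and fits log-power against log-frequency in each window to obtain a local exponent beta (P(f) ~ 1/f^beta). This is not the authors' method: the numerical mapping, the window and step sizes, the short-time periodogram (used here in place of a full evolutionary-spectrum estimator), and the use of spectral flatness as a stand-in for the paper's index of randomness are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

# Assumed purine/pyrimidine mapping (other numerical mappings exist):
# A, G -> +1 and C, T -> -1.
_PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def dna_to_signal(seq: str) -> np.ndarray:
    """Map a DNA string to a +/-1 signal (raises KeyError on non-ACGT symbols)."""
    return np.array([_PURINE_PYRIMIDINE[b] for b in seq.upper()], dtype=float)


def _windowed_periodograms(x, window, step):
    """Yield (freqs, power) for each Hann-tapered segment, DC bin excluded."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(window)
    freqs = np.fft.rfftfreq(window)[1:]          # cycles per base, skip f = 0
    for start in range(0, len(x) - window + 1, step):
        seg = x[start:start + window]
        seg = (seg - seg.mean()) * taper         # remove local mean, then taper
        power = np.abs(np.fft.rfft(seg))[1:] ** 2
        yield freqs, power


def local_spectral_exponents(x, window=1024, step=512) -> np.ndarray:
    """Hypothetical helper: fit P(f) ~ 1/f**beta per window, return beta per window."""
    betas = []
    for freqs, power in _windowed_periodograms(x, window, step):
        slope, _ = np.polyfit(np.log(freqs), np.log(power), 1)
        betas.append(-slope)                     # beta is minus the log-log slope
    return np.array(betas)


def spectral_flatness(x, window=1024, step=512) -> np.ndarray:
    """Hypothetical stand-in for a randomness index: geometric mean divided by
    arithmetic mean of each window's periodogram (higher means whiter)."""
    flat = []
    for _, power in _windowed_periodograms(x, window, step):
        flat.append(np.exp(np.mean(np.log(power))) / np.mean(power))
    return np.array(flat)
```

For example, `local_spectral_exponents(dna_to_signal(seq))` returns one exponent per window, so its variation along the sequence plays the role of a time-dependent exponent. On an uncorrelated +/-1 signal the exponents cluster near 0 and the flatness is comparatively high, while on a cumulative random walk (ideal exponent 2) the exponents are much larger and the flatness drops sharply. Comparing these per-window values between coding and non-coding segments mirrors, in a simplified form, the comparison described above.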
