Abstract

The concept of the spectral envelope for analyzing periodicities in categorical-valued time series was introduced in the statistics literature (Stoffer et al., 1993a) as a computationally simple and general statistical methodology for the harmonic analysis and scaling of non-numeric sequences. In the process of developing the technology, many interesting adaptations became apparent; for example, Stoffer and Tyler (1998) consider the maximal squared coherency between two categorical-valued time series. One of the most interesting directions was the use of the technology in the analysis of long DNA sequences. A benefit of the technique was that it combined rigorous statistical analysis with modern computer power to quickly search for diagnostic patterns within long DNA sequences. The methodology is closely related to frequency-domain principal component analysis and canonical correlation analysis of time series; consequently, these topics are described and summarized in the appendix. In addition to presenting the theory and methods of the spectral envelope and related techniques, various analyses of DNA sequences are included. The investigations focus primarily, but not exclusively, on the analysis of viruses. The problems addressed concern period lengths in nucleosome positioning signals, optimal alphabets in codon usage, and sequence alignment.
