Abstract

We perform frequency-domain analysis of the genomes of various organisms using tricolor spectrograms, identifying several types of distinct visual patterns that characterize specific DNA regions. We relate these patterns and their frequency characteristics to the sequence characteristics of the DNA. In some cases, the spectrogram patterns can be related to the structure of the corresponding protein region by using public databases such as GenBank. Some patterns can be explained by the biological nature of the corresponding regions, which relates to chromosome structure and protein coding, while others are of as yet unknown biological significance. We found biologically meaningful patterns on scales ranging from millions of base pairs down to a few hundred base pairs. Chromosome-wide patterns include periodicities ranging from 2 to 300 bp. The color of the spectrogram depends on the nucleotide content at specific frequencies and can therefore be used as a local indicator of CG content and of other measures of relative base content. Several smaller-scale patterns are found to represent different types of domains made up of various tandem repeats.
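The abstract's link between frequency and base content rests on two elementary DFT facts, illustrated by the minimal Python sketch below (standard library only). The function names `indicator`, `dft_bin`, `cg_fraction`, and `bin_for_period` are hypothetical and are not taken from the authors' code. The sketch shows that the zero-frequency (k = 0) term of the C and G indicator sequences simply counts C and G characters, and that a periodicity of p bp falls at bin k = N/p of an N-bp window.

```python
import cmath


def indicator(seq: str, base: str) -> list[int]:
    """Binary indicator sequence u_base[n]: 1 where seq[n] == base, else 0."""
    return [1 if ch == base else 0 for ch in seq.upper()]


def dft_bin(x: list[int], k: int) -> complex:
    """k-th DFT coefficient X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))


def cg_fraction(window: str) -> float:
    """At k = 0 every exponential equals 1, so U_C[0] + U_G[0] is the number of
    C and G characters; dividing by N gives the local CG fraction."""
    dc = dft_bin(indicator(window, "C"), 0) + dft_bin(indicator(window, "G"), 0)
    return dc.real / len(window)


def bin_for_period(N: int, period: float) -> float:
    """A periodicity of `period` bp falls at DFT bin k = N / period in an N-bp window."""
    return N / period


# Toy period-3 sequence: G recurs every 3 bp, so |U_G[k]| peaks at k = N / 3.
toy = "GCA" * 40                          # N = 120
k3 = round(bin_for_period(len(toy), 3))   # bin 40
print(cg_fraction("ACGTGGCC"))            # 0.75
print(abs(dft_bin(indicator(toy, "G"), k3)))  # approximately 40: all G's add in phase
```

At bins that do not correspond to the period, the in-phase sum breaks up and the magnitude drops to (numerically) zero, which is why a periodic stretch shows up as a distinct horizontal line in a spectrogram.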

Highlights

  • Color spectrograms of biomolecular sequences were introduced in [1, 2] as visualization tools providing information about the local nature of DNA stretches. These spectrograms give a simultaneous view of the local frequency content throughout the nucleotide sequence, as well as the local nucleotide content, indicated by the color of the spectrogram.

  • The parameters of the short-time Fourier transform (STFT) are very important for visualization; we initially experimented with these parameters, trying different discrete Fourier transform (DFT) window sizes for the spectrogram.

  • It should be noted that a quilt pattern appears as a quilt rather than as a bar because the DFT window size used to create these spectrograms is smaller than the length of the basic repeat unit (135 bp in this case).
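The dependence of the pattern on window size can be illustrated with a toy tandem array. This is a minimal sketch, assuming that a "bar" corresponds to a spectrum that stays constant along the sequence and a "quilt" to one that changes within each repeat unit and recurs every unit; that reading of the terms is an assumption. The 135-bp unit below is made up for illustration, and `magnitude_spectrum` and `stft_frames` are hypothetical helpers, not the authors' implementation.

```python
import cmath


def magnitude_spectrum(window: str, base: str) -> list[float]:
    """|U_base[k]| for k = 0..N-1, the DFT magnitudes of base's indicator sequence."""
    x = [1 if ch == base else 0 for ch in window]
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)))
            for k in range(N)]


def stft_frames(seq: str, base: str, N: int, hop: int) -> list[list[float]]:
    """Short-time Fourier transform: one magnitude spectrum per N-bp window,
    with successive windows starting `hop` bp apart."""
    return [magnitude_spectrum(seq[s:s + N], base)
            for s in range(0, len(seq) - N + 1, hop)]


def same_spectrum(a: list[float], b: list[float], tol: float = 1e-6) -> bool:
    """True if two magnitude spectra agree to within `tol` in every bin."""
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))


# Hypothetical 135-bp repeat unit, repeated 6 times (810 bp).
unit = "A" * 20 + "GCT" * 38 + "C"
array = unit * 6

# Window equal to the unit length: each window is a cyclic shift of the unit,
# and a cyclic shift leaves DFT magnitudes unchanged, so all frames match.
bar = stft_frames(array, "A", N=135, hop=27)
print(all(same_spectrum(bar[0], f) for f in bar))      # True

# Window shorter than the unit: frames change along the array but recur
# every 135 bp (frame 5 starts at 5 * 27 = 135), giving a tiled pattern.
quilt = stft_frames(array, "A", N=64, hop=27)
print(all(same_spectrum(quilt[0], f) for f in quilt))
print(same_spectrum(quilt[0], quilt[5]))               # True
```

The exact invariance for N = 135 follows from the circular-shift property of the DFT and holds whenever the window length is a multiple of the unit length. With N = 64, the first window contains A's while the window starting at 27 bp contains none, so the frames differ within a unit but repeat from one unit to the next.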


Summary

INTRODUCTION

Color spectrograms of biomolecular sequences were introduced in [1, 2] as visualization tools providing information about the local nature of DNA stretches. These spectrograms give a simultaneous view of the local frequency content throughout the nucleotide sequence, as well as the local nucleotide content, indicated by the color of the spectrogram. They are helpful for identifying genes and other regions of known biological significance, and for discovering as yet unknown regions of potential significance, characterized by distinct visual patterns in the spectrogram that are not detectable by character-string analysis.

The difficulty in creating DNA spectrograms results from the fact that DNA sequences are defined by character strings rather than numerical sequences. This problem can be solved by considering the binary indicator sequences $u_A[n]$, $u_T[n]$, $u_C[n]$, and $u_G[n]$, each taking the value one or zero depending on whether or not the corresponding character is present at location $n$. We demonstrate that the resulting color scheme, in which A is represented by blue, T by red, C by green, and G by yellow, remains informative even for complicated sequences.
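The indicator-sequence construction and the color scheme described above can be sketched as follows. The base-to-color assignment (A blue, T red, C green, G yellow) is taken from the text. The way the four magnitudes |U_X[k]| are blended into one RGB pixel is an assumption made for illustration, and the tricolor method of [1, 2] may combine the channels differently. All function names are hypothetical.

```python
import cmath

# Color scheme stated in the text, as RGB triples: A blue, T red, C green, G yellow.
BASE_COLORS = {
    "A": (0.0, 0.0, 1.0),
    "T": (1.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0),
    "G": (1.0, 1.0, 0.0),
}


def indicator_sequences(seq: str) -> dict[str, list[int]]:
    """u_X[n] = 1 if seq[n] == X else 0, for each X in A, T, C, G."""
    seq = seq.upper()
    return {b: [1 if ch == b else 0 for ch in seq] for b in "ATCG"}


def dft_magnitude(x: list[int], k: int) -> float:
    """|X[k]| for the N-point DFT of x."""
    N = len(x)
    return abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)))


def pixel_color(window: str, k: int) -> tuple[float, float, float]:
    """Hypothetical additive blend: each base adds its color weighted by |U_X[k]|;
    the result is scaled so that the brightest channel equals 1."""
    u = indicator_sequences(window)
    rgb = [0.0, 0.0, 0.0]
    for base, color in BASE_COLORS.items():
        weight = dft_magnitude(u[base], k)
        for i in range(3):
            rgb[i] += weight * color[i]
    peak = max(rgb)
    if peak < 1e-9:  # no energy at this frequency: draw black
        return (0.0, 0.0, 0.0)
    return (rgb[0] / peak, rgb[1] / peak, rgb[2] / peak)


def spectrogram(seq: str, N: int, hop: int) -> list[list[tuple[float, float, float]]]:
    """One row per window (starts 0, hop, 2*hop, ...), one column per
    DFT bin k = 1..N//2; the DC bin k = 0 is left out here."""
    return [[pixel_color(seq[s:s + N], k) for k in range(1, N // 2 + 1)]
            for s in range(0, len(seq) - N + 1, hop)]


print(pixel_color("AAAAAAAA", 0))  # (0.0, 0.0, 1.0): pure blue
print(pixel_color("GGGGGGGG", 0))  # (1.0, 1.0, 0.0): pure yellow
```

With this blend, a stretch dominated by one nucleotide takes that nucleotide's color, while a frequency at which several nucleotides carry comparable energy (for example k = 8 in the 16-bp window "GA" * 8, where G and A alternate) shows a mixed color.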

CHROMOSOME-WIDE PATTERNS
Human chromosome 22
SMALL PATTERNS
Minisatellites
Quilts: yeast flocculation genes
Bars: the Y′-helicases
Shafts and their structural significance
An unannotated pattern
DISCUSSION AND CONCLUSIONS