Abstract
It has been observed that the protein-coding regions of DNA sequences exhibit period-three behaviour, which can be exploited to predict the location of coding regions within genes. Previously, discrete Fourier transform (DFT) and digital filter-based methods have been used for the identification of coding regions. However, these methods do not significantly suppress the noncoding regions in the DNA spectrum at 2π/3. Consequently, a noncoding region may inadvertently be identified as a coding region. This paper introduces a new technique (a single digital filter operation followed by a quadratic window operation) that suppresses nearly all of the noncoding regions. The proposed method therefore improves the likelihood of correctly identifying coding regions in such genes.
Highlights
Finding coding regions in a DNA strand involves searching amongst the many nucleotides that comprise it
The DNA sequence representing a DNA strand consists of the letters A, T, C, and G, listed left to right in the order of the nucleotides that make up the strand, from the 5′ end to the 3′ end [1]
Previous digital signal processing (DSP) methods for the identification of coding regions in DNA sequences include the application of the discrete Fourier transform (DFT) on overlapping windows [1, 3, 4] and the application of bandpass digital filters that are centered at 2π/3 [2, 6]
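The windowed-DFT approach mentioned above can be illustrated with a minimal Python sketch. It assumes the common convention of mapping the sequence to four binary indicator sequences (one per base) and summing the squared DFT magnitude at 2π/3 over the four bases. The function names, the default window length of 351, and the step of 3 are illustrative assumptions, not values taken from this paper.

```python
import cmath
import math

BASES = "ACGT"

def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base."""
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in BASES}

def period3_power(dna, window=351, step=3):
    """Sliding-window DFT power at frequency 2*pi/3, summed over the four bases.

    Returns a list of (window_start, power) pairs.
    The window length and step are assumed defaults, not prescribed values.
    """
    indicators = indicator_sequences(dna)
    # exp(-j*2*pi*n/3) takes only three distinct values, so precompute them.
    twiddle = [cmath.exp(-2j * math.pi * n / 3) for n in range(3)]
    out = []
    for start in range(0, len(dna) - window + 1, step):
        power = 0.0
        for seq in indicators.values():
            coeff = sum(seq[start + n] * twiddle[n % 3] for n in range(window))
            power += abs(coeff) ** 2
        out.append((start, power))
    return out
```

A sequence with strong period-three structure, such as a repeated codon, yields a large value in every window, whereas a homopolymer yields a value near zero. Real coding regions fall between these extremes, which is why residual power in noncoding windows can lead to false detections.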
Summary
Finding coding regions (exons) in a DNA strand involves searching amongst the many nucleotides that comprise it. Previous digital signal processing (DSP) methods for the identification of coding regions in DNA sequences include the application of the discrete Fourier transform (DFT) on overlapping windows [1, 3, 4] and the application of bandpass digital filters that are centered at 2π/3 [2, 6]. Computational methods that exploit the heterogeneous statistical properties of DNA sequences to recursively segment homogeneous subsequences from their heterogeneous supersequences can be used to identify the borders between coding and noncoding regions [11, 12, 13]. Previous DSP methods that exploit period-three behaviour do not entirely suppress the noncoding regions in the DNA spectrum at 2π/3.
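As a rough illustration of the filter-based alternative, the Python sketch below passes each base's indicator sequence through a simple bandpass filter centered at 2π/3 and sums the squared outputs. The filter used here is a generic resonator with poles at r·exp(±j2π/3) and zeros at z = ±1, chosen for brevity. It is not the filter design of [2, 6] or of this paper, and the function name and the pole radius r = 0.9 are assumptions.

```python
BASES = "ACGT"

def resonator_power(dna, r=0.9):
    """Per-sample output power of a two-pole, two-zero bandpass filter at 2*pi/3.

    Difference equation (poles at r*exp(+-j*2*pi/3), zeros at z = +-1):
        y[n] = x[n] - x[n-2] - r*y[n-1] - r**2 * y[n-2]
    The pole radius r is an assumed value; closer to 1 gives a narrower passband.
    """
    dna = dna.upper()
    power = [0.0] * len(dna)
    for base in BASES:
        x1 = x2 = 0.0  # x[n-1], x[n-2]
        y1 = y2 = 0.0  # y[n-1], y[n-2]
        for n, ch in enumerate(dna):
            x0 = 1.0 if ch == base else 0.0
            y0 = x0 - x2 - r * y1 - r * r * y2
            power[n] += y0 * y0
            x2, x1 = x1, x0
            y2, y1 = y1, y0
    return power
```

Unlike the windowed DFT, this produces one output value per nucleotide with a single pass over the sequence. The passband has finite width, however, so noncoding stretches still leak some power through, which is the limitation the summary describes.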