Abstract

Summary form only given. A gene-finding algorithm becomes more successful when it incorporates multiple useful and non-redundant sources of information about coding regions. It is thus highly desirable to find new and efficient codon indices. Here we propose a novel codon index, which we call the period-3 fractal deviation (PFD). It is obtained by simultaneously considering two incompatible features of DNA sequences: the period-3 feature in coding regions and the fractal feature in both coding and non-coding regions. These two features are incompatible because period-3 defines a specific scale of three nucleotide bases, whereas fractal behavior implies the absence of any specific scale. The PFD is very different for coding and non-coding sequences, and is reading-frame-dependent. The accuracy of the PFD is evaluated by studying all 16 yeast chromosomes. The percentage accuracy is found to be very high and largely independent of the sliding window size. It is also much higher than when the period-3 and fractal features are characterized separately, especially when the window size is small. This strongly suggests that the method is not only useful for the study of long genome sequences, but may also be very powerful for the study of short DNA segments. The PFD is complementary to other codon indices, including Fourier measures of period-3, which makes it possible to integrate the PFD with those measures. Indeed, integrating the PFD with those indices using Fisher linear discriminant analysis significantly improves the accuracy of protein-coding sequence identification. This implies that the measure proposed here may be readily incorporated into existing gene-finding algorithms. Other salient features of the method are that it is non-parametric, does not require training, and can be fully automated.
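
The abstract does not define the PFD itself, so the following is only a minimal sketch of the kind of Fourier period-3 measure it names as a complementary index, not the authors' method. It sums, over the four base indicator sequences, the spectral power at frequency 1/3 within a sliding window. The function names, the normalization by window length squared, and the default step of 3 are assumptions made for illustration.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the A, C, G, T
    indicator sequences and divided by len(seq)**2 (assumed normalization)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    w = cmath.exp(-2j * cmath.pi / 3)   # primitive cube root of unity
    phases = (1, w, w * w)              # phase factor for position i mod 3
    total = 0.0
    for base in "ACGT":
        # Fourier component at frequency 1/3 of this base's 0/1 indicator
        component = sum(phases[i % 3] for i, ch in enumerate(seq) if ch == base)
        total += abs(component) ** 2
    return total / (n * n)


def sliding_period3(seq, window, step=3):
    """Return (start, period-3 power) for each window along seq.
    `window` and `step` are hypothetical parameters for this sketch."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    periodic = "ATG" * 30        # toy sequence with perfect period 3
    uniform = "A" * 90           # toy sequence with no period-3 structure
    print(period3_power(periodic))
    print(period3_power(uniform))
    print(len(sliding_period3(periodic, window=30)))
```

On these toy inputs the perfectly periodic sequence scores far higher than the homopolymer. On real genomic data such a measure is usually noisy in short windows, which is the regime where the abstract reports the PFD to be most advantageous.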
