Abstract

Multiscale signal processing techniques such as wavelet filtering have proved particularly successful in predicting exon sequences. The traditional wavelet predictor performs domain filtering and enforces exon features by weighting nucleotide values with coefficients. Such a measure is a linear filter and is not suitable for preserving short coding exons and exon-intron boundaries. This paper describes a prediction framework that processes DNA sequences non-linearly while achieving high prediction rates. There are two key contributions. The first is the introduction of a genomic-inspired multiscale bilateral filtering (MSBF), which exploits both weighting coefficients in the spatial domain and nucleotide similarity in the range. Like the wavelet transform, the MSBF is defined as a weighted sum of nucleotides; the difference is that the MSBF takes into account the variation of nucleotides at a specific codon position. The second contribution is the exploitation of inter-scale correlation in the MSBF domain to find the inter-scale dependency of the differences between the exon signal and the background noise. This favourable property is used to sharpen important structures while weakening noise. Three benchmark data sets were used to evaluate the methods considered. Compared with four existing techniques, the proposed method shows improvements of at least 4.1%, 50.5%, 25.6%, 2.5%, 10.8%, 15.5%, 11.1%, 12.3%, 9.2% and 2.4% for exon lengths of 1–24, 25–49, 50–74, 75–99, 100–124, 125–149, 150–174, 175–199, 200–299 and 300+, respectively. Owing to its nonlinear nature, the MSBF is good at energy compaction, which makes it capable of locating the sharp variations around short exons. The direct multiplication of coefficients at several adjacent scales clearly enhanced exon features while suppressing noise.
We show that the non-linear nature and correlation-based property of the proposed predictor give it an advantage over traditional filtering, which leads to better exon prediction performance. The predictor also has other possible applications: its good localization and preservation of sharp variations make it suitable, for example, for fault diagnosis of aero-engines.
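To make the two ideas in the abstract concrete, the following is a minimal sketch of a generic one-dimensional bilateral filter followed by a pointwise product of its outputs at several scales. It is not the authors' MSBF: the Gaussian kernels, the names `bilateral_1d` and `interscale_product`, the parameters `sigma_s` and `sigma_r`, and the choice of scales are all illustrative assumptions, and the codon-position weighting described above is not modelled.

```python
import numpy as np


def bilateral_1d(x, sigma_s, sigma_r):
    """Generic 1-D bilateral filter (illustrative; not the paper's exact MSBF).

    sigma_s: width of the spatial (domain) Gaussian, in samples (assumed).
    sigma_r: width of the range Gaussian, in signal units (assumed).
    """
    x = np.asarray(x, dtype=float)
    radius = int(np.ceil(3 * sigma_s))
    offsets = np.arange(-radius, radius + 1)
    # Domain weights depend only on the distance from the centre sample.
    spatial = np.exp(-0.5 * (offsets / sigma_s) ** 2)
    # Repeat the edge values so that every sample has a full window.
    padded = np.pad(x, radius, mode="edge")
    out = np.empty_like(x)
    for i in range(x.size):
        window = padded[i:i + 2 * radius + 1]
        # Range weights shrink for samples whose value differs from x[i],
        # which is what makes the filter non-linear and edge-preserving.
        similarity = np.exp(-0.5 * ((window - x[i]) / sigma_r) ** 2)
        weights = spatial * similarity
        out[i] = np.sum(weights * window) / np.sum(weights)
    return out


def interscale_product(x, scales=(1.0, 2.0, 4.0), sigma_r=0.2):
    """Multiply the filter outputs at several scales, sample by sample.

    For a non-negative signal, structures that stay large at every scale
    survive the product, while fluctuations present at only one scale shrink.
    """
    filtered = [bilateral_1d(x, s, sigma_r) for s in scales]
    return np.prod(filtered, axis=0)


if __name__ == "__main__":
    # Hypothetical test signal: a noisy step from 0 to 1.
    rng = np.random.default_rng(0)
    step = np.concatenate([np.zeros(50), np.ones(50)])
    noisy = np.clip(step + 0.05 * rng.standard_normal(step.size), 0.0, None)
    enhanced = interscale_product(noisy)
    print(enhanced[45:55].round(2))
```

With a small `sigma_r`, samples on opposite sides of a sharp transition contribute almost nothing to each other, so the step stays sharp; a linear filter with the same spatial kernel would blur it.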

Highlights

  • Recent advancement in high-throughput analysis, such as next-generation sequencing, has resulted in the development of computational techniques for the rapid prediction of exons in DNA sequences

  • To evaluate the general performance of these measures, the three-base periodicity (TBP) data for each DNA sequence considered have been normalized to values between 0 and 1

  • For exon prediction, extracting the relevant features of short coding sequences is a major task because the subtle features of short exons are obscured by the strong presence of background noise
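The TBP normalization mentioned in the highlights can be illustrated with a short sketch. It assumes a common formulation that the text above does not spell out: binary indicator sequences for A, C, G and T, a sliding-window spectral power at frequency 1/3, and min-max scaling to [0, 1]. The function names `tbp_profile` and `normalize01` and the window length are hypothetical.

```python
import numpy as np


def tbp_profile(seq, window=351):
    """Sliding-window power at the period-3 frequency (assumed formulation)."""
    seq = seq.upper()
    n = len(seq)
    window = min(window, n)  # keep the output the same length as the input
    # Complex exponential at frequency 1/3 cycles per base.
    phase = np.exp(-2j * np.pi * np.arange(n) / 3)
    kernel = np.ones(window)
    power = np.zeros(n)
    for base in "ACGT":
        # Binary indicator sequence: 1 where this base occurs, 0 elsewhere.
        indicator = np.array([c == base for c in seq], dtype=float)
        # The windowed sum of indicator * phase is the local DFT coefficient
        # at frequency 1/3, up to a unit-modulus factor.
        coeff = np.convolve(indicator * phase, kernel, mode="same")
        power += np.abs(coeff) ** 2
    return power


def normalize01(values):
    """Min-max scale to [0, 1]; a constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


if __name__ == "__main__":
    # Hypothetical toy sequence: a non-periodic run followed by a period-3 run.
    toy = "A" * 150 + "ATG" * 50
    profile = normalize01(tbp_profile(toy, window=30))
    print(profile[60:70].round(3), profile[220:230].round(3))
```

Putting every sequence on the same 0 to 1 scale lets a single threshold be compared across sequences and methods. Min-max scaling is only one way to achieve this and may differ from the normalization the authors used.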


Summary

Introduction

Recent advances in high-throughput analysis, such as next-generation sequencing, have resulted in the development of computational techniques for the rapid prediction of exons in DNA sequences. Two independent studies, by Irimia et al. [5] in Cell and by Li et al. [6] in Genome Research, defined a class of short exons called microexons and uncovered the features regulating their inclusion. Irimia et al. reveal that the regulation of microexons (defined as exons with lengths of 3–15 bp) is highly dynamic during neuronal differentiation and that the inclusion of these microexons can modulate the function of interaction domains of proteins involved in neurogenesis [5]. Determining the lengths and locations of short exons is therefore a pressing challenge. We focus on the development of a spectral analysis technique for finding exons in eukaryotic DNA sequences, as described below.

Methods
Results
Conclusion
