Abstract

Detecting short exons remains a challenging open problem in bioinformatics. Because existing model-independent methods cannot reliably detect small exons, a model-independent method based on singularity detection with wavelet transform modulus maxima has been developed for detecting short coding sequences (exons) in eukaryotic DNA sequences. In our method, the local maxima capture and characterize the singularities of short exons, yielding significant patterns that are rarely observed with traditional methods. To characterize how the singularities of the exon signal differ from those of the background noise, the noise level is estimated by filtering the genomic sequence through a notch filter. A fast method based on a piecewise cubic Hermite interpolating polynomial is applied to reconstruct the wavelet coefficients, improving computational efficiency. In addition, the output measure of a paired-numerical representation, calculated in both the forward and reverse directions, is used to incorporate a useful DNA structural property. The performance of our approach and of other techniques is evaluated on two benchmark data sets. Experimental results demonstrate that the proposed method outperforms all assessed model-independent methods for detecting short exons in terms of the evaluation metrics.
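The idea of locating singularities through wavelet transform modulus maxima can be illustrated with a minimal sketch. It is not the authors' implementation: the first-derivative-of-Gaussian wavelet, the single fixed scale, the zero padding at the edges, and the function names `gaussian_derivative_wavelet`, `wavelet_coefficients`, and `modulus_maxima` are all assumptions made for illustration. A toy rectangular pulse stands in for an exon measure, and the sketch omits the notch-filter noise estimate and the Hermite interpolation step.

```python
import math


def gaussian_derivative_wavelet(scale):
    """Sample a first-derivative-of-Gaussian wavelet at integer offsets.

    The wavelet choice is an assumption of this sketch, not taken from the paper.
    """
    half_width = int(math.ceil(4 * scale))
    taps = []
    for k in range(-half_width, half_width + 1):
        t = k / scale
        taps.append(-t * math.exp(-0.5 * t * t) / scale)
    return taps


def wavelet_coefficients(signal, scale):
    """Correlate the signal with the wavelet at one scale (zero padding at edges)."""
    taps = gaussian_derivative_wavelet(scale)
    half = len(taps) // 2
    n = len(signal)
    coeffs = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(taps):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * signal[idx]
        coeffs.append(acc)
    return coeffs


def modulus_maxima(coeffs, threshold=0.0):
    """Return indices where |coeffs| is a local maximum above the threshold.

    A plateau of equal values is reported once, at its first index.
    """
    mod = [abs(c) for c in coeffs]
    return [
        i
        for i in range(1, len(mod) - 1)
        if mod[i] > threshold and mod[i] > mod[i - 1] and mod[i] >= mod[i + 1]
    ]


# Toy usage: a short rectangular pulse (a stand-in for a short exon measure).
toy_signal = [0.0] * 50 + [1.0] * 30 + [0.0] * 50
toy_coeffs = wavelet_coefficients(toy_signal, scale=2.0)
print(modulus_maxima(toy_coeffs, threshold=0.1))
```

In this toy case the modulus maxima fall at the two edges of the pulse, which is the sense in which local maxima mark singularities. A real pipeline would track the maxima across several scales rather than using a single one.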

Highlights

  • As an initial step in the analysis of eukaryotic genome sequences, detecting exons would lead to a good understanding of the structure and function of a protein that is synthesized by these exons [1, 2]

  • We have evaluated the general performance of three other methods: exon prediction by nucleotide distribution (EPND) [25], modified Gabor-wavelet transform (MGWT) [26], and fast Fourier transform plus empirical mode decomposition (FFTEMD) [27]

  • To investigate the performance of MGWT on short exon detection, MGWT run with a window length of 1200 points and scale values exponentially separated between 0.1 and 0.7 is denoted MGWT I; a second configuration is denoted MGWT II


Summary

Introduction

As an initial step in the analysis of eukaryotic genome sequences, detecting exons would lead to a good understanding of the structure and function of a protein that is synthesized by these exons [1, 2]. Over the past twenty years or so, many algorithms have been proposed for exon detection, and good detection rates have been achieved in recognizing exon and intron regions [1–33].
