Abstract

The analysis of protein coding regions of DNA sequences is one of the most fundamental applications in bioinformatics. A number of model-independent approaches have been developed for differentiating between the protein-coding and non-protein-coding regions of DNA. However, these methods are often based on univariate analysis algorithms, which leads to the loss of joint information among four nucleotides of DNA. In this article, we introduce a method on basis of the noise-assisted multivariate empirical mode decomposition (NA-MEMD) and the modified Gabor-wavelet transform (MGWT). The NA-MEMD algorithm, as a multivariate analysis tool, is utilized to reconstruct the numerical analyzed sequence since it enables a matched-scale decomposition across all variables and eliminates the mode mixing. By virtues of NA-MEMD, the MGWT method achieves a stable improvement on the general identification performance. We compare our method with other Digital Signal Processing (DSP) methods on two representative DNA sequences and three benchmark datasets. The results reveal that our method can enhance the spectra of the analyzed sequences, and improve the robustness of MGWT to different DNA sequences, thus obtaining higher identification accuracies of protein coding regions over other applied methods. In addition, another comparative experiment with the model-dependent method (AUGUSTUS) on the recently proposed benchmark dataset G3PO verifies the superiority of model-independent methods (especially NA-MEMD-MGWT) for identifying coding regions of the poor-quality DNA sequences.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.