Identification Of Protein Coding Regions Research Articles

The identification of protein coding regions is a major research topic in gene prediction. A number of approaches based on digital signal processing (DSP), which exploit 3-base periodicity to detect coding regions, have been proposed. From these previously published approaches, we conclude that an effective method or filter for identifying protein coding regions should have three properties: independence from the window length, an effective and adaptive frequency response, and a fixed basic frequency of 1∕3f. However, most published approaches cannot satisfy all three simultaneously, so their identification accuracy remains limited. In this paper, we propose an adaptive signal processing method, called sinusoidal-assisted variational mode decomposition (SAVMD), for identifying coding regions. The adaptability of SAVMD shows in two respects: (i) the method analyzes numerical sequences without needing any window information; (ii) the spectrum of the period-3 component is fitted automatically by SAVMD in the Fourier domain. As a result, the proposed method outperforms other DSP-based methods in identification accuracy, as verified by experimental results on five benchmark datasets. On the dataset in which most sequences contain undetermined nucleotides (UDT), SAVMD performs better than the model-dependent method AUGUSTUS as well as other model-independent methods. In addition, we conduct a comparative analysis of different numerical conversions of DNA sequences using SAVMD. Several conversion methods found applicable to SAVMD in this experiment can serve as a reference for applying other time–frequency decomposition methods to gene prediction.
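To make the 3-base periodicity idea concrete, the following is a minimal sketch of the classical windowed baseline that DSP-based methods build on. It is not SAVMD. It assumes binary indicator sequences (one per nucleotide) as the numerical conversion, and the function name `period3_power` and the defaults `window=351` and `step=3` are illustrative assumptions rather than values taken from the abstract. Note that it depends on a window length, which is exactly the dependence the abstract says SAVMD avoids.

```python
import cmath


def period3_power(seq, window=351, step=3):
    """Sliding-window DFT power at frequency 1/3, summed over the four
    binary indicator sequences (A, C, G, T).

    Illustrative baseline only; `window` and `step` are assumed defaults.
    Symbols other than A, C, G, T (e.g. undetermined 'N') contribute
    nothing to any indicator sequence.
    """
    seq = seq.upper()
    # The three distinct values of exp(-j*2*pi*n/3), indexed by n mod 3.
    twiddle = [cmath.exp(-2j * cmath.pi * n / 3) for n in range(3)]
    powers = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for base in "ACGT":
            acc = 0j
            for n in range(window):
                if seq[start + n] == base:
                    acc += twiddle[n % 3]
            total += abs(acc) ** 2
        powers.append(total)
    return powers


if __name__ == "__main__":
    # Toy inputs: a perfectly period-3 sequence versus a period-4 one.
    periodic = period3_power("ATG" * 20, window=30)
    aperiodic = period3_power("ACGT" * 15, window=30)
    print(periodic[0], aperiodic[0])
```

In this sketch a window with strong period-3 structure yields a large value and a window without it yields a small one; a coding-region detector would then threshold or further filter the resulting curve.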

Background: Correct identification of protein coding regions is an important and still unresolved problem in molecular biology. It remains challenging because of the lack of deep knowledge about biological systems and the limited understanding of conserved features in messenger RNA (mRNA). It is therefore fundamental to investigate computational methods that help discover patterns for identifying Translation Initiation Sites (TIS). In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. In contrast, much less attention has been given to machine learning methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM). Transductive inference performs well on problems in which unlabeled sequences considerably outnumber labeled ones. TIS prediction may benefit from transductive methods for this reason, because the number of new sequences grows rapidly as the Genome Project progresses and enables the study of new organisms. Consequently, this work investigates transductive learning for TIS identification and compares the results with those obtained by the inductive method.

Results: Transductive inference gives better results than the inductive method in both F-measure and sensitivity for predicting the TIS. It also has the lowest failure rate in identifying the TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated with molecules from the most representative organisms in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity above 90%, higher than the results obtained with ISVM. The ISVM and TSVM approaches were implemented in the TransduTIS tool, as TransduTIS-I and TransduTIS-T respectively, available through a web interface. These approaches were compared with the TISHunter, TIS Miner and NetStart tools, with satisfactory results.

Conclusions: In terms of precision, the results are similar for the ISVM and TSVM classifiers. However, the results show that the TSVM approach brought an improvement, especially in F-measure and sensitivity. Moreover, a promising application of TSVM was identified: organisms in the initial study phase, with few identified sequences in the databases.
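The relationship between False Negatives, sensitivity, precision and F-measure reported above can be illustrated with a short sketch. The helper name `tis_metrics` and the counts used below are hypothetical and chosen only for illustration; they are not results from the study.

```python
def tis_metrics(tp, fp, fn):
    """Return sensitivity (recall), precision and F-measure from
    true-positive, false-positive and false-negative counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts: the second classifier misses fewer true sites.
    print(tis_metrics(tp=90, fp=10, fn=10))
    print(tis_metrics(tp=95, fp=10, fn=5))
```

With the same number of false positives, reducing false negatives raises sensitivity and F-measure while precision changes only slightly, which is the pattern the abstract describes for TSVM relative to ISVM.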

Articles published on Identification Of Protein Coding Regions

Optimization method of protein coding region identification based on IHHO-CNN-LSTM

An efficient way of identification of protein coding regions of eukaryotic genes using digital FIR filter governed by Ramanujan's Sum

Bidirectional filtering approach for the improved protein coding region identification in eukaryotes

SAVMD: An adaptive signal processing method for identifying protein coding regions

DSP techniques for protein coding region identification based on background noise and nonlinear phase delay reduction from period-3 spectrum using zero phased anti-notch filter and Savitzky-Golay (S-G) filter

A tri-nucleotide mapping scheme based on residual volume of amino acids for short length exon prediction using sliding window DFT method

A new numerical approach for DNA representation using modified Gabor wavelet transform for the identification of protein coding regions

Walsh code based numerical mapping method for the identification of protein coding regions in eukaryotes

A novel numerical mapping method based on entropy for digitizing DNA sequences

Transductive learning as an alternative to translation initiation site identification

Identification of protein coding regions in RNA transcripts.

Improved Identification of Protein Coding Region using Wavelet Transform

Computational Study and Performance Evaluation of Different Genomic Signal Processing Methods for Identification of Protein Coding Regions (Exon Regions) of DNA Sequence

Identification of Protein Coding Regions in the Eukaryotic DNA Sequences Based on Marple Algorithm and Wavelet Packets Transform

A Punctual Algorithm for Small Gene Prediction in DNA Sequences Using a Time-Frequency Approach based on the Z-Curve

A Novel Fast Algorithm for Exon Prediction in Eukaryotic Genes using Linear Predictive Coding Model and Goertzel Algorithm based on the Z-Curve

Fast Algorithm for Identifying Protein-Coding Regions

Identification of protein coding regions using antinotch filters
