Abstract

Developing efficient gene prediction algorithms is a fundamental task in genomics. In genomic signal processing, the basic step in identifying protein coding regions in DNA sequences relies on the period-3 property exhibited by nucleotides in exons. Several approaches based on signal processing tools and numerical representations have been applied to this problem in an effort to achieve more accurate predictions. This paper presents a new indicator sequence based on the amino acid sequence, called the amino acid indicator sequence, which is derived from the DNA string. It is used with existing signal-processing-based time-domain and frequency-domain methods to predict these regions within the billions-of-bases-long DNA sequences of eukaryotic cells, and it reduces the computational load by one-third. Each triplet of bases, called a codon, instructs the cell machinery to synthesize an amino acid. The codon sequence therefore uniquely identifies an amino acid sequence, which defines a protein, so the protein coding region is characterized by the codons underlying the amino acid sequence. This property is used to detect period-3 regions from the amino acid sequence. Physico-chemical properties of amino acids are used for the numerical representation. Accuracy measures such as exonic peaks, discriminating factor, sensitivity, specificity, miss rate, wrong rate, and approximate correlation are used to demonstrate the efficacy of the proposed predictor. The proposed method is validated on various organisms using the standard data sets HMR195, Burset and Guigo, and KEGG. The simulation results show that the proposed method is an effective approach for predicting protein coding regions.
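The mapping from DNA string to a numerical amino acid sequence described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the function names `translate` and `amino_acid_indicator`, the `step` parameter (3 reads non-overlapping codons, 1 slides the reading window one base at a time), the use of the Kyte-Doolittle hydropathy scale as a stand-in for the unspecified physico-chemical property, and the value 0.0 assigned to stop codons and unrecognized triplets.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values: an ASSUMED example property,
# not necessarily the one used in the paper.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna, step=3):
    """Map each base triplet to a one-letter amino acid code.

    '*' marks a stop codon; 'X' marks a triplet with unknown bases.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, step)
    )


def amino_acid_indicator(dna, step=3):
    """Return one numeric value per translated triplet.

    Stop codons and unknown triplets get 0.0 (an assumption).
    """
    return [HYDROPATHY.get(aa, 0.0) for aa in translate(dna, step)]
```

For example, `amino_acid_indicator("ATGGTT")` translates the codons ATG and GTT to methionine and valine and returns their hydropathy values. Whether the paper reads codons with or without overlap is not stated in this excerpt, which is why `step` is left as a parameter.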

Highlights

  • Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an exponential growth of genomic sequences

  • This paper presents a new indicator sequence based on the amino acid sequence, called the amino acid indicator sequence, which is derived from the DNA string and used with existing signal-processing-based time-domain and frequency-domain methods to predict these regions within the billions-of-bases-long DNA sequences of eukaryotic cells, reducing the computational load by one-third

  • Genomic signal processing extracts the genetic information contained in DNA sequences, RNA sequences, and proteins


Summary

INTRODUCTION

Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an exponential growth of genomic sequences. Numerical representations such as quaternion [20], Galois field assignment [21], EIIP [22, 23], and paired numeric [14] have been used to construct indicator sequences in DSP methods to improve the accuracy of exon prediction. Another four-indicator sequence, called the relative frequency indicator sequence, is based on coding statistics such as single-nucleotide, dinucleotide, and trinucleotide biases and has been incorporated into the algorithm to improve the selectivity and sensitivity of filter methods [24]. This paper develops a new method for predicting protein coding regions based on the amino acid indicator sequence obtained from the DNA string. The method relies on the fact that exon sequences have a 3-base periodicity, while intron sequences do not. A later section presents the amino acid indicator sequence approach for identifying protein coding regions using the Fourier transform and a digital filter.
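The 3-base periodicity mentioned above is commonly detected by evaluating a short-time Fourier transform at the frequency 2π/3 over a sliding window and looking for peaks in the resulting power. The following Python sketch shows that generic technique for any numeric sequence that has one value per base position. It is not the paper's implementation: the function name `period3_power` and the default window length of 351 are assumptions chosen for illustration.

```python
import cmath


def period3_power(x, window=351):
    """Sliding-window spectral power at the period-3 frequency.

    For each window start n, computes
        |sum_{m=0}^{window-1} x[n+m] * exp(-j*2*pi*m/3)|**2
    which is the DFT power at bin k = window/3.
    The window length must be a multiple of 3 so that this bin exists.
    """
    if window % 3 != 0:
        raise ValueError("window must be a multiple of 3")
    if len(x) < window:
        return []
    # Precompute the complex exponentials once; they repeat every 3 samples.
    twiddle = [cmath.exp(-2j * cmath.pi * m / 3) for m in range(window)]
    power = []
    for n in range(len(x) - window + 1):
        s = sum(x[n + m] * twiddle[m] for m in range(window))
        power.append(abs(s) ** 2)
    return power


if __name__ == "__main__":
    periodic = [1.0, 0.0, 0.0] * 30   # toy sequence with exact period 3
    flat = [1.0] * 90                 # toy sequence with no period-3 component
    print(max(period3_power(periodic, window=30)))
    print(max(period3_power(flat, window=30)))
```

On the toy inputs, the periodic sequence yields a large power in every window while the constant sequence yields a power near zero, which is the contrast that exon and intron regions are expected to show. This direct form costs O(N·W) operations for sequence length N and window W; a digital filter centered at 2π/3, as the text mentions, is a common cheaper alternative.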

PROPOSED AMINO ACID INDICATOR SEQUENCE
AND DISCUSSION
Findings
CONCLUSIONS