Abstract

Accurately identifying protein-coding regions in eukaryotic genomes is a challenging task because these regions are distributed discontinuously along the DNA sequence. Various frequency-domain algorithms have been designed for detecting protein-coding regions since the late twentieth century. Frequency-domain approaches work by transforming the signal from one domain to another, and this transformation carries a high risk of losing important information. In this paper, a modified periodicity spectrum based algorithm (MPSBA) is proposed for identifying protein-coding regions in eukaryotic genomes. The proposed algorithm requires no domain transformation. Its key contribution is optimization of the window length: the length is varied from 27 to 351 in steps of 3, and the value that maximizes the area under the curve (AUC) is selected. The algorithm is tested first on the benchmark sequence F56F11.4 and then on the larger data sets HMR195 and BG570. Its performance is compared with that of recent state-of-the-art algorithms. The results indicate that the proposed algorithm outperforms these methods and can identify both short and long protein-coding regions.
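
The window-length search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MPSBA: the abstract does not define the periodicity measure, so `period3_score` below is a hypothetical stand-in that counts, without any domain transformation, how unevenly each base is spread over the three codon positions inside a window. The sketch also assumes that AUC means the area under the ROC curve computed against per-base coding labels. The function names and the candidate grid passed by the caller are illustrative.

```python
def window_lengths(start=27, stop=351, step=3):
    """Candidate window lengths: 27, 30, ..., 351 (as stated in the abstract)."""
    return list(range(start, stop + 1, step))


def period3_score(seq, center, window):
    """Hypothetical stand-in score (NOT the paper's measure): imbalance of each
    base across the three codon phases inside a window centred on `center`."""
    half = window // 2
    lo, hi = max(0, center - half), min(len(seq), center + half + 1)
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i in range(lo, hi):
        if seq[i] in counts:
            counts[seq[i]][i % 3] += 1
    n = hi - lo
    total = 0.0
    for per_phase in counts.values():
        mean = sum(per_phase) / 3
        total += sum((c - mean) ** 2 for c in per_phase)
    return total / (n * n)  # normalise so different window lengths are comparable


def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("labels must contain both coding (1) and non-coding (0) positions")
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)


def best_window(seq, labels, candidates=None):
    """Return (window_length, auc) for the candidate that maximises AUC."""
    if candidates is None:
        candidates = window_lengths()
    best = None
    for w in candidates:
        scores = [period3_score(seq, pos, w) for pos in range(len(seq))]
        value = auc(scores, labels)
        if best is None or value > best[1]:
            best = (w, value)
    return best
```

Given a sequence string `seq` and a list `labels` of 0/1 coding annotations of the same length, `best_window(seq, labels)` sweeps all 109 candidate lengths and returns the one with the highest AUC. The sweep here is brute force for readability; a practical implementation would likely update the per-phase counts incrementally as the window slides.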
