Abstract

Given the increasing number of available genomic sequences, one now faces the task of identifying their protein coding regions. The gene prediction problem can be addressed in several ways, and one of the most promising methods makes use of information derived from the comparison of homologous sequences. In this work, we develop a new comparative-based gene prediction program called Exon_Finder2. This tool is based on a new type of alignment that we propose, called syntenic global alignment, which can deal satisfactorily with sequences that share regions with different rates of conservation. In addition to this new type of alignment, we describe a dynamic programming algorithm that computes a best syntenic global alignment of two sequences, together with its score. Promising initial results obtained with Exon_Finder2 support the applicability of our approach: on a benchmark of 120 pairs of human and mouse genomic sequences, most of the encoded genes were successfully identified by our program.

Highlights

  • The gene prediction problem can be defined as the task of finding the genes encoded in a genomic sequence of interest

  • Given the increasing number of homologous sequences in the databases and the assumption that exons tend to be more conserved than introns within a genome, comparative-based gene prediction programs are increasingly used for gene identification

  • In this work we present a new gene prediction tool whose implementation is based on a new type of alignment that we propose, called syntenic global alignment

Summary

Introduction

The gene prediction problem can be defined as the task of finding the genes encoded in a genomic sequence of interest. To deal with sequences whose conserved regions are interrupted by unconserved ones, such as protein and prokaryotic gene sequences, Huang and Chao proposed the generalized global alignment in [20]. This type of alignment discriminates between conserved and unconserved regions by using the concept of difference blocks. This happens, for example, when the sequences to be compared include highly conserved regions interrupted by conserved and unconserved ones. This is exactly the case in stretches of eukaryotic genomic sequences that encode one or more genes. In the final section we make some concluding remarks concerning this work.
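As background for the alignment variants mentioned above, the following is a minimal sketch of the classical global alignment score computed by dynamic programming (Needleman-Wunsch with a linear gap penalty). It is not the syntenic global alignment algorithm of this work, nor the generalized global alignment of [20], and it has no notion of difference blocks. The function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of a best classical global alignment of strings a and b.

    Illustrative baseline only: linear gap penalty, hypothetical scores.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned with a gap
                curr[j - 1] + gap,  # b[j-1] aligned with a gap
            )
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Because every column of such an alignment is scored with the same scheme, this baseline treats conserved and unconserved regions alike, which is the limitation that motivates the block-aware alignments discussed in the text.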

Syntenic global alignment
Application to the gene prediction problem
Experimental results
Comparison with previous approaches
Automatic parameter setting
Discussion