Spaced Seeds for Cross-species cDNA-to-genome Sequence Alignment

Liliana Florea, Ingrid Mihai, Leming Zhou

doi:10.4310/cis.2010.v10.n2.a4

Abstract

We review recent developments in spaced seed design for cross-species sequence alignment. We start with a brief overview of original ideas and early techniques, and then focus on more recent work on finding accurate (sensitive and specific) seeds for cross-species cDNA-to-genome alignment. These recent developments include methods and models for estimating seed specificity and determining sensitive and specific seeds, finding seeds that can be applied to a wide range of comparisons, and applying seed models to other computational biology areas, such as gene finding.

1. Introduction. New high-throughput and cost-effective technologies have revolutionized our ability to sequence complex organisms, and are expected to lead to a significant increase in the number of available genomes for species from all branches of life (1). The first and most important step in analyzing these genomes is gene annotation, that is, accurately identifying the locations and exon-intron structures of genes along the genome, and further determining their function.

There are two primary classes of methods for identifying genes in a given genomic sequence. The first class, ab initio methods (GenScan (2), Genie (3), GeneMark (4), FGenesH (5)), use machine-learning techniques to analyze a single genomic sequence and predict the locations of genes. Such methods are reasonably accurate at finding coding exons, but are not effective at detecting untranslated regions (UTRs) and alternatively spliced or overlapping genes (6). The second class, comparative methods, predict exons based on sequence similarity of protein or expressed DNA (cDNA, EST, mRNA) with genomic sequences containing those genes. These methods are the most reliable for inferring the gene structure, and thus genome annotation projects have routinely used cDNA sequences from the same species to annotate genes.
Although several projects exist that produce full-length cDNA sequences (7-9), they focus on a handful of high-priority species, such as human, mouse, rat, cow and zebrafish. For most newly sequenced species, few native cDNA sequences are available in the databases. Consequently, gene annotation

Journal: Communications in Information and Systems
Publication Date: Jan 1, 2010
Citations: 42
