PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text

Yuri Pirola, Graziano Pesole, Raffaella Rizzi, Gianluca Della Vedova, Paola Bonizzoni, Ernesto Picardi

doi:10.1186/1471-2105-13-s5-s2

Abstract

Background

A challenging issue in designing computational methods for predicting the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been tackled by combining different tools that were not specifically designed for this task.

Results

We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm with provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output.

The method was implemented in the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides predicting the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts.

Conclusions

PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under the GNU AGPL. PIntron has been shown to outperform state-of-the-art methods and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked against ENCODE annotations.
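To make the notion of a maximal pairing of a pattern P (the EST) and a text T (the genomic sequence) more concrete, the following is a minimal sketch, not the PIntron algorithm. It assumes a maximal pairing is a common substring occurrence that cannot be extended to the left or to the right, and it uses a naive scan whose cost grows with |P|·|T| rather than the linear-time procedure described in the abstract. The function name `maximal_pairings` and the `min_len` threshold are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple


def maximal_pairings(pattern: str, text: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (p, t, length) triples such that
    pattern[p:p+length] == text[t:t+length] and the match cannot be
    extended by one character to the left or to the right.

    Naive illustration only (assumption): PIntron uses a different,
    linear-time construction.
    """
    pairings = []
    n, m = len(pattern), len(text)
    for p in range(n):
        for t in range(m):
            if pattern[p] != text[t]:
                continue
            # A match that can be extended to the left is not maximal here;
            # it is reported from its leftmost starting position instead.
            if p > 0 and t > 0 and pattern[p - 1] == text[t - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (p + length < n and t + length < m
                   and pattern[p + length] == text[t + length]):
                length += 1
            if length >= min_len:
                pairings.append((p, t, length))
    return pairings


# Toy example: the pattern is split into two blocks that occur in the
# text separated by a run of extra characters.
print(maximal_pairings("ACGTTTGCA", "ACGTAAAAATTGCA", min_len=4))
```

In this toy input the two reported pairings cover the whole pattern and are separated in the text by a gap, which is the kind of configuration a spliced alignment has to chain together.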

Highlights

A challenging issue in designing computational methods for predicting the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed.
A minimization criterion is used to avoid overprediction of splice junctions. For this task we propose a formalization of the problem of finding a putative gene structure, called the CONSENSUS GENE STRUCTURE problem (CG), and discuss a solution to this problem.
PIntron outputs the list of predicted introns with information such as relative and absolute start and end positions, intron lengths, the donor and acceptor splice sites, and intron types (U12, U2 or unclassified).
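As an illustration of the kind of per-intron information listed above, the sketch below derives the length and the donor and acceptor dinucleotides of an intron from its coordinates. It is not PIntron's output format: the `IntronRecord` type, the `describe_intron` function, and the `region_offset` parameter are hypothetical, coordinates are assumed to be 1-based and inclusive on the forward strand, and the U12/U2 classification is deliberately left out because the source does not describe how it is computed.

```python
from dataclasses import dataclass


@dataclass
class IntronRecord:
    rel_start: int   # 1-based start within the genomic region
    rel_end: int     # 1-based inclusive end within the genomic region
    abs_start: int   # start in absolute (e.g. chromosome) coordinates
    abs_end: int     # end in absolute coordinates
    length: int      # number of bases in the intron
    donor: str       # first two bases of the intron
    acceptor: str    # last two bases of the intron


def describe_intron(region: str, rel_start: int, rel_end: int,
                    region_offset: int) -> IntronRecord:
    """Build a record for the intron region[rel_start..rel_end].

    region_offset is the assumed absolute coordinate of the first base
    of the region (hypothetical parameter, forward strand only).
    """
    intron = region[rel_start - 1:rel_end]
    return IntronRecord(
        rel_start=rel_start,
        rel_end=rel_end,
        abs_start=region_offset + rel_start - 1,
        abs_end=region_offset + rel_end - 1,
        length=len(intron),
        donor=intron[:2],
        acceptor=intron[-2:],
    )


# Toy region with an intron spanning positions 4..14.
record = describe_intron("ACGGTAAGTTTCAGCC", 4, 14, region_offset=1000)
print(record)
```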

Summary

Introduction

A challenging issue in designing computational methods for predicting the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Some tools related to the problem, but limited to the specific task of predicting splice junctions from Next-Generation Sequencing (NGS) data, have been designed [10,11,12,13]. These tools are computationally intensive and would require a post-processing step to extract the data relevant to the alternative exon-intron structure of a gene. The literature provides efficient solutions for computing a specific spliced alignment of an EST against the genome (for example Exonerate [14], GMAP [15] and Spaln [16]). These tools are designed to compute only spliced alignments, not to directly provide the complete exon-intron structure of a gene and its full-length isoforms.


Journal: BMC Bioinformatics	Publication Date: Apr 12, 2012
Citations: 33	License type: cc-by
