PASTA: splice junction identification from RNA-Sequencing data

Shaojun Tang, Alberto Riva

doi:10.1186/1471-2105-14-116

Abstract

Background: Next-generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but it requires ad hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data. It relies on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions with high accuracy.

Results: Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it can handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions.

Conclusions: PASTA is a highly efficient and sensitive tool for identifying splice junctions from RNA-Seq data. Compared to similar programs, it identifies a higher number of real splice junctions and provides richly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitate the reconstruction of splicing isoforms and the analysis of their expression levels, tasks that will be performed by the remaining modules of the PASTA pipeline, which are still under development. Use of PASTA can therefore enable the large-scale investigation of transcription and alternative splicing.
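
The abstract mentions statistical methods and organism-specific regression models, and the Highlights specify a logistic regression model that scores each putative intron. The sketch below shows what such scoring can look like. It is not PASTA's model: the feature set (supporting read count, intron length, splice-site dinucleotide class), the `CandidateIntron` type, and every coefficient value are hypothetical placeholders. The sketch only illustrates one way a splice-site motif can enter the model as a feature rather than as a hard filter, so that junctions lacking canonical signals are down-weighted instead of discarded.

```python
import math
from dataclasses import dataclass

# Splice-site dinucleotide classes, written donor-acceptor on the
# transcribed strand (GT-AG is the canonical signal).
CANONICAL = {"GT-AG"}
SEMI_CANONICAL = {"GC-AG", "AT-AC"}

# HYPOTHETICAL coefficients for illustration only; a real model would be
# fit separately for each organism on a set of known junctions.
COEF = {
    "intercept": -3.0,
    "log_reads": 1.5,
    "log_length": -0.2,
    "canonical": 2.5,
    "semi_canonical": 1.0,
}


@dataclass
class CandidateIntron:
    supporting_reads: int  # reads spanning this donor/acceptor pair
    intron_length: int     # intron length in nucleotides
    donor_dinuc: str       # first two intronic bases, e.g. "GT"
    acceptor_dinuc: str    # last two intronic bases, e.g. "AG"


def features(c: CandidateIntron) -> dict:
    """Turn a candidate intron into the (hypothetical) predictor values."""
    motif = f"{c.donor_dinuc}-{c.acceptor_dinuc}".upper()
    return {
        "log_reads": math.log1p(c.supporting_reads),
        "log_length": math.log10(max(c.intron_length, 1)),
        # The motif is a feature, not a filter: a non-canonical intron
        # gets a lower score but is never discarded outright.
        "canonical": 1.0 if motif in CANONICAL else 0.0,
        "semi_canonical": 1.0 if motif in SEMI_CANONICAL else 0.0,
    }


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(c: CandidateIntron) -> float:
    """Estimated probability that the candidate intron is real."""
    x = features(c)
    z = COEF["intercept"] + sum(COEF[name] * value for name, value in x.items())
    return sigmoid(z)
```

With these placeholder values, a 1,500-nt GT-AG intron supported by 10 reads, `score(CandidateIntron(10, 1500, "GT", "AG"))`, comes out at about 0.92, while the same intron with a non-canonical GG-CC motif scores about 0.49: lower, but still available for reporting if it passes whatever threshold is chosen.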

Highlights

Next-generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but it requires ad hoc analysis methods and tools
A logistic regression model for splice junction prediction: because of the uncertainty involved in identifying the precise location of splice junctions from short RNA-Seq reads, PASTA employs a logistic regression model to assign a score to each putative intron produced by a pair of junctions
We generated four simulated datasets of 50-nt single-end RNA-Seq reads from mouse transcripts appearing in ENSEMBL gene annotations, corresponding to average depths of coverage ranging from 1 to 8 reads per nucleotide, and we introduced random sequencing errors at a frequency of 1 per 1,000 bp and single nucleotide polymorphisms (SNPs) at a frequency of 5 per 1,000 bp
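
The simulation setup described in the last Highlight can be sketched as follows. This is a hypothetical generator, not the one used in the paper: it assumes read start positions are uniform along each transcript, treats SNPs as substitutions applied once to the transcript so that every read carries them, and treats sequencing errors as independent per-base substitutions drawn separately for each read. The names `mutate` and `simulate_reads` are illustrative. The read count is chosen so that `n_reads * read_len / transcript_length` equals the target mean depth.

```python
import random

BASES = "ACGT"


def _substitute(base: str, rng: random.Random) -> str:
    """Return a base different from `base`."""
    return rng.choice([b for b in BASES if b != base])


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each position independently with probability `rate`."""
    return "".join(_substitute(b, rng) if rng.random() < rate else b for b in seq)


def simulate_reads(transcript: str, coverage: float, read_len: int = 50,
                   error_rate: float = 1 / 1000, snp_rate: float = 5 / 1000,
                   seed: int = 0) -> list[str]:
    """Draw single-end reads from one transcript (hypothetical model)."""
    rng = random.Random(seed)
    if len(transcript) < read_len:
        return []
    # Mean depth = n_reads * read_len / transcript_length.
    n_reads = round(coverage * len(transcript) / read_len)
    # SNPs alter the sample's template once, so every read sees them.
    template = mutate(transcript, snp_rate, rng)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(template) - read_len + 1)
        fragment = template[start:start + read_len]
        # Sequencing errors are drawn independently for each read.
        reads.append(mutate(fragment, error_rate, rng))
    return reads
```

For a 1,000-nt transcript at a depth of 2 reads per nucleotide, `simulate_reads` returns 40 reads of 50 nt. Because reads are drawn from spliced transcript sequence, any read that crosses an exon-exon boundary becomes a junction-spanning read once mapped back to the genome, and the known transcript structure supplies the true junction set against which predictions can be scored.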

Journal: BMC Bioinformatics
Publication Date: Apr 4, 2013
Citations: 29
License type: CC BY 2.0
