Abstract

We propose a method for predicting splice graphs that enhances curated gene models using evidence from RNA-Seq and EST alignments. Results obtained using RNA-Seq experiments in Arabidopsis thaliana show that predictions made by our SpliceGrapher method are more consistent with current gene models than predictions made by TAU and Cufflinks. Furthermore, analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of detection of alternative splicing. SpliceGrapher is available for download at http://SpliceGrapher.sf.net.

Highlights

  • Deep transcriptome sequencing (RNA-Seq) with next-generation sequencing (NGS) technologies is providing unprecedented opportunities for researchers to probe the transcriptomes of many species [1,2,3,4,5]

  • SpliceGrapher’s splice graph prediction pipeline consists of the following steps (Figure 2): ungapped alignment of short reads to the reference genome, spliced alignment of reads that did not align in the first step, initial splice graph construction from the annotated gene models, assembly of exons from the ungapped short-read alignments, and insertion of the new exons into the splice graph using spliced alignments

  • SpliceGrapher accepts as input expressed sequence tag (EST) alignments as well; these are interpreted as splice graphs that SpliceGrapher merges with its gene model baseline graphs
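The splice graph idea behind these steps can be illustrated with a minimal sketch. It is not SpliceGrapher's actual code or API: the class name `SpliceGraph`, its methods, and the coordinates are all hypothetical, chosen only to show how a baseline graph built from gene models might be merged with a graph derived from EST evidence. Exons are nodes and introns are edges.

```python
from collections import defaultdict


class SpliceGraph:
    """Toy splice graph (hypothetical, not SpliceGrapher's API).

    Nodes are exons stored as (start, end) tuples; a directed edge joins
    two exons that are adjacent in at least one transcript (an intron).
    """

    def __init__(self):
        self.exons = set()
        self.edges = defaultdict(set)  # upstream exon -> downstream exons

    def add_transcript(self, exons):
        """Add one transcript given as a list of (start, end) exon intervals."""
        ordered = sorted(exons)
        self.exons.update(ordered)
        for upstream, downstream in zip(ordered, ordered[1:]):
            self.edges[upstream].add(downstream)

    def merge(self, other):
        """Fold another graph's exons and introns into this one."""
        self.exons |= other.exons
        for exon, targets in other.edges.items():
            self.edges[exon] |= targets

    def junctions(self):
        """Return each intron as (end of upstream exon, start of downstream exon)."""
        return {(up[1], down[0])
                for up, targets in self.edges.items()
                for down in targets}


# Baseline graph from an annotated gene model (made-up coordinates).
baseline = SpliceGraph()
baseline.add_transcript([(100, 200), (300, 400), (500, 600)])

# Graph from an EST alignment that skips the middle exon.
est = SpliceGraph()
est.add_transcript([(100, 200), (500, 600)])

# Merging adds the exon-skipping junction to the baseline graph.
baseline.merge(est)
print(sorted(baseline.junctions()))
```

After the merge, the graph holds the three annotated exons and an extra edge for the exon-skipping event. Keeping exons as nodes means new evidence only adds nodes or edges and never rewrites the annotated model.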


Summary

Introduction

Deep transcriptome sequencing (RNA-Seq) with next-generation sequencing (NGS) technologies is providing unprecedented opportunities for researchers to probe the transcriptomes of many species [1,2,3,4,5]. Although it is inexpensive and easy to obtain whole-transcriptome data using RNA-Seq, one limitation has been the lack of versatile methods to analyze these data. There is an increasing demand for methods that can use the short reads produced in these studies to predict patterns of alternative splicing (AS). NGS base-call error rates tend to increase with read length, raising the chance of a mismatch when aligning a read to a reference sequence [8]. These ambiguities are exacerbated by the presence of paralogous genes that can give rise to

