Abstract

The RNA-seq paired-end read (PER) protocol samples transcript fragments that are longer than today's sequencing technology can read, by sequencing only the two ends of each fragment. Deep sampling of the transcriptome with the PER protocol makes it possible to reconstruct the unsequenced portion of each transcript fragment from the end reads of overlapping PERs, guided by the expected fragment length. We describe a probabilistic framework that predicts the genomic alignment of every transcript fragment in a PER dataset. Starting from the possible exonic and spliced alignments of all end reads, our method constructs candidate splicing paths connecting the paired ends. An expectation maximization (EM) algorithm assigns likelihood values to all splice junctions and selects the most probable alignment for each transcript fragment. The method was applied to 2 × 35 bp PER datasets from the cancer cell lines MCF-7 and SUM-102. Compared with aligning the end reads alone, PER fragment alignment increased coverage 3-fold and improved the accuracy of splice detection. The accuracy of the EM algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 of the 10 fusion events previously identified in the MCF-7 cell line by Maher et al. (2009). Software is available at http://www.netlab.uky.edu/p/bioinfo/MapSplice/PER.
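The EM step described above can be illustrated with a toy model. This is a generic mixture-style EM, not the paper's exact formulation: it assumes each fragment comes with a list of candidate splicing paths, each given as a tuple of a hypothetical path identifier, the splice junctions that path uses, and a precomputed fragment-length likelihood. Path weights are re-estimated from expected fragment counts, each junction's score is the expected number of fragments supporting it, and each fragment is assigned its highest-posterior candidate. The names `em_align` and `e_step` and the tuple layout are illustrative assumptions.

```python
from collections import defaultdict


def e_step(fragments, theta):
    """Posterior probability of each candidate path, per fragment."""
    posteriors = []
    for cands in fragments:
        weights = [theta[path] * length_lik for path, _, length_lik in cands]
        total = sum(weights)
        if total == 0.0:
            # No candidate has support: spread mass uniformly (a modeling choice).
            posteriors.append([1.0 / len(cands)] * len(cands))
        else:
            posteriors.append([w / total for w in weights])
    return posteriors


def em_align(fragments, n_iter=100, tol=1e-9):
    """Toy EM over candidate splicing paths (hypothetical model, not the paper's).

    fragments: one list per PER fragment (each with at least one candidate);
    each candidate is (path_id, junctions, length_lik):
      path_id    -- hypothetical identifier of a splicing path
      junctions  -- tuple of splice-junction ids used by that path
      length_lik -- likelihood of the fragment length implied by that path
    Returns (path weights, junction scores, best candidate index per fragment).
    """
    if not fragments:
        return {}, {}, []
    paths = {path for cands in fragments for path, _, _ in cands}
    theta = {p: 1.0 / len(paths) for p in paths}
    n = len(fragments)
    for _ in range(n_iter):
        posteriors = e_step(fragments, theta)
        # M-step: a path's weight is its expected share of all fragments.
        new_theta = defaultdict(float)
        for cands, post in zip(fragments, posteriors):
            for (path, _, _), r in zip(cands, post):
                new_theta[path] += r / n
        delta = max(abs(new_theta[p] - theta[p]) for p in paths)
        theta = {p: new_theta[p] for p in paths}
        if delta < tol:
            break
    # Final posteriors under the converged weights.
    posteriors = e_step(fragments, theta)
    junction_score = defaultdict(float)
    for cands, post in zip(fragments, posteriors):
        for (_, junctions, _), r in zip(cands, post):
            for j in junctions:
                junction_score[j] += r  # expected number of supporting fragments
    best = [max(range(len(post)), key=post.__getitem__) for post in posteriors]
    return theta, dict(junction_score), best


if __name__ == "__main__":
    toy = [
        [("inclusion", ("j1", "j2"), 0.9), ("skipping", ("j3",), 0.2)],
        [("inclusion", ("j1", "j2"), 0.8)],
        [("inclusion", ("j1", "j2"), 0.5), ("skipping", ("j3",), 0.5)],
    ]
    print(em_align(toy))
```

With these three toy fragments, the inclusion path, which every fragment can use, absorbs almost all of the weight, so each fragment is assigned its inclusion candidate (index 0). Working at the level of whole paths keeps the E-step a simple normalization; the paper's model may distribute likelihood over junctions differently.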

Highlights

  • High-throughput sequencing technologies are providing unprecedented visibility into the mRNA transcriptome of a cell

  • We propose a probabilistic framework to predict the alignment of each transcript fragment to a reference genome

  • The alignment chosen is determined by maximizing the likelihood of all paired-end read (PER) alignments through an expectation maximization method


Summary

Introduction

High-throughput sequencing technologies are providing unprecedented visibility into the mRNA transcriptome of a cell. Alternative splicing and gene fusion events (Berger et al., 2010; Maher et al., 2009) are common changes observed in the mRNA transcriptome. Several computational methods (Au et al., 2010; Trapnell et al., 2009) have been developed to identify splicing events from RNA-seq data. New protocols and sequencing methods have increased the length and expanded the types of RNA-seq reads, enabling more accurate characterization of the splices present in the transcriptome. The paired-end read (PER) protocol sequences both ends of a size-selected fragment of an mRNA transcript and reports the results as a pair. For example, mRNA fragments are expected to be around 182 bp (±40 bp) long, and both ends of each fragment are sequenced to at least 35 bp.
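The expected fragment length is what allows a candidate splicing path between two end reads to be scored. With 35 bp ends, a 182 bp fragment leaves 182 - 2 × 35 = 112 bp unsequenced. A minimal sketch, assuming the ±40 bp figure can be treated as the standard deviation of a normal length distribution (the text does not specify the distribution) and that genomic coordinates are 1-based and inclusive: it computes the transcript-space length implied by a candidate path, which is the genomic span minus any spliced-out introns, and scores it. All function names are hypothetical.

```python
import math


def implied_fragment_length(left_start, right_end, introns):
    """Transcript-space length of a fragment whose first end read starts at
    left_start and whose second end read ends at right_end, after removing
    the introns (inclusive (start, end) pairs) spliced out by a candidate path."""
    span = right_end - left_start + 1
    return span - sum(end - start + 1 for start, end in introns)


def unsequenced_gap(fragment_length, read_length=35):
    """Bases between the two end reads that were not sequenced."""
    return fragment_length - 2 * read_length


def length_likelihood(length, mean=182.0, sd=40.0):
    """Normal density of the implied length (assumed length model)."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Hypothetical example: reads at 1000-1034 and 5147-5181, one 4,000 bp intron.
    spliced = implied_fragment_length(1000, 5181, [(1100, 5099)])
    unspliced = implied_fragment_length(1000, 5181, [])
    print(spliced, unspliced, unsequenced_gap(spliced))
    print(length_likelihood(spliced), length_likelihood(unspliced))
```

Here the spliced path implies a 182 bp fragment, right at the expected size, while the unspliced path implies 4,182 bp and receives a likelihood that underflows to zero. A score of this kind can be used to rank competing splicing paths for the same pair of end reads.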

