Abstract

RNA-Seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-Seq data. DEXSeq employs generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects genes, and in many cases specific exons, that are subject to differential exon usage with high sensitivity. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.

This preprint has subsequently been published in Genome Research (doi:10.1101/gr.133744.111).

Highlights

  • In higher eukaryotes, a single gene can give rise to a multitude of different transcripts by varying the usage of splice sites, transcription start sites and polyadenylation sites

  • In the Discussion, we elaborate on the observation that most published methods are unable to account for biological variation, focusing on the analysis provided by Brooks et al (2010) for their data (which is based on the method of Wang et al (2008)), and illustrate how this leads to unreliable results

Summary

Introduction

A single gene can give rise to a multitude of different transcripts (isoforms) by varying the usage of splice sites, transcription start sites and polyadenylation sites. High-throughput sequencing of mRNA (RNA-Seq) promises to become an important technique for the study of alternative isoform regulation, especially in comparisons between different tissues or cell types, or between cells in different environmental conditions or with different genetic backgrounds.

Shotgun sequencing

The median length of human transcripts is 2186 nucleotides (nt), with the longest transcripts having sizes of up to 101206 nt (these numbers are based on UCSC hg). An ideal RNA-Seq technology would produce sequence reads that directly correspond to full-length transcripts. Current implementations of RNA-Seq, however, employ shorter reads and use a shotgun sequencing approach. Illumina’s HiSeq 2000 produces reads of length 101 nt, which are typically paired so that they cover the two ends of shotgun fragments of lengths between 200 and 500 nt.
