Abstract

Next-generation sequencing has made it possible to perform differential gene expression studies in non-model organisms. For these studies, the need for a reference genome is circumvented by performing de novo assembly on the RNA-seq data. However, transcriptome assembly produces a multitude of contigs, which must be clustered into genes prior to differential gene expression detection. Here we present Corset, a method that hierarchically clusters contigs using shared reads and expression, then summarizes read counts to clusters, ready for statistical testing. Using a range of metrics, we demonstrate that Corset outperforms alternative methods. Corset is available from https://code.google.com/p/corset-project/.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0410-6) contains supplementary material, which is available to authorized users.

Highlights

  • Next-generation sequencing of RNA, RNA-seq, is a powerful technology for studying various aspects of the transcriptome; it has a broad range of applications, including gene discovery, detection of alternative splicing events, differential expression analysis, fusion detection and identification of variants such as single-nucleotide polymorphism (SNP) and posttranscriptional editing [1,2]

  • Corset clusters contigs and counts reads. The first step in performing a gene-level differential expression analysis for a non-model organism is to assemble the contigs, which can be performed using a variety of software

  • The next step is to group, or cluster, the contigs into genes to facilitate downstream differential expression analysis
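The clustering and counting steps described above can be illustrated with a minimal sketch. This is not Corset's implementation: the function name `cluster_contigs`, the distance formula (one minus the fraction of shared reads relative to the smaller cluster), and the cutoff of 0.3 are assumptions chosen for illustration, and the expression-based separation mentioned in the abstract is omitted entirely.

```python
from itertools import combinations


def cluster_contigs(read_to_contigs, threshold=0.3):
    """Group contigs that share reads, then count reads per cluster.

    read_to_contigs maps a read ID to the contigs it aligns to.
    The distance measure and threshold are illustrative assumptions.
    """
    # Collect the set of reads that map to each contig.
    contig_reads = {}
    for read, contigs in read_to_contigs.items():
        for contig in contigs:
            contig_reads.setdefault(contig, set()).add(read)

    # Every contig starts as its own cluster (cluster -> set of reads).
    clusters = {frozenset([c]): reads for c, reads in contig_reads.items()}

    def distance(a, b):
        # 0 when the smaller cluster's reads are all shared, 1 when none are.
        shared = len(clusters[a] & clusters[b])
        return 1.0 - shared / min(len(clusters[a]), len(clusters[b]))

    # Agglomerative loop: repeatedly merge the closest pair of clusters.
    while len(clusters) > 1:
        ordered = sorted(clusters, key=sorted)  # deterministic pair order
        a, b = min(combinations(ordered, 2), key=lambda pair: distance(*pair))
        if distance(a, b) >= threshold:
            break
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)

    # Each read is counted once per cluster, even if it hits several contigs.
    return {tuple(sorted(names)): len(reads) for names, reads in clusters.items()}


# Hypothetical toy input: reads r1 and r2 map to both c1 and c2.
alignments = {
    "r1": ["c1", "c2"],
    "r2": ["c1", "c2"],
    "r3": ["c1"],
    "r4": ["c3"],
    "r5": ["c3"],
}
counts = cluster_contigs(alignments)
```

In this toy input, c1 and c2 share reads and are merged, while c3 shares none and stays separate. Counting reads per cluster rather than per contig avoids counting a multi-mapped read more than once within a gene-level cluster.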



Introduction

Next-generation sequencing of RNA, RNA-seq, is a powerful technology for studying various aspects of the transcriptome; it has a broad range of applications, including gene discovery, detection of alternative splicing events, differential expression analysis, fusion detection and identification of variants such as SNPs and posttranscriptional editing [1,2]. When no reference genome is available, the transcriptome is assembled de novo directly from RNA-seq reads [3]. Several programs exist for de novo transcriptome assembly: Oases [4] and Trans-abyss [5], which extend the Velvet [6] and Abyss [7] genomic assemblers, respectively, as well as purpose-built transcriptome assemblers such as Trinity [8]. These programs are capable of assembling millions of short reads into transcript sequences, called contigs. Performing a differential expression analysis on a de novo assembled transcriptome is challenging because multiple

