How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes?

Brian J Haas, Bruce W Birren, Jonathan Livny, Melissa Chin, Chad Nusbaum

doi:10.1186/1471-2164-13-734

Open Access

Abstract

Background: High-throughput sequencing of cDNA libraries (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes. A central challenge in designing RNA-Seq-based experiments is estimating a priori the number of reads per sample needed to detect and quantify thousands of individual transcripts with a large dynamic range of abundance.

Results: We have conducted a systematic examination of how changes in the number of RNA-Seq reads per sample influence both the profiling of a single bacterial transcriptome and the comparison of gene expression among samples. Our findings suggest that the number of reads typically produced in a single lane of the Illumina HiSeq sequencer far exceeds the number needed to saturate the annotated transcriptomes of diverse bacteria growing in monoculture. Moreover, as sequencing depth increases, so too does the detection of cDNAs that likely correspond to spurious transcripts or genomic DNA contamination. Finally, even when dozens of barcoded individual cDNA libraries are sequenced in a single lane, the vast majority of transcripts in each sample can be detected and numerous genes differentially expressed between samples can be identified.

Conclusions: Our analysis provides a guide for the many researchers seeking to determine the appropriate sequencing depth for RNA-Seq-based studies of diverse bacterial species.

Highlights

High-throughput sequencing of complementary DNA (cDNA) libraries synthesized from RNA (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes.
Depletion of abundant transcripts is often achieved by targeted removal of ribosomal RNA (rRNA), which makes up 80-95% of bacterial transcriptomes, from total RNA prior to cDNA library construction [14,15].
Ultra-deep sequencing of the E. coli transcriptome: previous studies have suggested that accurate quantification of > 95% of transcripts in a mammalian cell line requires ~700 million reads [17]; no estimate of the number of reads needed to approach saturation of a bacterial transcriptome has been reported.

Summary

Introduction

High-throughput sequencing of cDNA libraries (RNA-Seq) has emerged as a powerful technology for profiling gene expression, discovering previously unannotated genes, and mapping transcriptome architecture in a wide variety of bacterial species [1,2,3,4,5,6,7,8,9,10,11]. To generate comprehensive transcriptome profiles using RNA-Seq, one must obtain enough reads to detect the biologically relevant transcripts that make up a relatively small proportion of the cDNA library. The proportion of reads representing rare transcripts can be increased by depleting abundant transcripts from total RNA and/or depleting cDNAs representing these abundant transcripts from cDNA libraries. This is often achieved by targeted removal of ribosomal RNA (rRNA), which comprises 80-95% of bacterial transcriptomes, from total RNA prior to cDNA library construction [14,15].
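The effect of rRNA depletion on usable sequencing depth can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function names, the lane yield of 100 million reads, the 20-sample multiplex, and the 10% residual rRNA fraction after depletion are all hypothetical values chosen for illustration. Only the 80-95% rRNA range comes from the text above.

```python
def reads_per_sample(lane_reads: int, n_samples: int) -> int:
    """Evenly split a lane's reads across barcoded samples (idealized)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return lane_reads // n_samples


def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads expected to derive from non-rRNA transcripts."""
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical inputs, not values reported in the paper.
LANE_READS = 100_000_000      # assumed yield of one sequencing lane
N_SAMPLES = 20                # assumed number of barcoded libraries
RRNA_UNDEPLETED = 0.90        # within the 80-95% range cited in the text
RRNA_DEPLETED = 0.10          # assumed residual rRNA after depletion

per_sample = reads_per_sample(LANE_READS, N_SAMPLES)
print("reads per sample:", per_sample)
print("non-rRNA reads, no depletion:", informative_reads(per_sample, RRNA_UNDEPLETED))
print("non-rRNA reads, with depletion:", informative_reads(per_sample, RRNA_DEPLETED))
```

Under these assumed inputs, each sample receives 5,000,000 reads, of which about 500,000 would be non-rRNA without depletion and about 4,500,000 with depletion, a ninefold gain in reads available for detecting rare transcripts. Real libraries will not split a lane perfectly evenly, so these figures are upper-level estimates rather than predictions.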

Methods

Results

Conclusion