Abstract

BackgroundDeep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' e.g. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining.ResultsWe used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only half (54%) the number of transcripts identified by SAGE (3,617 versus 1,955). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases).ConclusionWe show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.

Highlights

  • Deep transcriptome analysis will underpin a large fraction of post-genomic biology

  • The numbers for DpnII sites are 7,112,355 and 253,936, so this site is rarer, suggesting that the ability of massively parallel signature sequencing (MPSS) to tag more transcripts is in this way compromised

  • In RefSeq, excluding predicted transcripts from the genome, the proportion of cDNAs lacking the LongSAGE recognition site is less than 0.6% (144/24,261) whereas the proportion lacking the MPSS site is substantially higher, at ~2.3% (552)

Read more

Summary

Introduction

Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' e.g. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. New technologies are emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining. 'Closed' architecture systems, such as microarrays, are less suited to this application because they are limited by the extent to which global transcriptome coverage has been achieved Even in organisms such as Homo sapiens where a complete genome sequence is available, there remains uncertainty regarding the actual number of transcribed regions. Be some time before all human genes can be confidently sampled in a conventional laboratory setting using such methodologies

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call