Abstract

Background: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. As read lengths have steadily increased, it has been assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR.

Results: We found that, with the exception of 25 bp reads, read length makes little difference to the detection of differential expression. Once single-end reads reach a length of 50 bp, the results do not change substantially for any configuration up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases, with 100 bp paired-end reads showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results, confirming that our conclusions have broad application.

Conclusions: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, the read length should be chosen based on the final goal of the study.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0697-y) contains supplementary material, which is available to authorized users.
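The trimming step described in the abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function names (`trim_record`, `simulate_library`) and the in-memory record layout are hypothetical, and the choice to keep the first bases of each read (trimming from the 3' end) is an assumption, since the text above does not say which end was trimmed.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A FASTQ record as (header, sequence, separator, quality).
# This tuple layout is an assumption made for illustration only.
FastqRecord = Tuple[str, str, str, str]


def trim_record(record: FastqRecord, length: int) -> FastqRecord:
    """Keep only the first `length` bases and their quality scores.

    Assumption: reads are shortened from the 3' end.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    header, seq, sep, qual = record
    return (header, seq[:length], sep, qual[:length])


def simulate_library(
    mate1: Iterable[FastqRecord],
    mate2: Iterable[FastqRecord],
    length: int,
    paired: bool,
) -> Iterator[Tuple[FastqRecord, Optional[FastqRecord]]]:
    """Simulate a shorter library from longer paired-end reads.

    When `paired` is False, the second mate is dropped so that the
    output behaves like a single-end run of the same read length.
    """
    for r1, r2 in zip(mate1, mate2):
        t1 = trim_record(r1, length)
        t2 = trim_record(r2, length) if paired else None
        yield (t1, t2)
```

For example, calling `simulate_library(mate1, mate2, 50, paired=False)` on 101 bp paired-end records would yield 50 bp reads with the second mate discarded, which corresponds to the 50 bp single-end condition discussed above.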

Highlights

  • The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single end of the library sequence

  • While the determination of the proper read length for an experiment is important across all sequencing experiments, including genome re-sequencing, de novo sequencing, RNA-seq, and ChIP-seq, we have only focused on the use of RNA-seq for differentially expressed genes (DEGs) and isoform detection

  • We have used data from the SEQC Sequencing study to investigate the effects of read-length on RNA-seq results and validated the results using data from the ENCODE consortium


Summary

Introduction

The initial reads on Illumina and other next-generation platforms were extremely short, often only 25 or 36 bp [1]. While these reads were sufficient for some assays, a substantial percentage of them could not be mapped uniquely and were often discarded because their correct location within the genome could not be determined [2]. Paired-end 100 bp reads are now standard for many experiments, and paired-end 300 bp reads are also possible.

