Abstract

RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, while ongoing advances are rapidly increasing read lengths, it remains unclear to what degree differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for the lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, although the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer reads, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

Highlights

  • The application of next-generation sequencing (NGS) to RNA has provided more complete means to annotate and quantify transcriptomes

  • While it would be ideal to sequence the full length of each mRNA molecule, current NGS technologies are limited to analyzing short fragments of cDNA, and only a limited number of bases can be read from each fragment with reasonable accuracy

  • Read alignment: based on the observation that many read pairs in the GM12878 dataset with 262 bp reads (L262) come from cDNA fragments shorter than the read length (262 bp) (Figure S1), we identified and merged such pairs by searching for a 13 bp adapter sequence, which immediately follows the fragment sequence when the fragment is shorter than the read length
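The pair-merging step described in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the adapter sequence is passed in as a parameter because the source does not name it, the search requires an exact adapter match, and read 1's bases are kept without any quality-aware consensus. The idea is that when a fragment is shorter than the read, read 1 runs into the adapter right after the fragment, so the adapter's position gives the fragment length, and the first bases of read 2 should be the reverse complement of that same fragment.

```python
from typing import Optional

# Complement table for an ACGTN alphabet (assumption: reads are uppercase).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_short_fragment_pair(read1: str, read2: str, adapter: str,
                              max_mismatches: int = 0) -> Optional[str]:
    """Hypothetical helper: merge a pair whose fragment is shorter than the read.

    Returns the fragment sequence, or None if no adapter is found in read 1
    (fragment at least as long as the read, or adapter not matched exactly),
    if the adapter starts at position 0, or if the mates disagree.
    """
    pos = read1.find(adapter)  # exact match only (simplifying assumption)
    if pos <= 0:
        return None
    fragment = read1[:pos]
    # Read 2 starts at the other end of the same fragment, on the opposite strand.
    mate = reverse_complement(read2[:pos])
    if len(mate) != len(fragment):
        return None  # read 2 too short to cover the fragment
    mismatches = sum(a != b for a, b in zip(fragment, mate))
    if mismatches > max_mismatches:
        return None
    # Keep read 1's bases; a real tool would build a quality-aware consensus.
    return fragment
```

A real implementation would also need to tolerate sequencing errors within the adapter and detect adapters only partially present at the 3' end of a read; both are omitted here for brevity.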

Summary

Introduction

The application of next-generation sequencing (NGS) to RNA has provided a more complete means to annotate and quantify transcriptomes. It has improved the characterization of many aspects of RNA biology, including the detection of transcription start sites [1,2,3], allele-specific expression [4], alternative splicing events [5], fusion transcripts [6], RNA editing [7], and antisense transcription [8]. We introduce a different viewpoint by focusing on the quantification of genomic elements and the detection of allele-specific patterns. To this end, we generated paired-end RNA-seq datasets for the lymphoblastoid cell line GM12878 with 262 bp reads (L262) and 75 bp reads (L75) and carried out a comparative analysis.
