Abstract

RNA sequencing using next-generation sequencing (NGS) technologies is currently the standard approach for gene expression profiling, particularly for large-scale, high-throughput studies. NGS technologies comprise high-throughput, cost-efficient short-read RNA-Seq, while emerging single-molecule, long-read RNA-Seq technologies have enabled new approaches to studying the transcriptome and its function. These long-read technologies are currently commercially available from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), while new methodologies based on short-read sequencing, such as the 10x Genomics linked-read methodology, are also being developed to provide long-range, single-molecule-level information. The shift toward long-read sequencing for transcriptome characterization is driven by recent increases in throughput and decreases in cost, which make these technologies attractive for de novo transcriptome assembly, isoform expression quantification, and in-depth analysis of RNA species. Such analyses were challenging with standard short-read sequencing approaches because of the complex nature of the transcriptome, which consists of transcripts of variable length and multiple alternatively spliced isoforms for most genes, and because of the high sequence similarity of highly abundant RNA species, such as rRNAs. Here we focus on single-molecule sequencing technologies and single-cell technologies that, combined with perturbation tools, allow the analysis of complete RNA species, whether short or long, at high resolution. In parallel, these tools have opened new ways of understanding gene function at the tissue, network, and pathway levels, as well as enabling detailed functional characterization of genes.
Analysis of the epi-transcriptome, including RNA methylation and modification and the effects of such modifications on biological systems, is now enabled through direct RNA sequencing instead of classical indirect approaches. However, many difficulties and challenges remain. These include the need for methodologies to generate full-length RNA or cDNA libraries from all RNA species rather than only poly-A-containing transcripts, and the difficulty of identifying allele-specific transcripts given the current error rates of single-molecule technologies. Bioinformatics analysis of long-read data for accurate identification of 5′ and 3′ UTRs is also still in development.

Highlights

  • RNA sequencing (RNA-Seq) using short-read sequencing technologies currently offered by Illumina or Thermo Fisher (Ion Torrent) represents the standard and widely used method for transcriptome profiling (Goodwin et al, 2016)

  • The long read lengths achieved with this technology, coupled with the Iso-Seq RNA sequencing protocol discussed below and the downstream data analysis pipelines developed by Pacific Biosciences (PacBio), provide a powerful approach to RNA analysis

  • For a PacBio read of a given length, the sequence of a short cDNA isoform is expected to be present in the circular-consensus sequence (CCS) read many more times than the sequence of a long cDNA isoform would be in a read produced from that long isoform
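The arithmetic behind the last highlight can be sketched roughly as follows. This is a minimal illustration, not PacBio's actual CCS algorithm: the function name `estimated_passes`, the 30,000-base read length, the isoform lengths, and the 45-base adapter length are all assumptions chosen for the example. It treats each pass as one traversal of the insert plus one adapter.

```python
def estimated_passes(read_length: int, insert_length: int, adapter_length: int = 45) -> int:
    """Roughly estimate how many full passes over a cDNA insert fit in one read.

    Hypothetical helper for illustration only. Each pass is assumed to cover
    the insert plus one adapter; adapter_length=45 is an assumed value.
    """
    if read_length < 0 or insert_length <= 0 or adapter_length < 0:
        raise ValueError("lengths must be non-negative and the insert must be non-empty")
    # Integer division counts only complete passes.
    return read_length // (insert_length + adapter_length)


# Assumed example: the same 30,000-base read over a short and a long isoform.
short_isoform_passes = estimated_passes(30_000, 1_000)
long_isoform_passes = estimated_passes(30_000, 6_000)
print(short_isoform_passes, long_isoform_passes)
```

Under these assumptions the short isoform is covered several times more often than the long one, which is the effect the highlight describes.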

Summary

INTRODUCTION

RNA sequencing (RNA-Seq) using short-read sequencing technologies currently offered by Illumina or Thermo Fisher (Ion Torrent) is the standard, widely used method for transcriptome profiling (Goodwin et al, 2016). Another sequencing technology, DNBSEQ from MGI, which is based on the formation of DNA nanoballs (Huang et al, 2017), has been used for RNA-Seq studies and has shown performance comparable to that of the Illumina platform in terms of gene expression quantification and technical variability (Jeon et al, 2019; Natarajan et al, 2019). Pioneered by the PacBio “Iso-Seq” method, this approach mainly involves characterizing the different isoform models by sequencing groups of cDNA reads after fractionating them by length (Au et al, 2013). We will present the library preparation methods that exploit these properties to enrich for the different categories

Properties of Long RNA Molecules and RNA Fragments
Strategies for cDNA Synthesis Using RNA Molecules and RNA Fragments
PacBio Platform Sequencing Loading Overview
Analysis of the PacBio Sequenced Reads
PacBio Sequel Performance
Nanopore Sequencing Platform Overview
Nanopore Platform Library Preparation Overview
Nanopore Data Analysis
DIRECT RNA SEQUENCING METHODOLOGY
Direct RNA Sequencing Library Preparation
Findings
FUTURE DIRECTIONS