Transcriptome profiling of mouse samples using nanopore sequencing of cDNA and RNA molecules

Camille Sessegolo, Corinne Cruaud, Thomas Derrien, Audric Cologne, Jean-Marc Aury, Corinne Da Silva, Marion Dubarry, Vincent Lacroix

doi:10.1038/s41598-019-51470-9

Abstract

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally, gene expression levels are well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules have to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T’s, but is already detectable for runs of at least 9 T’s and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.

Highlights

To date, our knowledge of DNA transcription comes from the sequencing of RNA molecules that were first reverse transcribed (RT).
RNAs were sampled from brain and liver tissues of mice and were mixed with Lexogen’s Spike-In RNA Variants (SIRVs) as a control for the quantification of RNAs.
We computed the relative coverage for each transcript upstream and downstream of internal runs of poly(A) or poly(T), and we find that, using cDNA-Seq, ...
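
The coverage comparison around internal homopolymer runs mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`find_internal_runs`, `relative_coverage`), the 50-base window, and the definition of relative coverage as the ratio of mean downstream coverage to mean upstream coverage are all assumptions made for illustration. Only the run-length thresholds (at least 9 or at least 15 bases) come from the text.

```python
import re
from statistics import mean


def find_internal_runs(seq, base="T", min_len=9):
    """Return (start, end) spans of runs of `base` with length >= min_len.

    Runs touching either end of the sequence are skipped, so a terminal
    poly(A) tail is not reported as an internal run (assumed convention).
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    runs = []
    for match in pattern.finditer(seq.upper()):
        if match.start() > 0 and match.end() < len(seq):
            runs.append((match.start(), match.end()))
    return runs


def relative_coverage(coverage, run, window=50):
    """Ratio of mean coverage downstream of a run to mean coverage upstream.

    `coverage` is a per-base read depth list for one transcript.
    Returns None when a flank is empty or the upstream mean is zero.
    The window size and the ratio itself are hypothetical choices.
    """
    start, end = run
    upstream = coverage[max(0, start - window):start]
    downstream = coverage[end:end + window]
    if not upstream or not downstream:
        return None
    upstream_mean = mean(upstream)
    if upstream_mean == 0:
        return None
    return mean(downstream) / upstream_mean


# Toy transcript with one internal run of 12 T's and a drop in depth after it.
toy_seq = "ACGA" * 5 + "T" * 12 + "GCAG" * 5
toy_cov = [10] * 32 + [5] * 20
for run in find_internal_runs(toy_seq, base="T", min_len=9):
    print(run, relative_coverage(toy_cov, run, window=10))
```

Under this assumed definition, a ratio well below or above 1 on one side of a run would be consistent with reads being truncated at that run. Which side loses coverage depends on transcript orientation and on how the reads were aligned, which the sketch does not model.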

Summary

Introduction

To date, our knowledge of DNA transcription comes from the sequencing of RNA molecules that were first reverse transcribed (RT). This RT step is prone to skew the transcriptional landscape of a given cell and to erase base modifications. Oxford Nanopore Technologies (ONT) commercially released a portable sequencer that can sequence very long DNA fragments[3] and enables the sequencing of complex genomes[4,5,6]. This device, the MinION, can also sequence native RNA molecules[7], representing the first opportunity to generate genuine RNA-Seq data. Data produced with protocols based on oligo-dT or random primers in the RT step show differences in how they cover transcripts[8].

Methods

Results

Conclusion