High throughput error corrected Nanopore single cell transcriptome sequencing

Kevin Lebrigand,Virginie Magnone,Rainer Waldmann,Pascal Barbry

doi:10.1038/s41467-020-17800-6

Abstract

Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell heterogeneity. However, after short-read sequencing, these approaches only allow analysis of one extremity of the transcript. As a consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI-guided error correction allows the generation of high-accuracy full-length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows splicing and SNVs (RNA editing) to be defined at the single cell level.

Highlights

Droplet-based high throughput single cell sequencing techniques tremendously advanced our insight into cell-to-cell heterogeneity
Amplification bias and chimeric cDNA generated during PCR amplification are issues that can both be addressed by unique molecular identifiers (UMIs), short random sequence tags that are introduced during reverse transcription
We addressed those issues and designed a long-read single-cell sequencing approach that combines the high throughput of Nanopore sequencing with high accuracy cell barcode (cellBC) and UMI assignment

Summary

Results and discussion

Assignment of cell barcodes and unique molecular identifiers to Nanopore reads. We prepared a 190-cell and a 951-cell E18 mouse brain library with the 10x Genomics Chromium system and generated 43 × 10⁶ and 70 × 10⁶ Illumina reads (Supplementary Fig. 1) as well as 32 × 10⁶ and 322 × 10⁶ Nanopore reads for the 190-cell and 951-cell replicates, respectively. We compared the cellBC sequence extracted from each genome-aligned Nanopore read with the cell barcodes found in the Illumina data for the same gene or genomic region. Following this strategy, we assigned cellBCs to 68 ± 4% of Nanopore reads with an identified poly(A) and adapter sequence (Fig. 1b; Supplementary Fig. 3a, c; see Methods for details). After assigning the cellBC to a Nanopore read, we compared the UMI sequence read in the Nanopore read with the UMI sequences found for the same gene (or genomic region) and the same cell in the Illumina sequencing data (see Methods for details).

We anticipate that ScNaUmi-seq will be useful in many biological and medical applications, from cell biology and development to clinical analyses of tumor heterogeneity.
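The two-step cellBC and UMI matching described in the Results paragraph can be illustrated with a minimal Python sketch. It assumes that the putative cellBC and UMI have already been extracted from each Nanopore read (next to the adapter and poly(A) sequence) and that the Illumina data have been summarized as a mapping from gene to cellBC to the set of UMIs seen for that gene in that cell. The function names (`edit_distance`, `assign`, `assign_read`), the data layout, and the distance thresholds (4 for the cellBC, 2 for the UMI) are hypothetical choices for illustration, not the parameters of the published pipeline. A candidate is accepted only if it is the unique closest match within the threshold.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; counts substitutions, insertions and deletions,
    which matters because Nanopore errors are often indels."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def assign(observed: str, candidates: Iterable[str], max_dist: int) -> Optional[str]:
    """Return the single closest candidate within max_dist, or None if no
    candidate is close enough or the best distance is shared (ambiguous)."""
    best: Optional[str] = None
    best_d: Optional[int] = None
    second_d: Optional[int] = None
    for cand in set(candidates):
        d = edit_distance(observed, cand)
        if best_d is None or d < best_d:
            second_d, best, best_d = best_d, cand, d
        elif second_d is None or d < second_d:
            second_d = d
    if best_d is None or best_d > max_dist:
        return None
    if second_d == best_d:
        return None  # two candidates equally close: do not assign
    return best


# Hypothetical layout: gene -> cellBC -> UMIs seen in the Illumina reads.
IlluminaIndex = Dict[str, Dict[str, Set[str]]]


def assign_read(read_bc: str, read_umi: str, gene: str, illumina: IlluminaIndex,
                bc_max: int = 4, umi_max: int = 2
                ) -> Tuple[Optional[str], Optional[str]]:
    """Assign the cellBC first (candidates: cells seen for this gene), then
    the UMI (candidates: UMIs seen for this gene in that cell).
    The default thresholds are illustrative assumptions."""
    cells = illumina.get(gene, {})
    cell = assign(read_bc, cells, bc_max)  # iterating a dict yields its keys
    if cell is None:
        return None, None
    return cell, assign(read_umi, cells[cell], umi_max)


illumina: IlluminaIndex = {
    "Gapdh": {
        "AAACCTGAGAAGGCCT": {"ACGTACGTAC", "TTTTGGGGCC"},
        "TTTGTCATCTTGTACT": {"GGGGCCCCAA"},
    }
}
# Toy read with one substitution in the cellBC and one in the UMI.
print(assign_read("AAACCTGAGAAGGCAT", "ACGTACGTTC", "Gapdh", illumina))
# → ('AAACCTGAGAAGGCCT', 'ACGTACGTAC')
```

Restricting the candidates to barcodes seen for the same gene (and, for UMIs, the same cell) keeps each comparison set small, which makes a full edit-distance comparison practical and lets the matching tolerate the insertions and deletions common in Nanopore reads.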

Methods

Search for cellBCs

Search for UMIs

Maximal possible barcode and UMI assignment efficiency with 10x Genomics data

Compatibility of the software

Code availability