Abstract

The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it can in principle detect virtually any RNA modification present in the molecule being sequenced, and it can also provide polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered general users' access to it. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline starts with a pre-processing module, which converts raw current intensities into multiple types of processed data, including FASTQ and BAM, by performing base-calling, demultiplexing, quality filtering and mapping, and which also reports metrics on the quality of the run. In a second step, the pipeline performs downstream analyses of the mapped reads, including prediction of RNA modifications and estimation of polyA tail lengths. Four direct RNA MinION sequencing runs can be fully processed and analyzed in 10 h on 100 CPUs. The pipeline can also be executed on GPUs, locally or in the cloud, decreasing the run time fourfold. The software is written using the NextFlow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without installing any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores).
This workflow simplifies direct RNA sequencing data analyses, facilitating the study of the (epi)transcriptome at single molecule resolution.

Highlights

  • Next-generation sequencing (NGS) technologies have revolutionized our understanding of the cell and its biology

  • We provide a scalable and parallelizable workflow for the analysis of direct RNA sequencing datasets, termed MasterOfPores. It takes as input raw direct RNA sequencing reads in FAST5, a flexible HDF5-based format used by Oxford Nanopore Technologies (ONT) to store raw sequencing data, including current intensity values, metadata of the sequencing run and base-called fasta sequences, among other features

  • We chose the workflow framework NextFlow (Di Tommaso et al, 2017) because of its native support for different batch schedulers (SGE, LSF, SLURM, PBS, and HTCondor), cloud platforms (Kubernetes, Amazon Web Services (AWS), and Google Cloud) and graphics processing unit (GPU) computing, all of which are crucial for processing the huge volumes of data produced by nanopore sequencers
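The scheduler and container support mentioned above is usually selected through NextFlow configuration profiles. The fragment below is a minimal sketch of that idea, not the configuration shipped with MasterOfPores: the profile names `cluster` and `local` are hypothetical, and only the standard NextFlow settings `process.executor`, `docker.enabled` and `singularity.enabled` are used.

```groovy
// Hypothetical nextflow.config fragment; the profile names are illustrative
// and are not taken from the MasterOfPores repository.
profiles {
    // Submit each task to a SLURM cluster, with tools run from Singularity images.
    cluster {
        process.executor = 'slurm'
        singularity.enabled = true
    }
    // Run all tasks on the local machine, with tools run from Docker images.
    local {
        process.executor = 'local'
        docker.enabled = true
    }
}
```

Under these assumptions, the same workflow script could be moved between environments by changing only the `-profile` option passed to `nextflow run`, which is what makes the choice of framework relevant for portability.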


Summary

INTRODUCTION

Next-generation sequencing (NGS) technologies have revolutionized our understanding of the cell and its biology. In the past few years, ONT technology has transformed the fields of genomics and (epi)transcriptomics, showing a wide range of applications in genome assembly (Jain et al, 2018), the study of structural variations within genomes (Cretu Stancu et al, 2017), 3′ poly(A) tail length estimation (Krause et al, 2019; Workman et al, 2019), accurate transcriptome profiling (Bolisetty et al, 2015; Sessegolo et al, 2019), identification of novel isoforms (Byrne et al, 2017; Križanovic et al, 2018) and direct identification of DNA and RNA modifications (Carlsen et al, 2014; Simpson et al, 2017; Garalde et al, 2018; Leger et al, 2019; Liu et al, 2019; Parker et al, 2020). This technology overcomes many of the limitations of short-read sequencing and, importantly, it can directly measure RNA and DNA modifications in their native molecules. We expect that our workflow will greatly facilitate the community's access to Nanopore direct RNA sequencing.

