Abstract

DNA and RNA modifications can now be identified using nanopore sequencing. However, we currently lack flexible software to efficiently encode, store, analyze and visualize DNA and RNA modification data. Here, we present ModPhred, a versatile toolkit that facilitates DNA and RNA modification analysis from nanopore sequencing reads in a user-friendly manner. ModPhred integrates probabilistic DNA and RNA modification information within the FASTQ and BAM file formats and can encode multiple types of modifications simultaneously. Its output can be easily coupled to genomic track viewers, facilitating the visualization and analysis of DNA and RNA modification information in individual reads in a simple and computationally efficient manner. ModPhred is available at https://github.com/novoalab/modPhred, is implemented in Python 3, and is released under the MIT license. Docker images with all dependencies preinstalled are also provided. Supplementary data are available at Bioinformatics online.

Highlights

  • Third generation sequencing technologies have revolutionized our ability to identify base modifications in single molecules (Garalde et al, 2018; Kelleher et al, 2018; Liu et al, 2019b; Loman et al, 2015; Novoa et al, 2017)

  • ModPhred integrates probabilistic DNA and RNA modification information within the FASTQ and BAM file formats, can be used to encode multiple types of modifications simultaneously, and its output can be coupled to genomic track viewers, facilitating the visualization and analysis of DNA and RNA modification information in individual reads in a simple and computationally efficient manner

  • The only available tool for extracting and storing DNA or RNA modification information from basecalled FAST5 datasets is megalodon, developed by Oxford Nanopore Technologies (ONT). It relies on a previously trained basecalling model to extract methylation information from each raw FAST5 read and writes all predicted modified sites to a plain text file


Summary

Introduction

Third generation sequencing technologies have revolutionized our ability to identify base modifications in single molecules (Garalde et al, 2018; Kelleher et al, 2018; Liu et al, 2019b; Loman et al, 2015; Novoa et al, 2017). The only available tool for extracting and storing DNA or RNA modification information from basecalled FAST5 datasets is megalodon, developed by Oxford Nanopore Technologies (ONT), which relies on a previously trained basecalling model to extract methylation information from each raw FAST5 read and writes all predicted modified sites to a plain text file.

Megalodon presents several caveats and limitations: (i) it only supports m5C and m6A DNA modification detection, (ii) it cannot be used with direct RNA sequencing datasets that are mapped to the genome, (iii) it does not integrate modification information within the FASTQ format, (iv) it does not have the ability to encode multiple RNA modification types simultaneously (e.g. m5C and hm5C), (v) it cannot be parallelized by splitting the input FAST5 files into separate read chunks and (vi) it does not offer options for downstream analyses or visualization of the results (Supplementary Table S1).

We present ModPhred, a toolkit that encodes DNA and/or RNA modification information within the FASTQ and BAM formats, allowing its analysis and visualization at single molecule resolution (Fig. 1A). The toolkit is easy to use for researchers without bioinformatics expertise, and generates user-friendly reports to facilitate downstream analyses as well as several forms of visualization of the modification information (Fig. 1B), at both the per-site and the per-read level.
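To make the idea of storing modification probabilities inside a FASTQ quality string concrete, the following is a minimal sketch in Python. It is not ModPhred's implementation: the constants `LEVELS` and `MAX_MODS`, the function names, and the exact packing scheme are assumptions chosen for illustration. The sketch packs a modification type index and a quantized probability into one printable ASCII character per base, using the standard FASTQ offset of 33.

```python
# Illustrative sketch only; the packing scheme and constants are assumptions,
# not taken from the ModPhred source code.
PHRED_OFFSET = 33  # standard FASTQ ASCII offset ('!' encodes value 0)
LEVELS = 31        # assumed number of probability levels per modification type
MAX_MODS = 3       # assumed maximum number of modification types per base


def encode_mod(prob, mod_index=0):
    """Pack a modification probability and type index into one character."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if not 0 <= mod_index < MAX_MODS:
        raise ValueError("mod_index out of range")
    level = round(prob * (LEVELS - 1))  # quantize to 0..LEVELS-1
    return chr(PHRED_OFFSET + mod_index * LEVELS + level)


def decode_mod(char):
    """Recover (mod_index, approximate probability) from one character."""
    value = ord(char) - PHRED_OFFSET
    if not 0 <= value < LEVELS * MAX_MODS:
        raise ValueError("character outside the encoded range")
    mod_index, level = divmod(value, LEVELS)
    return mod_index, level / (LEVELS - 1)


def encode_read(calls):
    """Encode a list of (prob, mod_index) pairs, one per base, as a string."""
    return "".join(encode_mod(prob, mod_index) for prob, mod_index in calls)


if __name__ == "__main__":
    quality_line = encode_read([(0.0, 0), (0.5, 0), (1.0, 2)])
    print(quality_line)
    print([decode_mod(c) for c in quality_line])
```

Under these assumptions the largest encoded value is 33 + 2 × 31 + 30 = 125, so every character stays within printable ASCII and the resulting line can occupy the quality field of a FASTQ record, and of a BAM record after alignment. The trade-off is that probabilities are stored at a resolution of 1/30 rather than exactly.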

Materials and methods
Implementation of ModPhred
Benchmarking of modPhred and comparison to available tools
Findings
Conflict of Interest
