Abstract

DNA base modifications, such as C5-methylcytosine (5mC) and N6-methyldeoxyadenosine (6mA), are important forms of epigenetic regulation. Short-read bisulfite sequencing and long-read PacBio sequencing have inherent limitations in detecting DNA modifications. Here, using raw electric signals of Oxford Nanopore long-read sequencing data, we design DeepMod, a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) to detect DNA modifications. We sequence a human genome HX1 and a Chlamydomonas reinhardtii genome using Nanopore sequencing, and then evaluate DeepMod on three types of genomes (Escherichia coli, Chlamydomonas reinhardtii and human genomes). For 5mC detection, DeepMod achieves average precision up to 0.99 for both synthetically introduced and naturally occurring modifications. For 6mA detection, DeepMod achieves ~0.9 average precision on Escherichia coli data, and outperforms existing methods on Chlamydomonas reinhardtii data. In conclusion, DeepMod performs well for genome-scale detection of DNA modifications and will facilitate epigenetic analysis on diverse species.

Highlights

  • [Figure labels: a center event with a prediction target; long short-term memory (LSTM) for forward flow; LSTM for backward flow; an event; a signal]

  • After repeating the prediction process by the LSTM recurrent neural network (RNN) for events of interest in a long read and for all long reads, the sequence coverage and the methylation coverage are generated for genomic positions of interest in the reference genome

  • DeepMod, developed in this study, bridges the gap between the rapid growth of Nanopore sequencing data and the increasing need to detect DNA modifications at a genomic scale
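
The aggregation step in the second highlight can be illustrated with a minimal Python sketch. It is not DeepMod's own code: the function name `summarize_sites` and the tuple format of the per-read calls are assumptions made for illustration only. It shows how per-read, per-base predictions might be rolled up into sequence coverage and methylation coverage for each strand-specific reference position.

```python
from collections import defaultdict


def summarize_sites(read_predictions):
    """Aggregate per-read base calls into per-site coverage counts.

    read_predictions: iterable of (chrom, pos, strand, is_modified) tuples,
    one per base of interest in each aligned long read. This input format
    is hypothetical and chosen only for the example.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [coverage, modified calls]
    for chrom, pos, strand, is_modified in read_predictions:
        site = (chrom, pos, strand)  # strand-specific, single-base key
        counts[site][0] += 1
        if is_modified:
            counts[site][1] += 1
    return {
        site: {"coverage": cov, "methylated": mod, "fraction": mod / cov}
        for site, (cov, mod) in counts.items()
    }


# Toy input: three reads cover the forward-strand site, one the reverse.
calls = [
    ("chr1", 100, "+", True),
    ("chr1", 100, "+", False),
    ("chr1", 100, "+", True),
    ("chr1", 100, "-", False),
]
summary = summarize_sites(calls)
```

Keying on strand as well as position keeps the summary strand-sensitive, so the two strands of the same coordinate are counted separately.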


Summary

Introduction

DeepMod takes event information generated from Nanopore sequencing as input, and outputs a modification summary for genomic positions of interest in a reference genome, together with modification predictions for bases of interest in a long read. The modification prediction model in DeepMod is a well-trained bidirectional recurrent neural network (RNN) with long short-term memory (LSTM)[28] units, which takes the signal mean, standard deviation, and number of signals of an event, together with base information from the reference genome, for the event and its neighbors as input, and makes a modification prediction for the event. The prediction of DNA modification by DeepMod is strand-sensitive and has single-base resolution. We sequence a human genome HX1 and a Chlamydomonas reinhardtii genome using Nanopore sequencing techniques, and together with published Nanopore data for Escherichia coli and another human genome NA12878, we evaluate DeepMod on three types of genomes (E. coli, C. reinhardtii, and human genomes) and show that it performs well on genome-scale detection of DNA modifications.
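
The per-event input described above can be sketched in Python as follows. This is an illustrative sketch, not DeepMod's implementation: the one-hot base encoding, the flank size, the zero padding at read ends, the use of the population standard deviation, and the names `event_features` and `window_for` are all assumptions. It only shows how the signal mean, standard deviation, signal count, and reference base of a center event and its neighbors might be assembled into the sequence a bidirectional LSTM would consume.

```python
from statistics import mean, pstdev

BASES = "ACGT"


def event_features(base, signals):
    """Build one feature vector: one-hot base + mean, std, signal count."""
    one_hot = [1.0 if base == b else 0.0 for b in BASES]
    return one_hot + [mean(signals), pstdev(signals), float(len(signals))]


def window_for(events, center, flank=2):
    """Feature vectors for the center event and `flank` neighbors per side.

    Positions that fall off either end of the read are zero-padded
    (an assumption made for this sketch).
    """
    pad = [0.0] * (len(BASES) + 3)
    rows = []
    for i in range(center - flank, center + flank + 1):
        if 0 <= i < len(events):
            rows.append(event_features(*events[i]))
        else:
            rows.append(list(pad))
    return rows


# Toy events: (reference base, raw signal values assigned to the event).
events = [
    ("A", [80.1, 79.5, 81.0]),
    ("C", [95.2, 96.0]),
    ("G", [70.3, 71.1, 69.8, 70.0]),
]
window = window_for(events, center=1, flank=2)
```

Each row of `window` corresponds to one time step. A bidirectional model reads the rows in both directions, so the prediction for the center event can draw on neighbors on either side.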

