Abstract
Background
Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area.

Results
Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors.

Conclusions
MiniScrub robustly improves the read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
Highlights
Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification
We developed a method called MiniScrub that performs de novo long read scrubbing using the combined power of fast approximate read-to-read overlapping, deep Convolutional Neural Networks, and a novel method for pileup image generation
MiniScrub uses minimizers to quickly overlap long reads, encodes these overlaps into pileup images, and uses a convolutional neural network to predict parts of reads below a certain quality threshold that should be removed
Summary
Method overview
The three steps involved in MiniScrub are illustrated in Fig. 1 and explained in further detail in the subsections below. The first step is training a CNN model, a step that only needs to be done once, in order to learn the error profile of a given sequencing technology and base caller. Model training starts with building a training set of reads from a known reference genome. These reads are mapped to the reference genome using GraphMap [26]. For each read segment we calculate its percent identity, i.e. the percentage of bases in the read that match the reference, as a label. We use a modified version of MiniMap2 [22] to obtain read-to-read overlaps between all reads in the training set (see below for details), and embed relevant information (minimizers matched, distance between minimizers, and base quality scores) into Red-Green-Blue (RGB) pixels to form “pileup” images. One image is generated for each read, and is broken into the same short segments as above.
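As a concrete illustration, the segment labeling and the packing of overlap features into RGB pixels described above can be sketched as follows. This is a minimal sketch, not MiniScrub's implementation: the segment length, channel assignments, and scaling constants (`max_gap`, `max_qual`) are illustrative assumptions.

```python
def segment_percent_identity(match_flags, segment_len=48):
    """Label each fixed-length read segment with its percent identity.

    match_flags: list with 1 where the read base matches the reference
    (from the GraphMap alignment), else 0. The segment length of 48 is
    a hypothetical value chosen for this sketch.
    """
    n_segments = len(match_flags) // segment_len
    labels = []
    for i in range(n_segments):
        seg = match_flags[i * segment_len:(i + 1) * segment_len]
        labels.append(100.0 * sum(seg) / segment_len)
    return labels


def encode_pixel(minimizer_matched, gap_to_next_minimizer, base_quality,
                 max_gap=255, max_qual=40):
    """Pack the three per-position overlap features into one RGB pixel.

    Channel assignment here is an assumption: R flags a matched
    minimizer, G holds the (clamped) distance to the next minimizer,
    and B scales the Phred base quality into the 0-255 range.
    """
    r = 255 if minimizer_matched else 0
    g = min(gap_to_next_minimizer, max_gap)
    b = int(255 * min(base_quality, max_qual) / max_qual)
    return (r, g, b)
```

Stacking one such pixel row per overlapping read yields the pileup image for a read, which the CNN then consumes to predict the per-segment percent-identity labels.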