Abstract

In nanopore sequencing, an electrical signal is measured as DNA molecules pass through the sequencing pores. Translating these signals into DNA bases (base calling) is a highly non-trivial task, and its quality has a large impact on sequencing accuracy. The most successful nanopore base callers to date use convolutional neural networks (CNNs) for this task. Convolutional layers in CNNs are typically composed of filters with a constant window size, which perform best when analyzing signals of uniform speed. However, the speed of nanopore sequencing varies greatly, both within reads and between sequencing runs. Here, we present dynamic pooling, a novel neural network component that addresses this problem by adaptively adjusting the pooling ratio. To demonstrate the usefulness of dynamic pooling, we developed two base callers: Heron and Osprey. Heron improves accuracy beyond that of Bonito, the experimental high-accuracy base caller developed by Oxford Nanopore. Osprey is a fast base caller that is competitive in accuracy with Guppy in high-accuracy mode, yet does not require GPU acceleration and achieves near real-time speed on common desktop CPUs.

Availability: https://github.com/fmfi-compbio/osprey, https://github.com/fmfi-compbio/heron.
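The abstract describes dynamic pooling only at a high level. The NumPy sketch below illustrates one plausible way to make a pooling ratio adaptive. It is an assumption for illustration, not necessarily the exact layer used in Heron and Osprey. Each readout predicts a "move" in (0, 1) and a weight. The cumulative sum of moves gives every input a fractional output position, and each weighted feature vector is split linearly between the two nearest integer output slots. The names `dynamic_pool`, `move_logits` and `weight_logits` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_pool(features, move_logits, weight_logits):
    """Hypothetical dynamic pooling sketch (not the authors' exact layer).

    features:      (T, C) array of per-readout feature vectors.
    move_logits:   (T,) array; in a real network these would be predicted
                   from the features by a small learned layer (assumed here).
    weight_logits: (T,) array, likewise assumed to be predicted.
    Returns an (L, C) array whose length L depends on the predicted moves.
    """
    moves = sigmoid(move_logits)      # each move lies in (0, 1)
    weights = sigmoid(weight_logits)  # per-readout importance in (0, 1)
    positions = np.cumsum(moves)      # fractional output position of each input
    lo = np.floor(positions).astype(int)
    frac = positions - lo
    # floor(last position) + 2 slots so that index lo + 1 is always valid
    out_len = int(lo[-1]) + 2
    out = np.zeros((out_len, features.shape[1]))
    # Linearly split each weighted feature vector between slots lo and lo + 1;
    # np.add.at accumulates correctly when several inputs share a slot.
    np.add.at(out, lo, (weights * (1.0 - frac))[:, None] * features)
    np.add.at(out, lo + 1, (weights * frac)[:, None] * features)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(50, 4))
    w_logits = rng.normal(size=50)
    slow = dynamic_pool(feats, np.full(50, -2.0), w_logits)  # small moves
    fast = dynamic_pool(feats, np.full(50, 2.0), w_logits)   # moves near 1
    print(slow.shape, fast.shape)
```

In this formulation, slow stretches of DNA (many readouts per base) would ideally receive small moves, so more readouts are merged into each output position, while fast stretches receive moves close to 1. Because the interpolation is piecewise linear in the positions, the moves could be trained by gradient descent in a deep learning framework.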

Highlights

  • The MinION by Oxford Nanopore Technologies (ONT) is a portable DNA sequencer, which is capable of producing very long reads [1], [2]

  • We developed two base callers that use dynamic pooling to improve the speed vs. accuracy tradeoff

  • The recurrent neural aligner [26] introduces dependency between predictions by propagating state information. We describe this technique and its application to base calling in more detail in the Supplementary material, as we believe it may be useful in other bioinformatics contexts



Introduction

The MinION by Oxford Nanopore Technologies (ONT) is a portable DNA sequencer capable of producing very long reads [1], [2]. A major source of its sequencing errors is that the MinION measures a tiny electrical current influenced by the DNA passing through the pore. This noisy signal is translated into DNA bases by base caller software. The sequence of signal readouts can be split into events, each corresponding to the passage of a single DNA base through the pore. On average, an event consists of roughly 8–10 signal readouts. However, local variations in the speed of DNA passing through the pore cause event lengths to vary widely: some events consist of a single readout, while others span tens of readouts (see Figure 1). Segmenting the signal into events is a difficult problem, and most base callers avoid performing explicit event segmentation.
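To make the mismatch between variable event lengths and a fixed pooling window concrete, the sketch below counts how many bases begin inside each non-overlapping window of a fixed-stride pooling layer. For illustration only, it assumes that event lengths follow a geometric distribution with mean 9 readouts. This gives at least one readout per event and an occasional event spanning tens of readouts, but it is a crude stand-in, not a model of real pore kinetics. The helper `bases_per_window` is hypothetical.

```python
import numpy as np


def bases_per_window(event_lengths, stride):
    """Count how many events (bases) start inside each fixed, non-overlapping
    window of `stride` signal readouts (hypothetical helper)."""
    event_lengths = np.asarray(event_lengths, dtype=int)
    # Readout index at which each event starts.
    starts = np.concatenate(([0], np.cumsum(event_lengths)[:-1]))
    total = int(event_lengths.sum())
    n_windows = -(-total // stride)  # ceiling division
    return np.bincount(starts // stride, minlength=n_windows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumption: geometric event lengths (>= 1 readout, mean 9).
    lengths = rng.geometric(p=1 / 9, size=1000)
    counts = bases_per_window(lengths, stride=9)
    print("mean event length:", lengths.mean())
    print("bases per window: min", counts.min(), "max", counts.max())
```

Under this assumption, the average rate of one base per window matches the stride. Individual windows can still contain no bases or several, and a pooling layer with a constant ratio cannot adapt to that variation.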

