Abstract

Oxford Nanopore Technologies' (ONT) long-read sequencers offer access to longer DNA fragments than previous sequencer generations, at the cost of a higher error rate. While many papers have studied read correction methods, few have addressed the detailed characterization of observed errors, a task complicated by frequent changes in chemistry and software in ONT technology. The MinION sequencer is now more stable, and this paper proposes an up-to-date view of its error landscape, using the most mature flowcell and basecaller. We studied Nanopore sequencing error biases on both bacterial and human DNA reads. We found that, although Nanopore sequencing is expected not to suffer from GC bias, GC content is a crucial parameter with respect to errors. In particular, low-GC reads have fewer errors than high-GC reads (about 6% and 8% respectively). The error profile for homopolymeric regions or regions with short repeats, the source of about half of all sequencing errors, also depends on the GC rate and mainly shows deletions, although there are some reads with long insertions. Another interesting finding is that the quality measure, although over-estimated, offers valuable information to predict the error rate as well as the abundance of reads. We supplemented this study with an analysis of a rapeseed RNA read set and showed a higher level of errors, with a higher level of deletions, in these data. Finally, we have implemented an open source pipeline for long-term monitoring of the error profile, which enables users to easily compute the various analyses presented in this work, including for future developments of the sequencing device. Overall, we hope this work will provide a basis for the design of better error-correction methods.
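
To make the two per-read quantities discussed above concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the standard Phred definition (Q = -10 log10 p) and the usual FASTQ Phred+33 encoding; the function names are hypothetical and chosen for illustration only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases among the A/C/G/T bases of a read."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def phred_to_error_prob(q: int) -> float:
    """Error probability implied by a Phred score (standard definition)."""
    return 10 ** (-q / 10)


def predicted_error_rate(quality_string: str, offset: int = 33) -> float:
    """Mean per-base error probability implied by a FASTQ quality string.

    Assumes Phred+33 encoding (hypothetical default for this sketch).
    """
    if not quality_string:
        return 0.0
    probs = [phred_to_error_prob(ord(c) - offset) for c in quality_string]
    return sum(probs) / len(probs)
```

Since the quality measure is reported as over-estimated, a rate predicted this way would be expected to fall below the error rate observed after aligning the read to a reference.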

Highlights

  • Nanopore sequencing is based on measuring changes in the electrical signal generated from DNA or RNA molecules passing through nano-scaled pores

  • Guppy HAC basecalling mode reduces error rates by about 2% compared to FAST mode

  • FAST mode has not improved between the two versions, and its gap in error rate with HAC mode is now large enough that we advise against continuing to use it. HAC has also become more efficient: in our measurements it is only 2 times slower than FAST

Introduction

Nanopore sequencing is based on measuring changes in the electrical signal generated from DNA or RNA molecules passing through nano-scaled pores. This third-generation technology is developed and marketed by Oxford Nanopore Technologies (ONT), which uses a small portable sequencing device called MinION [1]. It offers many interesting features, including long-read sequencing (the mean read length often exceeds 10 kb, and the maximum read length reaches 880 kb [2]), real-time analysis and a low initial investment.
