Abstract

A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool using the assembly graph instead of the separated pre-assembly contigs. It is able to produce more complete genome assembly by resolving the path finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes show improved accuracy while still maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) which provides a real-time visualisation of the progress of assembly. The tool and source code is available at https://github.com/hsnguyen/assembly.

Highlights

  • Sequencing technology has reached a level of maturity which allows the decoding of virtually any piece of genetic material which can be obtained

  • We present npGraph, a novel algorithm to resolve the assembly graph in real-time using long read sequencing. npGraph uses the stream of long reads to untangle knots in the assembly graph, which is maintained in memory

  • In building the assembly graph, the de Bruijin graph assembler attempts to extend each contig as far as possible, until there is more than one possible way of extending due to the repetitive sequences beyond the information contained in short reads

Read more

Summary

Introduction

Sequencing technology has reached a level of maturity which allows the decoding of virtually any piece of genetic material which can be obtained. One particular strength of ONT technology is the production of ultra long reads. This is complementary to the dominant short read sequencing technology Illumina which is cheaper and has higher per base read quality but is unable to resolve the complex regions of the genome due to its read length limitation. Hybrid assembly with ONT reads provides the opportunity to use long reads to fully resolve existing microbial genome assemblies. The release of Read Until API provides the opportunity to selectively enrich parts of the genome, as has been implemented in customised targeted sequencing applications [1]. The combination of ReadUntil with streaming hybrid assembly opens the possibility for further efficiency gains by targeting sequencing to unresolved regions of the genome

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call