Abstract

High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) sequencing produces inexpensive DNA sequences, but its high error rates make sequence assembly and analysis difficult as genome size and complexity increase. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics to plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.

Highlights

  • Many factors affect the quality of a de novo genome assembly

  • Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are the current front-runners among long-read sequencing platforms; both are capable of average read lengths on the order of tens of thousands of base pairs and, theoretically, entire chromosomes can be sequenced in a single read [2, 3]

  • Nematodes used for genomic sequencing were grown on two 100-mm nematode growth medium (NGM) plates [19] seeded with E. coli OP50

Introduction

Many factors affect the quality of a de novo genome assembly. As genome size grows, the “puzzle” to be put together gets larger, while the size of the pieces (sequence reads) remains the same. Long sequence reads span repetitive regions and can potentially identify the exact size and location of repeats on a chromosome. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are the current front-runners among long-read sequencing platforms; both are capable of average read lengths on the order of tens of thousands of base pairs and, theoretically, entire chromosomes can be sequenced in a single read [2, 3]. Both are capable of producing high-quality assembled sequences with reasonable amounts of data [4].

