Abstract

Background: Short read DNA sequencing technologies have revolutionized genome assembly by providing high accuracy and throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, the majority of studies to date that have employed linked reads were focused on human haplotype phasing and genome assembly.

Results: Here we describe a de novo maize B73 genome assembly generated via linked read technology which contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with >76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid, proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb.

Conclusions: Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that if implemented have the potential to further improve linked read assemblies, particularly for repetitive genomes.
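The scaffold N50 values quoted in the abstract summarize assembly contiguity. As a minimal sketch of the conventional definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the summed assembly length), the metric can be computed as below. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' code or data, and the sketch uses the total assembly length rather than an estimated genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (assumed here): sort lengths from longest to
    shortest and return the length at which the cumulative sum first
    reaches at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")

    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not taken from the paper).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))  # → 5 (6 + 5 = 11 reaches half of the total, 10)
```

A higher N50 indicates that more of the assembly sits in long scaffolds, which is why the 912 kb value for proso millet reflects a more contiguous assembly than the 89 kb value for maize.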

Highlights

  • Linked read alignment to the B73 reference genome: high molecular weight genomic DNA was extracted from maize B73 leaf tissue (Additional file 1: Figure S1) and linked read libraries were prepared using the 10× Genomics Chromium Controller and its Genome Kit V1

  • The LongRanger alignment pipeline mapped a higher percentage of reads (86.2%), as expected because this pipeline uses the linked read barcodes to further target alignments to defined regions of the genome

Introduction

Short read DNA sequencing technologies have revolutionized genome assembly by providing high accuracy and throughput data at low cost. However, short reads (100–250 bp) present challenges for de novo assembly, haplotyping, and the definition of genomic structural variation [1]. These limitations are particularly problematic in genomes with high repeat content or pervasive structural rearrangements, such as those of many crop species [2, 3]. In response to these drawbacks, long-read sequencing platforms have been developed, such as the single-molecule real-time (SMRT) sequencing approach from PacBio.
