Abstract

Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate‐pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi‐C and Dovetail Genomics Chicago libraries, and long‐read sequencing technologies, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high‐quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high‐quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi‐C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome‐level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi‐C libraries increased the longest scaffold over 12‐fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50‐fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long‐read sequencing.
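For readers unfamiliar with the contiguity statistics quoted above, the following is a minimal sketch of how a scaffold N50 is conventionally computed: the length of the scaffold at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function names `n50` and `fold_change` and the toy scaffold lengths are illustrative assumptions, not the authors' pipeline or data; only the four Mbp values passed to `fold_change` come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_change(before, after):
    """Ratio of a statistic after scaffolding to its value before."""
    return after / before


# Hypothetical toy assembly (arbitrary units), NOT the dromedary data.
toy_lengths = [2, 2, 3, 4, 5, 8, 10]
toy_n50 = n50(toy_lengths)

# Values reported in the abstract (Mbp).
longest_fold = fold_change(9.71, 124.99)   # longest scaffold
n50_fold = fold_change(1.48, 75.02)        # scaffold N50

print(f"toy N50: {toy_n50}")
print(f"longest scaffold fold change: {longest_fold:.2f}")
print(f"scaffold N50 fold change: {n50_fold:.2f}")
```

The two ratios computed from the reported values are consistent with the "over 12‐fold" and "over 50‐fold" increases stated in the abstract.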

Highlights

  • Technological advances in sequencing have enabled researchers to assemble thousands of eukaryotic genomes

  • Newer high‐throughput laboratory methods are beginning to overcome the limitations of traditional long‐insert libraries; these new libraries can extend across repetitive regions, enabling the scaffolding and ordering of previously unscaffolded contigs

  • We were able to greatly improve the North African dromedary genome assembly by using a combination of chromosome conformation capture sequencing libraries for scaffolding, long reads to fill in gaps, and comparative chromosome mapping to assign super scaffolds to chromosomes


Summary

INTRODUCTION

Technological advances in sequencing have enabled researchers to assemble thousands of eukaryotic genomes. Both Oxford Nanopore and PacBio overcome the problem of error‐prone raw reads by generating a consensus sequence, either at the instrument level, where PacBio sequencers read each DNA molecule multiple times (i.e., circular consensus sequences), or after the sequences have been generated by PacBio or Oxford Nanopore sequencers. These long‐read technologies generate longer sequences that can span repetitive regions, enabling the assembly of longer contigs that can later be error corrected and/or scaffolded into high‐quality eukaryotic genome assemblies using traditional long‐insert, Hi‐C, or Dovetail Chicago libraries (Jiao et al., 2017; Miller et al., 2017; Passera et al., 2018). We used Chicago and Hi‐C sequencing libraries from Dovetail Genomics to resolve the placement and order of contigs previously assembled de novo from Illumina short‐ and long‐insert libraries (Fitak et al., 2016), producing almost chromosome‐level scaffolds. We then filled gaps in these scaffolds using PacBio long reads, mapped the scaffolds to chromosomes, and annotated the resulting assembly.

