Abstract

Background: The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. Unlike other methods of assembly for diploid genomes, this approach is enhanced, rather than hindered, by the heterozygosity of the individual sequenced. To maximize heterozygosity and simultaneously assemble reference genomes for 2 species, we applied trio binning to an interspecies F1 hybrid of yak (Bos grunniens) and cattle (Bos taurus), 2 species that diverged nearly 5 million years ago. The genomes of both of these species are composed of acrocentric autosomes.

Results: We produced the most continuous haplotype-resolved assemblies for a diploid animal yet reported. Both the maternal (yak) and paternal (cattle) assemblies have the largest 2 chromosomes in single haplotigs, and more than one-third of the autosomes similarly lack gaps. The maximum length haplotig produced was 153 Mb without any scaffolding or gap-filling steps and represents the longest haplotig reported for any species. The assemblies are also more complete and accurate than those reported for most other vertebrates, with 97% of mammalian universal single-copy orthologs present.

Conclusions: The high heterozygosity inherent to interspecies crosses maximizes the effectiveness of the trio binning method. The interspecies trio binning approach we describe is likely to provide the highest-quality assemblies for any pair of species that can interbreed to produce hybrid offspring that develop to sufficient cell numbers for DNA extraction.

Highlights

  • The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes

  • We applied trio binning to a trio consisting of a yak cow (B. grunniens) “Molly,” a Highland cattle bull (B. taurus) “Duke,” and

  • After verifying Esperanza’s parentage (Supplementary Table S1), we sequenced both parents with Illumina short reads and their offspring with Pacific Biosciences long reads


Summary

Introduction

The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. We produced the most continuous haplotype-resolved assemblies for a diploid animal yet reported. Both the maternal (yak) and paternal (cattle) assemblies have the largest 2 chromosomes in single haplotigs, and more than one-third of the autosomes lack gaps. Most current long-read genome assemblers, such as Canu [8], Flye [9], and miniasm [10], choose a random haplotype in each heterozygous region and save the unused haplotype as an alternate, resulting in a single pseudo-haploid assembly containing sequence from both parental haplotypes. Another long-read assembler, FALCON-unzip, uses long reads spanning multiple heterozygous regions to phase the assembly graph as much as possible, but the assemblies it generates still contain numerous haplotype switch errors [11]. The long-range information present in proximity ligation and linked-read libraries has been used to phase diploid assembly graphs, with mixed results [12, 13].
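In contrast to the pseudo-haploid approach described above, trio binning sorts the offspring's long reads by parent of origin before assembly, using k-mers that occur in only one parent's short reads. The toy sketch below illustrates that idea only; it is not the authors' pipeline. The function names, the simple majority vote, and the tiny k are all illustrative assumptions, and practical tools also deal with reverse complements, sequencing errors, and unequal k-mer set sizes, none of which is modeled here.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal_only, paternal_only) k-mer sets.

    A k-mer is parent-specific if it appears in one parent's reads
    and never in the other's. Hypothetical helper for illustration.
    """
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign one offspring read to a haplotype bin by majority vote.

    Assumption: a plain count comparison stands in for whatever
    scoring a real tool applies.
    """
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no evidence, or a tie


def bin_reads(offspring_reads, maternal_reads, paternal_reads, k):
    """Partition offspring reads into three bins keyed by label."""
    mat_only, pat_only = parent_specific_kmers(maternal_reads, paternal_reads, k)
    bins = {"maternal": [], "paternal": [], "unassigned": []}
    for read in offspring_reads:
        bins[bin_read(read, mat_only, pat_only, k)].append(read)
    return bins
```

Each of the two parental bins would then be assembled independently, which is why greater divergence between the parents (more parent-specific k-mers per read) makes the binning step more effective. Reads left unassigned carry no distinguishing k-mers in this sketch; how they are handled is left open here.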

