Abstract

The domestic dog has evolved to be an important biomedical model for studies regarding the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer dog named “Tasha” initially published in 2005. Derived from a Sanger whole genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and subsequent updates have served as the predominant resource for canine genetics for 15 years. While the initial assembly produced a good-quality draft, as with all assemblies produced at the time, it contained gaps, assembly errors and missing sequences, particularly in GC-rich regions, which are found at many promoters and in the first exons of protein-coding genes. Here, we present Dog10K_Boxer_Tasha_1.0, an improved chromosome-level highly contiguous genome assembly of Tasha created with long-read technologies that increases sequence contiguity >100-fold, closes >23,000 gaps of the CanFam3.1 reference assembly and improves gene annotation by identifying >1200 new protein-coding transcripts. The assembly and annotation are available at NCBI under the accession GCF_000002285.5.

Highlights

  • High-quality reference genomes are fundamental assets for the study of genetic variation in any species

  • The dog genome assembly reported here was built using a combination of Pacific Biosciences (PacBio) continuous long-read (CLR) sequencing technology, 10x Chromium-linked reads, bacterial artificial chromosome (BAC) pair-end sequences and the draft reference genome sequence CanFam3.1

  • PacBio single-molecule real-time (SMRT) cells produced 27,878,642 reads with a mean length of 8514 bp and an N50 read length of 13,189 bp (the N50 is the length at which 50% of the sequenced bases are in reads of that length or longer)
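
The N50 definition in the last highlight can be made concrete with a short sketch. The function name `read_n50` and the toy read lengths below are illustrative assumptions; they are not taken from the paper or its pipeline.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until
    # half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length


# Toy data (hypothetical): five reads totalling 30 bases.
toy_reads = [10, 8, 6, 4, 2]
toy_mean = sum(toy_reads) / len(toy_reads)
toy_n50 = read_n50(toy_reads)
```

In this toy case the N50 exceeds the mean read length, as it does for the reported PacBio data (13,189 bp versus 8514 bp), because the N50 weights each read by the number of bases it contributes.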


Summary

Introduction

High-quality reference genomes are fundamental assets for the study of genetic variation in any species. The ability to link genotype to phenotype and the subsequent identification of functional variants rely on high-fidelity assessment of variants throughout the genome. This reliance is well illustrated by the domestic dog, which offers specific challenges for any genetic study. For the dog system to advance further, long-read high-quality assemblies from different individuals are needed. Such assemblies will greatly improve the sensitivity of variant detection, especially for large structural variation. The dog genome assembly reported here was built using a combination of Pacific Biosciences (PacBio) continuous long-read (CLR) sequencing technology, 10x Chromium-linked reads, BAC pair-end sequences and the draft reference genome sequence CanFam3.1.

Whole Genome Sequencing
Genome Assembly Workflow
Assembly Quality Control
Fosmid End Sequence Alignment
Alignment of Finished BAC Clone Sequences
Detection of Common Repeats and Segmental Duplications
Gene Annotation
Genome Assembly Alignment
Structural Variant Detection
BAC Assembly
Results
Assembly Quality Assessment
Assembly Completeness
Analysis of Duplications
Analysis of Repetitive Sequences
Duplications at the Pancreatic Amylase Locus
Conclusions
