Abstract

We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with newly generated and existing long- and short-read RNA-seq, miRNA-seq and ATAC-seq data revealed that 32.1% of lifted-over CanFam3.1 gaps harboured previously hidden functional elements in GSD_1.0, including promoters, genes and miRNAs. A catalogue of canine “dark” regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein-coding genes showed that 1.4% of these variants could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.

Highlights

  • We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1

  • Domestic dogs have lived alongside humans for at least 10,000 years[1,2], and during this time, they have adapted to a shared environment and diet, while being selectively bred for traits such as morphology[3] and behaviour[4]

  • The types of canine variants implicated in disease range from single-nucleotide polymorphisms (SNPs) to complex genomic rearrangements, and have been identified with canine SNP chips (e.g., CanineHD BeadChip, Illumina), genotyping complemented with imputation[7], or genome and transcriptome sequencing of individuals, families[8] or large populations[3]

Summary

Introduction

We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1. The current canine reference genome, CanFam3.1, is based on a 2005 7.4× Sanger sequencing framework[9], improved in 2014 with multiple methods to better resolve euchromatic regions and annotate transcripts from gross tissues[10]. It still contains 23,876 gaps, with 19.6% of these within gene bodies, and a further 9.8% located a mere 5 kb upstream of predicted gene start sites. A liftover of gap regions from CanFam3.1 showed that 23,251/23,836 elements contain uniquely anchored sequences in GSD_1.0, and annotation of the new reference resulted in 159 thousand transcripts across 29,583 genes. These novel data open the door to the identification of functional variants underlying complex traits, especially in difficult-to-sequence, and often biologically important, regions.
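The gap categories quoted above (within gene bodies, or within 5 kb upstream of a predicted gene start) can be illustrated with a minimal interval-overlap sketch. This is not the authors' pipeline: the `Gene` and `Gap` classes, the `classify_gap` function, the use of simple overlap as the criterion for "within a gene body", and the toy coordinates are all assumptions made for illustration. Coordinates are taken to be 0-based and half-open, as in BED files.

```python
from collections import Counter
from dataclasses import dataclass

UPSTREAM_WINDOW = 5_000  # 5 kb upstream window, as quoted in the text


@dataclass(frozen=True)
class Gene:  # hypothetical record type, not from the paper
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Gap:  # hypothetical record type, not from the paper
    chrom: str
    start: int
    end: int


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_gap(gap, genes):
    """Label a gap 'gene_body', 'upstream_5kb' or 'other'.

    Assumption: any overlap with a gene counts as 'gene_body', and
    'gene_body' takes priority over 'upstream_5kb'.
    """
    label = "other"
    for gene in genes:
        if gene.chrom != gap.chrom:
            continue
        if overlaps(gap.start, gap.end, gene.start, gene.end):
            return "gene_body"
        # The upstream window depends on strand: it lies before the start
        # of a "+" gene and after the end of a "-" gene.
        if gene.strand == "+":
            up_start, up_end = max(0, gene.start - UPSTREAM_WINDOW), gene.start
        else:
            up_start, up_end = gene.end, gene.end + UPSTREAM_WINDOW
        if overlaps(gap.start, gap.end, up_start, up_end):
            label = "upstream_5kb"
    return label


# Toy data, invented for this example.
genes = [Gene("chr1", 10_000, 20_000, "+"), Gene("chr1", 50_000, 60_000, "-")]
gaps = [
    Gap("chr1", 12_000, 12_500),  # inside the first gene
    Gap("chr1", 7_000, 7_200),    # upstream of the "+" gene
    Gap("chr1", 61_000, 61_100),  # upstream of the "-" gene
    Gap("chr1", 30_000, 30_100),  # intergenic, far from both genes
]
labels = [classify_gap(gap, genes) for gap in gaps]
counts = Counter(labels)
print(counts)
```

The linear scan over genes keeps the sketch short; for genome-scale inputs an interval tree or a sorted sweep (as used by tools such as bedtools) would be the more practical choice.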
