Abstract
The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, which hampered the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-sequencing (RNA-Seq) data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected in our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are annotated as protein coding in human and are also expressed in the dog, suggesting that they were previously missing from the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein-coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein-coding genes; ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ∼11,000 transcripts that were detected by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. In functional analyses, shRNA knockdown of two novel transcripts in a mouse kidney cell line altered cell morphology and motility. Overall, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
Highlights
The dog, Canis familiaris, is an important and well-established genetic model used to study human disease
For bacterial artificial chromosome (BAC) re-sequencing, we selected (a) 283 gaps larger than 35 kb; (b) 100 clusters of gaps lying within less than the BAC size of 180 kb; (c) 22 gaps between adjacent scaffolds anchored to the same chromosome; (d) 59 centromeric and telomeric regions; and (e) 65 regions flagged as problematic by the whole-genome shotgun assembler Arachne [19] that were ≥50 kb in size or fell close to a gap of 10–35 kb
The new genome build closed a total of 1,044 gaps containing promoters and/or first exons of protein coding genes that were missing in canFam2.0
Summary
The dog, Canis familiaris, is an important and well-established genetic model used to study human disease. In only a few centuries, hundreds of dog breeds have been created and continuously selected to generate diverse morphological, physiological and behavioral variation. In 2005, the first high-quality draft genome was published, accompanied by a SNP discovery effort, a characterization of haplotype structure, and power calculations for genome-wide association mapping strategies for trait and disease loci [5]. Intense breeding has resulted in short linkage disequilibrium (LD) across breeds but long LD within breeds, making the purebred dog an ideal model for using genome-wide association to study disorders that affect humans [5,6].