Abstract

Background: Genetic relatedness is currently estimated by a combination of traditional pedigree-based approaches (i.e. numerator relationship matrices, NRM) and, given the recent availability of molecular information, using marker genotypes (via genomic relationship matrices, GRM). To date, GRM are computed by genome-wide pair-wise SNP (single nucleotide polymorphism) correlations.

Results: We describe a new estimate of genetic relatedness using the concept of normalised compression distance (NCD) that is borrowed from Information Theory. Analogous to GRM, the resultant compression relationship matrix (CRM) exploits numerical patterns in genome-wide allele order and proportion, which are known to vary systematically with relatedness. We explored properties of the CRM in two industry cattle datasets by analysing the genetic basis of yearling weight, a phenotype of moderate heritability. In both Brahman (Bos indicus) and Tropical Composite (Bos taurus by Bos indicus) populations, the clustering inferred by NCD was comparable to that based on SNP correlations using standard principal component analysis approaches. One of the versions of the CRM modestly increased the amount of explained genetic variance, slightly reduced the ‘missing heritability’ and tended to improve the prediction accuracy of breeding values in both populations when compared to both NRM and GRM. Finally, a sliding window-based application of the compression approach on these populations identified genomic regions influenced by introgression of taurine haplotypes.

Conclusions: For these two bovine populations, CRM reduced the missing heritability and increased the amount of explained genetic variation for a moderately heritable complex trait. Given that NCD can sensitively discriminate closely related individuals, we foresee CRM having possible value for estimating breeding values in highly inbred populations.

Electronic supplementary material: The online version of this article (doi:10.1186/s12711-015-0158-9) contains supplementary material, which is available to authorized users.
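The abstract does not spell out how the NCD is computed, so the following is only a minimal sketch using the standard information-theoretic definition, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is a compressed size. The choice of zlib as compressor, the 0/1/2 genotype coding, the function names and the toy animals are all assumptions made for illustration; they are not taken from the article.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both strings
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data (invented): genotypes coded 0/1/2 at 2,000 SNP for three animals.
rng = random.Random(0)
animal_a = [rng.choice("012") for _ in range(2000)]
# A close relative: a copy of animal_a with about 1% of genotypes redrawn at random.
animal_b = [rng.choice("012") if rng.random() < 0.01 else g for g in animal_a]
# An unrelated animal: independently drawn genotypes.
animal_c = [rng.choice("012") for _ in range(2000)]

a, b, c = ("".join(g).encode("ascii") for g in (animal_a, animal_b, animal_c))
d_related = ncd(a, b)
d_unrelated = ncd(a, c)
print(f"NCD(a, b) = {d_related:.3f}; NCD(a, c) = {d_unrelated:.3f}")
```

In this toy setting the pair that shares most of its genotype string compresses better jointly and therefore gets the smaller distance. Computing `ncd` for every pair of animals would give a distance matrix; how the article converts such distances into its CRM versions is not described in this excerpt.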

Highlights

  • Genetic relatedness is currently estimated by a combination of traditional pedigree-based approaches and, given the recent availability of molecular information, using marker genotypes

  • We explored a basic measure of within-genome compression efficiency (CE) by expressing the single nucleotide polymorphism (SNP) genotype file sizes in bits before and after data compression by the gzip tool

  • For relationships corresponding to numerator relationship matrix (NRM) values of 0.25, the average genomic relationship matrix (GRM) values were 0.196 and 0.201 for Brahman (BB) and Tropical Composite (TC) cattle, respectively
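The compression efficiency (CE) highlight above can be illustrated with a short sketch. The highlight only states that file sizes are expressed in bits before and after gzip compression; the exact formula is not given here, so the ratio used below (one minus compressed size over original size) is an assumption, as are the function name and the invented genotype strings.

```python
import gzip
import random


def compression_efficiency(genotypes: str) -> float:
    """Fraction of bits saved by gzip (assumed definition: 1 - after / before)."""
    raw = genotypes.encode("ascii")
    bits_before = 8 * len(raw)
    bits_after = 8 * len(gzip.compress(raw, compresslevel=9))
    return 1.0 - bits_after / bits_before


# Toy genotype strings (invented), coded 0/1/2 at 5,000 SNP.
rng = random.Random(1)
homozygous_runs = "0" * 5000  # one long run of a single genotype
mixed = "".join(rng.choice("012") for _ in range(5000))  # no long runs

ce_runs = compression_efficiency(homozygous_runs)
ce_mixed = compression_efficiency(mixed)
print(f"CE (long run) = {ce_runs:.3f}; CE (mixed) = {ce_mixed:.3f}")
```

A genome with long repeated stretches compresses more than one with little repeated structure, so under this assumed definition it receives the higher CE.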


Summary

Results

We attempted to identify genome properties that were responsible for the similarities and differences between the GRM and NCD measures of relatedness. To answer this question, we overlaid the average Zebu contribution (based on a principal component analysis that included Angus and Nelore data) [12] of each pair (see Additional file 3: Figure S2).

The linear relationship for CRM2 explains [...] substantially more time than the pair-wise correlations required for GRM. In our implementation, it took 102.6 and 162.4 h to compute the NCD matrices for BB and TC, respectively.

As described by Roman-Ponce et al. [11], we estimated the missing heritability as the proportion of the genetic variance not captured by the marker-based relationship matrix, where the latter was either the GRM or one of the three alternate CRM.

These regions carry genes that are involved in bovine reproduction (NCOA2), immune function (BCL2) and fatness (ATP5H) (Table 9). It is important to note that there were two separate instances where two highly functionally related proteins were identified in independent genomic regions: (1) the monocarboxylate transporter coded by SLC16A5 on BTA19 and its paralog coded by SLC16A4 on BTA3, and (2) two subunits of the mitochondrial ATP synthase, the F1 catalytic core complex coded by ATP5B on BTA5 and the membrane-spanning F0 complex coded by ATP5H on BTA19 (see Additional file 4: Table S1).
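The missing-heritability definition in this summary, the proportion of the genetic variance not captured by the marker-based relationship matrix, can be written as a simple ratio. The sketch below assumes the genetic variance has already been partitioned into a marker-captured component and a residual component; the function name, argument names and variance values are hypothetical and are not results from the study.

```python
def missing_heritability(var_marker: float, var_residual: float) -> float:
    """Share of total genetic variance NOT captured by the marker-based matrix.

    var_marker:   genetic variance captured by the GRM or a CRM (assumed input)
    var_residual: remaining genetic variance not captured by the markers
    """
    total = var_marker + var_residual
    if total <= 0:
        raise ValueError("total genetic variance must be positive")
    return var_residual / total


# Invented numbers, for illustration only.
example = missing_heritability(var_marker=300.0, var_residual=100.0)
print(f"missing heritability = {example:.2f}")
```

Under this reading, a relationship matrix that captures more of the genetic variance leaves a smaller residual share, which is the sense in which the abstract reports that one CRM version slightly reduced the missing heritability.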
