Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

G Sahana,M.P.L Calus,B.J Hayes,J Johnston,J.E Pryce,E Wall,N Krattenmacher,D Spurlock,K.A Weigel,S Mcparland,R.J Spelman

doi:10.3168/jds.2013-7368

Abstract

Combining data from research herds may be advantageous, especially for difficult or expensive-to-measure traits (such as dry matter intake). Cows in research herds are often genotyped using low-density single nucleotide polymorphism (SNP) panels. However, the precision of quantitative trait loci detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from Australasia (Australia and New Zealand), and 3 from North America (Canada and the United States). Heifers from the Australian and New Zealand research herds were already genotyped at high density (approximately 700,000 SNP). The remaining genotypes were imputed from around 50,000 SNP to 700,000 using 2 reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were quantified. The European genotypes (n=4,097) were imputed as 1 data set, using a reference population of 3,150 that included genotypes from 835 Australian and 1,053 New Zealand females, with the remainder being males. Imputation was undertaken using population-wide linkage disequilibrium with no family information exploited. The UK animals were also included in the North American data set (n=1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele frequencies of the 2 imputed data sets was high (>0.98) and even stronger (>0.99) for the UK animals that were part of each imputation data set. For the UK genotypes, 2.2% were imputed differently in the 2 high-density reference data sets used. Only 0.025% of these were homozygous switches. The number of discordant SNP was lower for animals that had sires that were genotyped. Discordant imputed SNP genotypes were most common when a large difference existed in allele frequency between the 2 imputed genotype data sets. For SNP that had ≥20% discordant genotypes, the difference between imputed data sets of allele frequencies of the UK (imputed) genotypes was 0.07, whereas the difference in allele frequencies of the (reference) high-density genotypes was 0.30. In fact, regions existed across the genome where the frequency of discordant SNP was higher. For example, on chromosome 10 (centered on 520,948 bp), 52 SNP (out of a total of 103 SNP) had ≥20% discordant SNP. Four hundred and eight SNP had more than 20% discordant genotypes and were removed from the final set of imputed genotypes. We concluded that both discordance of imputed SNP genotypes and differences in allele frequencies, after imputation using different reference data sets, may be used to identify and remove poorly imputed SNP.

Full Text