Imputation of high-density genotypes in the Fleckvieh cattle population

Hubert Pausch,Bernhard Aigner,Christian Edel,Ruedi Fries,Reiner Emmerling,Kay-Uwe Götz

doi:10.1186/1297-9686-45-3

Abstract

Background: Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.

Methods: Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively.

Results: Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP.

Conclusions: Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.
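The mask-and-evaluate design described in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1, 2) in an animals-by-SNP NumPy array and that the correlation is pooled over all masked genotypes; the function names and the pooling choice are illustrative assumptions, not the authors' code, and the imputation step itself (Beagle, findhap.f90, MaCH or Minimac) is not reproduced here.

```python
import numpy as np

def mask_non_array_snps(genotypes, on_low_density):
    """Return a float copy in which every SNP that is NOT on the
    low-density array is set to NaN (i.e. masked) for all animals."""
    masked = genotypes.astype(float).copy()
    masked[:, ~on_low_density] = np.nan
    return masked

def imputation_correlation(true_gt, imputed_gt, masked_cols):
    """Pearson correlation between true and imputed genotypes at the
    masked SNPs, skipping entries the imputation left missing (NaN).
    Pooling over all masked genotypes is an assumption of this sketch."""
    t = true_gt[:, masked_cols].astype(float).ravel()
    i = imputed_gt[:, masked_cols].astype(float).ravel()
    keep = ~np.isnan(i)
    return float(np.corrcoef(t[keep], i[keep])[0, 1])
```

In use, the array returned by `mask_non_array_snps` would be handed to an imputation program for the validation animals, and the program's output compared with the original high-density genotypes via `imputation_correlation`.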

Highlights

Genome-wide evaluation of cattle populations is based on single nucleotide polymorphism (SNP) genotyping using ~ 54 000 SNP.
Genotypes for animals of the validation population were imputed based on an increasing number of highly informative reference animals with high-density genotypes. 78.03% of the genes/haplotypes of the 797 studied animals could be traced back to the subset of the 50 most informative reference animals.
After imputation with findhap.f90, 6.49%, 1.46%, 0.26% and 0.11% of the masked genotypes remained missing for the scenarios including 50, 100, 200 and 400 reference animals, respectively.
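The proportion of masked genotypes that remain missing after imputation can be computed as in the following sketch. It assumes the imputed genotypes have already been read into an animals-by-SNP NumPy array with missing calls stored as NaN; the function name and this encoding are assumptions for illustration, not the output format of findhap.f90.

```python
import numpy as np

def missing_rate(imputed_gt, masked_cols):
    """Fraction of genotypes at the masked SNPs that the imputation
    left uncalled (NaN), over all animals."""
    sub = imputed_gt[:, masked_cols].astype(float)
    return float(np.isnan(sub).mean())
```

Multiplying the returned value by 100 gives a percentage comparable to the figures reported above.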

Summary

Introduction

Genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Using densely spaced marker maps increases the probability of co-segregation of SNP and quantitative trait nucleotides (QTN) [3]. Increasing the number of markers might therefore improve both genomic predictions and the power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative sample of individuals genotyped at high density. Different approaches for imputation of genotypes exploit pedigree information [12], population-wide LD (e.g. [13,14]) or both sources of information (e.g. [15]).

Methods

Results

Discussion

Conclusion