Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle

Hubert Pausch,Ruedi Fries,Michael E Goddard,Iona M Macleod,Reiner Emmerling,Hans D Daetwyler,Phil J Bowman

doi:10.1186/s12711-017-0301-x

Abstract

Background: The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants.

Results: We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than FImpute particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more correlated to predicted allele dosages than best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes.

Conclusions: The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.
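The abstract measures accuracy as the correlation between imputed and true genotypes, and contrasts allele dosages with best-guess genotypes. The following is a minimal sketch of that comparison, not the authors' pipeline. It assumes each imputed genotype comes with posterior probabilities for 0, 1 and 2 copies of the alternate allele. The function names and the toy numbers are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def allele_dosage(probs):
    """Expected alternate-allele count from (P(0), P(1), P(2))."""
    _, p_het, p_hom_alt = probs
    return p_het + 2.0 * p_hom_alt


def best_guess(probs):
    """Most probable genotype (0, 1 or 2) from (P(0), P(1), P(2))."""
    return max(range(3), key=lambda g: probs[g])


# Toy data (hypothetical): true genotypes of six animals at one variant
# and assumed posterior genotype probabilities from an imputation run.
true_genotypes = [0, 1, 2, 1, 0, 2]
posteriors = [
    (0.90, 0.09, 0.01),
    (0.30, 0.60, 0.10),
    (0.05, 0.40, 0.55),
    (0.55, 0.40, 0.05),  # best guess is 0 although the true genotype is 1
    (0.80, 0.15, 0.05),
    (0.02, 0.18, 0.80),
]

dosages = [allele_dosage(p) for p in posteriors]
calls = [best_guess(p) for p in posteriors]

r_dosage = pearson(true_genotypes, dosages)
r_calls = pearson(true_genotypes, calls)
print(f"r(true, dosage)     = {r_dosage:.3f}")
print(f"r(true, best guess) = {r_calls:.3f}")
```

A dosage keeps the uncertainty of an ambiguous call (the fourth animal has a dosage of 0.5 rather than a hard 0), which is one plausible reason why dosages can track true genotypes more closely than best-guess calls.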

Highlights

The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses
We evaluated the accuracy of imputation from dense genotypes to sequence variants in 249 Fleckvieh (FV) and 450 Holstein (HOL) animals using sequence data on Bos taurus chromosomes (BTA) 1, 5, 10, 15, 20 and 25
Variants with a low minor allele frequency (MAF) were more frequent among the sequence variants than among the BovineHD array (HD) variants; between 58.12 and 60.55% of the sequence variants and between 14.27 and 18.55% of the HD variants had a MAF lower than 10%
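The MAF comparison in the last highlight can be illustrated with a small sketch. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) with `None` for a missing call. The variant names and genotype vectors are made up for illustration and are not data from the study.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as alternate-allele counts (0, 1, 2).

    Missing calls (None) are ignored; each called animal contributes two alleles.
    """
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def share_below(mafs, threshold=0.10):
    """Fraction of variants whose MAF is strictly below the threshold."""
    return sum(m < threshold for m in mafs) / len(mafs)


# Hypothetical genotypes of ten animals at four variants.
variants = {
    "var_a": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # rare alternate allele
    "var_b": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],  # common variant
    "var_c": [2, 2, 2, 2, 2, 2, 2, 2, 1, 2],  # rare reference allele
    "var_d": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1],  # intermediate frequency
}

mafs = {name: minor_allele_frequency(g) for name, g in variants.items()}
low_share = share_below(list(mafs.values()))

for name, maf in mafs.items():
    print(f"{name}: MAF = {maf:.2f}")
print(f"Share of variants with MAF < 10%: {low_share:.0%}")
```

Taking the minimum of the alternate and reference allele frequencies makes the result independent of which allele happens to be labelled as the reference.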

Summary

Introduction

The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes is necessary to ensure that all individuals have genotypes for a shared set of variants. Methods that combine family- and population-based imputation approaches exploit haplotypes shared among relatives, thereby enabling rapid in silico imputation of genotypes for tens of thousands of individuals and millions of markers [8,9,10,14].

Methods

Results

Discussion

Conclusion


Journal: Genetics Selection Evolution	Publication Date: Feb 21, 2017
Citations: 94	License type: open-access


Similar Papers

Contribution of rare and low-frequency whole-genome sequence variants to complex traits variation in dairy cattle
Qianqian Zhang ... Mario P L Calus
Genetics Selection Evolution | VOL. 49
01 Aug 2017

Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution
Hubert Pausch ... Hans D Daetwyler
BMC Genomics | VOL. 18
09 Nov 2017

A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle.
Hubert Pausch ... Ruedi Fries
Genetics Selection Evolution | VOL. 48
16 Feb 2016

Comparison of two multi-trait association testing methods and sequence-based fine mapping of six additive QTL in Swiss Large White pigs
A. Nosková ... A. Mehrotra
BMC Genomics | VOL. 24
10 Apr 2023
