Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle.

Rianne Van Binsbergen, Mario Pl Calus, Roel F Veerkamp, Ina Hulsegge, Ben J Hayes, Fred A Van Eeuwijk, Marco Cam Bink

doi:10.1186/1297-9686-46-41

Open Access

Abstract

Background: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.

Methods: Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated.

Results: Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.

Conclusions: Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.
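
The accuracy measure described in the Methods (correlation between observed and imputed genotypes, computed per SNP) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes genotypes are coded as allele counts 0, 1 or 2, that imputed values may be fractional dosages, and that a SNP with no variation among the validation individuals is skipped because its correlation is undefined. The function names are hypothetical.

```python
from math import sqrt


def per_snp_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed genotypes of one SNP.

    Assumption: genotypes are allele counts (0, 1, 2); imputed values may be
    fractional dosages. Returns None when either vector has zero variance,
    because the correlation is undefined in that case.
    """
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length vectors of at least 2 individuals")
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return None
    return cov / sqrt(var_o * var_i)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from allele counts (0, 1, 2) of one SNP."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def mean_accuracy(observed_by_snp, imputed_by_snp):
    """Mean per-SNP accuracy over all SNPs with a defined correlation."""
    values = [
        per_snp_accuracy(obs, imp)
        for obs, imp in zip(observed_by_snp, imputed_by_snp)
    ]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# Toy data: two SNPs, four validation individuals each.
observed = [[0, 1, 2, 1], [0, 0, 1, 0]]
imputed = [[0, 1, 2, 1], [0, 0, 0, 0]]  # second SNP imputed as monomorphic
toy_mean = mean_accuracy(observed, imputed)
```

In this toy example the second SNP has a low minor allele frequency and is imputed as monomorphic, so it contributes no defined correlation. How such SNPs were handled in the study is not stated in the abstract, and skipping them is only one possible choice.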

Highlights

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions.
Our results show that functions that estimate linkage disequilibrium (LD) based on distance only, or on the difference in minor allele frequency (MAF) between the imputed SNP and the closest SNP on the lower density marker panel, did not provide a good indication of imputation reliability.
When these functions were combined with an empirically derived function that corrects for the MAF of the imputed SNPs and the size of the reference group, a much better, though still imperfect, indication of imputation reliability was obtained (Figure 7).

Summary

Introduction

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. A less expensive approach to producing sequence genotypes for a large number of individuals than sequencing all of them is to impute from lower density marker panels to whole-genome sequence data. In this case, a core set of individuals is fully sequenced, and the lower density genotypes of the remaining individuals are imputed to whole-genome sequence genotypes using the sequenced individuals as reference [5,6,7,8]. Our objective was to investigate the accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.
