Abstract

Background

Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed accuracies of imputing 60K genotype data from lower density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the level of the relationship between the reference and validation populations, and minor allele frequency (MAF) of the SNP being imputed.

Results

The accuracy of imputation was assessed with different scenarios using 22 and 62 carefully selected reference animals (Ref22 and Ref62). Animal-specific imputation accuracy corrected for gene content was moderate on average (~ 0.80) in most scenarios and low in the 3K to 60K scenario. Maximum average accuracies were 0.90 and 0.93 for the most favourable scenario for Ref22 and Ref62, respectively, when SNPs were masked independent of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved the imputation accuracy for these rare SNPs. When Ref22 was used for imputation, the average imputation accuracy decreased by 0.04 when the validation population was two instead of one generation away from the reference, and increased again by 0.05 when the validation population was three generations away. Selecting the reference animals from the most common sires, compared with random animals from the population, considerably improved imputation accuracy for low MAF SNPs, but gave only limited improvement for other MAF classes. The allelic R² measure from Beagle software was found to be a good predictor of imputation reliability (correlation ~ 0.8) when the density of the validation panel was very low (3K) and the MAF of the SNP and the size of the reference population were not extremely small.

Conclusions

Even with a very small number of animals in the reference population, reasonable accuracy of imputation can be achieved. Selecting a set of the most common sires, rather than selecting random animals for the reference population, improves the imputation accuracy of rare alleles, which may be a benefit when imputing with whole genome re-sequencing data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12863-015-0253-5) contains supplementary material, which is available to authorized users.

Highlights

  • Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS)

  • In this study, the accuracy of imputation to 60K genotypes from lower density single nucleotide polymorphism (SNP) panels (3K and 48K) was assessed in genotype data from GGA1 of layer chickens, when using a small reference population of the most common sires that are influential in the validation population

  • As with G0, imputation accuracies for G1 and G2 increased substantially as the size of the reference population in these generations increased (Table 2, Fig. 1)


Summary

Introduction

Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). With dense single nucleotide polymorphism (SNP) panels, GS and GWAS have become common in animal and plant breeding programs. Both exploit linkage disequilibrium (LD) between SNPs and causative mutations. Many imputation studies based on SNP chip data [18,19,20,21,22,23] and sequence data [24] have reported lower imputation accuracy for SNPs with low minor allele frequency (MAF). Meuwissen and Goddard [25] found that the error rate improved markedly when relatives were sequenced, and Khatkar et al. [26] suggested that, where pedigree information is available, selecting animals for genotyping based on pedigree is a strategically optimised approach.
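The abstract reports an animal-specific imputation accuracy "corrected for gene content" and relates accuracy to MAF. The excerpt does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: genotypes are coded as 0/1/2 allele counts, the correction is taken to mean centering each SNP on its mean gene content 2p (a common convention, not confirmed by this text), and the function names `animal_accuracy` and `minor_allele_freq` are hypothetical.

```python
import numpy as np


def minor_allele_freq(geno):
    """MAF per SNP from an (animals x SNPs) matrix of 0/1/2 allele counts."""
    p = geno.mean(axis=0) / 2.0          # frequency of the counted allele
    return np.minimum(p, 1.0 - p)        # fold to the minor allele


def animal_accuracy(true_geno, imputed_geno, allele_freq):
    """Per-animal correlation between true and imputed genotypes.

    Assumption: "corrected for gene content" is implemented here by
    subtracting the mean gene content 2p of each SNP from both the true
    and the imputed genotypes before correlating across SNPs.
    """
    true_c = true_geno - 2.0 * allele_freq
    imp_c = imputed_geno - 2.0 * allele_freq
    acc = np.full(true_geno.shape[0], np.nan)
    for i in range(true_geno.shape[0]):
        # Correlation is undefined if either vector is constant.
        if true_c[i].std() > 0 and imp_c[i].std() > 0:
            acc[i] = np.corrcoef(true_c[i], imp_c[i])[0, 1]
    return acc


# Toy data (made up for illustration): 3 animals, 4 SNPs.
true_geno = np.array([[0, 1, 2, 2],
                      [2, 1, 0, 0],
                      [1, 1, 2, 0]], dtype=float)
imputed_geno = true_geno.copy()
imputed_geno[2, 3] = 1.0                 # one imputation error in animal 3

freq = true_geno.mean(axis=0) / 2.0      # here estimated from the true data
accuracy = animal_accuracy(true_geno, imputed_geno, freq)
maf = minor_allele_freq(true_geno)
```

Centering on 2p matters because an uncorrected correlation rewards an imputation that simply guesses the common genotype at every SNP; after centering, only agreement beyond what allele frequencies alone predict contributes to the score.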
