Accuracy of genotype imputation based on random and selected reference sets in purebred and crossbred sheep populations and its effect on accuracy of genomic prediction.

Nasir Moghaddar, Ben J Hayes, Hans D Daetwyler, Julius H J Van Der Werf, Klint P Gore

doi:10.1186/s12711-015-0175-8


Open Access


Abstract

Background
The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) SNP (single nucleotide polymorphism) densities in purebred and crossbred Merino sheep based on a random or selected reference set, and to evaluate the impact of using imputed genotypes on the accuracy of genomic prediction.

Methods
Imputation validation sets were composed of random purebred or crossbred Merinos, while imputation reference sets were of variable sizes and included random purebred or crossbred Merinos or a group of animals that were selected based on high genetic relatedness to animals in the validation set. The Beagle software program was used for imputation, and imputation accuracy was assessed as the Pearson correlation coefficient between observed and imputed genotypes. Genomic evaluation was performed by genomic best linear unbiased prediction (GBLUP). Its accuracy was evaluated as the Pearson correlation coefficient between genomic estimated breeding values, computed from either observed (12k/50k) genotypes or imputed genotypes with varying levels of imputation accuracy, and accurate estimated breeding values based on progeny tests.

Results
Imputation accuracy increased as the size of the reference set increased. Accuracy was higher for purebred Merinos imputed from other purebred Merinos (on average 0.90 to 0.95 based on 1000 to 3000 animals) than from crossbred Merinos (0.78 to 0.87 based on 1000 to 3000 animals) or from non-Merino purebreds (on average 0.50). The imputation accuracy for crossbred Merinos based on 1000 to 3000 other crossbred Merinos ranged from 0.86 to 0.88. Considerably higher imputation accuracy was observed when a selected reference set with a high genetic relationship to the target animals was used than with a random reference set of the same size (0.96 vs. 0.88, respectively).
Accuracy of genomic prediction based on 50k genotypes imputed with high accuracy (0.88 to 0.99) decreased only slightly (0.0 to 0.67% across traits) compared to using observed 50k genotypes. Accuracy of genomic prediction based on observed 12k genotypes was higher than accuracy based on 50k genotypes imputed with low accuracy (0.62 to 0.86).
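
The GBLUP evaluation and its accuracy measure described under Methods can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the genomic relationship matrix uses the first method of VanRaden (2008), which is a common choice but is not specified in the abstract; the variance ratio lam is taken as known rather than estimated (e.g. by REML); the genotypes are simulated; and simulated true breeding values stand in for the progeny-test EBVs. The function names vanraden_g and gblup are hypothetical.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from 0/1/2 dosages (VanRaden 2008, method 1)."""
    p = M.mean(axis=0) / 2.0                   # allele frequencies estimated from the data
    Z = M - 2.0 * p                            # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(G, y_train, train_idx, lam):
    """GEBVs for every animal in G from phenotypes of the training animals.

    lam = sigma_e^2 / sigma_g^2 is assumed known here; in practice it is usually
    estimated from the data (e.g. by REML).
    """
    V = G[np.ix_(train_idx, train_idx)] + lam * np.eye(len(train_idx))
    ones = np.ones(len(train_idx))
    # Generalised least-squares estimate of the overall mean.
    mu = (ones @ np.linalg.solve(V, y_train)) / (ones @ np.linalg.solve(V, ones))
    alpha = np.linalg.solve(V, y_train - mu)
    return G[:, train_idx] @ alpha             # BLUP of g for training and validation animals


# Hypothetical simulated data: independent SNPs, every SNP with a small effect.
rng = np.random.default_rng(7)
n, m, h2 = 1200, 300, 0.5
freq = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)
tbv = (M - 2.0 * freq) @ rng.normal(0.0, 1.0, size=m)
tbv /= tbv.std()                               # genetic variance scaled to about 1
y = tbv + rng.normal(0.0, np.sqrt((1.0 - h2) / h2), size=n)

train, valid = np.arange(1000), np.arange(1000, n)
G = vanraden_g(M)
gebv = gblup(G, y[train], train, lam=(1.0 - h2) / h2)
# True breeding values stand in for the accurate progeny-test EBVs used in the study.
accuracy = float(np.corrcoef(gebv[valid], tbv[valid])[0, 1])
print(f"GBLUP accuracy in validation animals: {accuracy:.2f}")
```

To mirror the study's comparison, one would build G once from observed and once from imputed genotype matrices and compare the two validation correlations; the sketch shows only the observed-genotype case.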

Highlights

Genomic evaluation refers to prediction of breeding values of selection candidates based on single nucleotide polymorphism (SNP) genotypes that are in linkage disequilibrium (LD) with quantitative trait loci (QTL) and

Summary

Introduction

The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) SNP (single nucleotide polymorphism) densities in purebred and crossbred Merino sheep based on a random or selected reference set, and to evaluate the impact of using imputed genotypes on the accuracy of genomic prediction. A number of studies have compared the effect of SNP density on genomic prediction, mainly from low to medium density, using simulated or real data, and have shown that prediction accuracy improves considerably as the density of the SNP array increases (e.g., [6,7,8,9,10]). Another strategy to achieve higher genomic prediction accuracy from low-density SNP sets is to genotype industry animals with a low-density SNP array and then infer, by genotype imputation from a reference set, their genotypes for the SNPs of a denser array that were not typed [11, 12]. Genotype imputation refers to statistical inference of un-typed marker genotypes in a set of animals genotyped at low density (imputation test set) based on a group of animals genotyped with higher-density marker arrays (imputation reference set) [13].
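
The definition of genotype imputation above can be made concrete with a toy example. This is a minimal sketch of k-nearest-neighbour imputation, assuming genotypes coded as 0/1/2 allele dosages, a reference set typed at every SNP, and a test set typed only at a subset of SNPs. It is not the Beagle algorithm used in the study, which models haplotypes with a hidden Markov model. The functions knn_impute and imputation_accuracy, the neighbour count k, and the simulated family data are all hypothetical. Accuracy is computed, as in the study, as a Pearson correlation between observed and imputed genotypes; pooling over all imputed cells (rather than correlating per animal or per SNP) is an assumption.

```python
import numpy as np


def knn_impute(ref, target_low, typed_idx, k=5):
    """Toy k-nearest-neighbour imputation (an illustration, NOT the Beagle algorithm).

    ref        : (n_ref, m) 0/1/2 dosages of reference animals typed at all m SNPs
    target_low : (n_tgt, len(typed_idx)) dosages of test animals at the typed SNPs only
    typed_idx  : indices of the m SNPs that are on the low-density panel
    Typed SNPs are copied; each un-typed SNP is imputed as the mean dosage of the
    k reference animals that are most similar to the test animal at the typed SNPs.
    """
    m = ref.shape[1]
    untyped_idx = np.setdiff1d(np.arange(m), typed_idx)
    imputed = np.empty((target_low.shape[0], m))
    imputed[:, typed_idx] = target_low
    ref_typed = ref[:, typed_idx]
    for i, g in enumerate(target_low):
        dist = np.abs(ref_typed - g).mean(axis=1)        # identity-by-state distance
        nearest = np.argsort(dist, kind="stable")[:k]
        imputed[i, untyped_idx] = ref[np.ix_(nearest, untyped_idx)].mean(axis=0)
    return imputed


def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed dosages, pooled over all cells."""
    return float(np.corrcoef(observed.ravel(), imputed.ravel())[0, 1])


# Hypothetical data: animals are noisy copies of 10 family "founder" genotypes.
rng = np.random.default_rng(1)
m, n_fam = 200, 10
founders = rng.integers(0, 3, size=(n_fam, m))


def simulate(n, noise=0.05):
    g = founders[rng.integers(0, n_fam, size=n)].copy()
    flip = rng.random(g.shape) < noise                   # randomly perturbed genotypes
    g[flip] = rng.integers(0, 3, size=flip.sum())
    return g


ref = simulate(300)                                      # reference set, typed at all SNPs
truth = simulate(50)                                     # test set, true high-density genotypes
typed_idx = np.arange(0, m, 4)                           # low-density panel: every 4th SNP
imputed = knn_impute(ref, truth[:, typed_idx], typed_idx, k=5)
untyped_idx = np.setdiff1d(np.arange(m), typed_idx)
acc = imputation_accuracy(truth[:, untyped_idx], imputed[:, untyped_idx])
print(f"imputation accuracy at un-typed SNPs: {acc:.2f}")
```

Because each test animal's nearest neighbours in this simulation tend to be members of its own family, the toy loosely echoes the abstract's finding that a reference set selected for high relatedness to the targets imputes more accurately than a random one.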
