Comparisons of improved genomic predictions generated by different imputation methods for genotyping by sequencing data in livestock populations

Xiao Wang,Dan Hao,Guosheng Su,Mogens Sandø Lund,Haja N Kadarmideen

doi:10.1186/s40104-019-0407-9

Abstract

BackgroundGenotyping by sequencing (GBS) still has problems with missing genotypes. Imputation is important for using GBS for genomic predictions, especially for low depths, due to the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criteria for genomic predictions. In this study, three imputation methods (Beagle, IMPUTE2 and FImpute software) based on four MAF editing criteria were investigated with regard to imputation accuracy of missing genotypes and accuracy of genomic predictions, based on simulated data of livestock population.ResultsFour MAFs (no MAF limit, MAF ≥ 0.001, MAF ≥ 0.01 and MAF ≥ 0.03) were used for editing marker data before imputation. Beagle, IMPUTE2 and FImpute software were applied to impute the original GBS. Additionally, IMPUTE2 also imputed the expected genotype dosage after genotype correction (GcIM). The reliability of genomic predictions was calculated using GBS and imputed GBS data. The results showed that imputation accuracies were the same for the three imputation methods, except for the data of sequencing read depth (depth) = 2, where FImpute had a slightly lower imputation accuracy than Beagle and IMPUTE2. GcIM was observed to be the best for all of the imputations at depth = 4, 5 and 10, but the worst for depth = 2. For genomic prediction, retaining more SNPs with no MAF limit resulted in higher reliability. As the depth increased to 10, the prediction reliabilities approached those using true genotypes in the GBS loci. Beagle and IMPUTE2 had the largest increases in prediction reliability of 5 percentage points, and FImpute gained 3 percentage points at depth = 2. The best prediction was observed at depth = 4, 5 and 10 using GcIM, but the worst prediction was also observed using GcIM at depth = 2.ConclusionsThe current study showed that imputation accuracies were relatively low for GBS with low depths and high for GBS with high depths. Imputation resulted in larger gains in the reliability of genomic predictions for GBS with lower depths. These results suggest that the application of IMPUTE2, based on a corrected GBS (GcIM) to improve genomic predictions for higher depths, and FImpute software could be a good alternative for routine imputation.

Highlights

Genotyping by sequencing (GBS) still has problems with missing genotypes
Accuracy of imputation based on original GBS data As shown in Fig. 1, there were very small differences in the imputation accuracy among the four GBS data sets based on the Minor allele frequency (MAF) criteria used
The IMPUTE2 imputation based on corrected GBS (GcIM) was observed to perform the best out of all the imputation methods based on the original GBS at depth = 4, 5 and 10

Summary

Introduction

Imputation is important for using GBS for genomic predictions, especially for low depths, due to the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criteria for genomic predictions. Three imputation methods (Beagle, IMPUTE2 and FImpute software) based on four MAF editing criteria were investigated with regard to imputation accuracy of missing genotypes and accuracy of genomic predictions, based on simulated data of livestock population. Imputation strategies are important for using GBS for genomic predictions and many imputation methods have been developed. In a SNP array, missing markers with certain structure are usually identified with a high degree of certainty, but missing markers of GBS data can vary at different marker positions, especially for the mutations located in restriction enzyme cut sites. Imputation of missing genotypes of GBS SNP data could be less accurate than those in chip SNP data

Objectives

Methods

Results

Discussion

Conclusion