Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies

Sanne Van Den Berg, Roel F Veerkamp, Aniek C Bouwman, Marcos S Lopes, Fred A Van Eeuwijk, Jérémie Vandenplas

doi:10.1186/s12711-019-0445-y

Abstract

Background: Use of whole-genome sequence data (WGS) is expected to improve identification of quantitative trait loci (QTL). However, this requires imputation to WGS, often with a limited number of sequenced animals for the target population. The objective of this study was to investigate imputation to WGS in two pig lines using a multi-line reference population and, subsequently, to investigate the effect of using these imputed WGS (iWGS) for GWAS.

Methods: Phenotypes and genotypes were available on 12,184 Large White pigs (LW-line) and 4943 Dutch Landrace pigs (DL-line). Imputed 660 K and 80 K genotypes for the LW-line and DL-line, respectively, were imputed to iWGS using Beagle v.4.1. Since only 32 LW-line and 12 DL-line boars were sequenced, 142 animals from eight commercial lines were added. GWAS were performed for each line using the 80 K and 660 K SNPs, the genotype scores of iWGS SNPs that had an imputation accuracy (Beagle R²) higher than 0.6, and the dosage scores of all iWGS SNPs.

Results: For the DL-line (LW-line), imputation of 80 K genotypes to iWGS resulted in an average Beagle R² of 0.39 (0.49). After quality control, 2.5 × 10⁶ (3.5 × 10⁶) SNPs had a Beagle R² higher than 0.6, resulting in an average Beagle R² of 0.83 (0.93). Compared to the 80 K and 660 K genotypes, using iWGS led to the identification of 48.9 and 64.4% more QTL regions, for the DL-line and LW-line, respectively, and the most significant SNPs in the QTL regions explained a higher proportion of phenotypic variance. Using dosage instead of genotype scores improved the identification of QTL, because the model accounted for uncertainty of imputation, and all SNPs were used in the analysis.

Conclusions: Imputation to WGS using the multi-line reference population resulted in relatively poor imputation, especially when imputing from 80 K (DL-line). In spite of the poor imputation accuracies, using iWGS instead of a lower density SNP chip increased the number of detected QTL and the estimated proportion of phenotypic variance explained by these QTL, especially when dosage scores were used instead of genotype scores. Thus, iWGS, even with poor imputation accuracy, can be used to identify possible interesting regions for fine mapping.
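The difference between dosage scores and genotype scores mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes, as is common for imputation output, that each animal has three posterior genotype probabilities per SNP, that the dosage score is the expected count of the alternative allele, and that the genotype score is the most probable genotype. The function names and the example SNP identifiers are hypothetical; only the R² threshold of 0.6 comes from the abstract, and this is not the authors' pipeline.

```python
from typing import Dict, List, Sequence

# Threshold on Beagle R2 reported in the abstract for the genotype-score analysis.
R2_THRESHOLD = 0.6


def dosage_score(probs: Sequence[float]) -> float:
    """Expected alternative-allele count: P(het) + 2 * P(hom alt).

    `probs` holds (P(0 copies), P(1 copy), P(2 copies)); this layout is an
    assumption made for illustration.
    """
    _p_ref, p_het, p_alt = probs
    return p_het + 2.0 * p_alt


def genotype_score(probs: Sequence[float]) -> int:
    """Best-guess genotype (0, 1 or 2): the class with the highest probability."""
    return max(range(3), key=lambda g: probs[g])


def snps_for_genotype_gwas(r2_by_snp: Dict[str, float]) -> List[str]:
    """Keep only SNPs whose imputation R2 is strictly higher than the threshold."""
    return [snp for snp, r2 in r2_by_snp.items() if r2 > R2_THRESHOLD]


# Hypothetical uncertain call: the best guess is heterozygous (1), but the
# dosage retains the substantial probability of two alternative alleles.
uncertain_call = (0.10, 0.50, 0.40)
best_guess = genotype_score(uncertain_call)
expected_count = dosage_score(uncertain_call)
```

Under these assumptions, a dosage score carries the imputation uncertainty into the association model, whereas a genotype score discards it, which is why the genotype-score analysis is restricted to SNPs above the R² threshold.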

Highlights

Use of whole-genome sequence data (WGS) is expected to improve identification of quantitative trait loci (QTL)
26.1 × 10⁶ single nucleotide polymorphisms (SNPs) were available in the sequenced reference animals, of which 17.6 × 10⁶ and 21.7 × 10⁶ segregated in the imputed WGS (iWGS) data for the DL-line and the LW-line, respectively
5.4 × 10⁶ SNPs with iWGS dosage scores and 3.5 × 10⁶ SNPs with genotype scores remained for the LW-line, and 5.8 × 10⁶ SNPs with iWGS dosage scores and 2.5 × 10⁶ SNPs with genotype scores remained for the DL-line

Summary

Introduction

Use of whole-genome sequence (WGS) data is expected to improve the detection of quantitative trait loci (QTL) because such data are expected to contain most of the causal single nucleotide polymorphisms (SNPs), as was shown in dairy cattle populations. However, this requires imputation to WGS, often with a limited number of sequenced animals for the target population. Often, only a small number of animals is sequenced per line because sequencing expenses must be divided across lines. In those cases, it might be beneficial to combine the WGS data across lines into one reference population for imputation. Nevertheless, combining populations for imputation may not provide sufficient imputation accuracy in all populations.

Objectives

Methods

Results

Discussion

Conclusion