Abstract

Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step before a genome-wide association study (GWAS) or genomic prediction, and its accuracy directly influences the results of subsequent analyses. In this simulation-based study, we investigated the accuracy of genotype imputation in relation to several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors were the size of the imputation reference population, the proportion of target markers or SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Genotypes were simulated using coalescent theory, accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. Genomic data for 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers or SNP density was the most critical factor affecting imputation accuracy in all imputation scenarios. Compared with Minimac4, Beagle5.1 produced more accurate imputed data in most cases, most notably when imputing from LCWGS data. Compared with SNP chip data, LCWGS data gave more accurate genotype imputation. Our findings provide relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.

Highlights

  • The availability of next-generation sequencing technologies has made it possible to use whole-genome sequencing (WGS) data in genome-wide association studies (GWASs) or genomic prediction (GP) (Koboldt et al., 2013; Ni et al., 2017).

  • Imputation reliability appears to be the more useful measure for genomic prediction: it coincides with the definition of reliability used for breeding values, and it does not depend on minor allele frequency (MAF).

  • Beagle5.1 outperformed Minimac4 in most cases, but when the reference population was small, SNP density was low, or genetic distance was large, the imputation accuracy of Beagle5.1 was more affected than that of Minimac4.
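
The contrast between imputation accuracy and imputation reliability drawn in the highlights can be illustrated with a small sketch. It assumes two common definitions that this summary does not spell out: accuracy as the concordance rate (the fraction of genotypes imputed exactly right) and reliability as the squared Pearson correlation between true and imputed genotypes, by analogy with the reliability of breeding values. The function names and the toy genotypes (coded 0/1/2) are hypothetical and are not taken from the study.

```python
def concordance(true_g, imputed_g):
    """Fraction of genotypes (coded 0/1/2) that were imputed exactly right."""
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(1 for t, i in zip(true_g, imputed_g) if t == i)
    return matches / len(true_g)


def reliability(true_g, imputed_g):
    """Squared Pearson correlation between true and imputed genotypes.

    Assumed definition, chosen to mirror the reliability of breeding values.
    Returns NaN when either genotype vector has no variation.
    """
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("genotype lists must be non-empty and of equal length")
    n = len(true_g)
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov * cov / (var_t * var_i)


# Toy low-MAF SNP: ten individuals, two true heterozygotes, one of them missed.
true_genotypes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
imputed_genotypes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(concordance(true_genotypes, imputed_genotypes))  # 0.9
print(reliability(true_genotypes, imputed_genotypes))  # about 0.444
```

In this toy case the concordance rate stays high because most individuals carry the common homozygous genotype, while the squared correlation drops sharply after a single missed heterozygote. This is one way to see why a concordance-style measure tends to depend on MAF.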

Introduction

The availability of next-generation sequencing technologies has made it possible to use whole-genome sequencing (WGS) data in genome-wide association studies (GWASs) or genomic prediction (GP) (Koboldt et al., 2013; Ni et al., 2017). SNP array markers, by contrast, are in incomplete linkage disequilibrium with the causal mutations, and they do not provide the understanding of the causal mutation that can be obtained by annotating highly significant sequence variants. One option is to impute SNP array genotypes to sequence resolution using a reference population of a small number of deeply sequenced relatives. Another option is imputation from a large number of sparsely sequenced individuals, obtained from low-coverage whole-genome sequencing (LCWGS). Compared with SNP chip data, LCWGS can expose the segregating sequence variants and mitigate the ascertainment bias of SNP arrays.
