Impact of genetic similarity on imputation accuracy.

Nab Raj Roshyara, Markus Scholz

doi:10.1186/s12863-015-0248-2

Abstract

Background: Genotype imputation is a common technique in genetic research. Genetic similarity between the target population and the reference dataset is crucial for high-quality results. Although several reference panels are available, it is often unclear which one is optimal for a particular target dataset. Maximizing genetic similarity between the study sample and the candidate reference panels may be the most straightforward way to select the genetically best-matched reference. However, the impact of genetic similarity on imputation accuracy has not yet been studied in detail.

Results: We performed a simulation study in 20 ethnic groups obtained from POPRES. High-quality SNPs were masked and re-imputed with MaCH, MaCH-minimac and IMPUTE2 using four different HapMap reference panels (CEU, CHB-JPT, MEX and YRI). Imputation accuracy was assessed by different statistics. Genetic similarity between ethnic groups and reference populations was measured by F-statistics (FST), originally proposed by Wright, and G-statistics (GST), introduced by Nei and others. To assess the predictive power of these measures regarding imputation accuracy, we analysed the relations between them and the corresponding imputation accuracy scores. We found that population genetic distances between homogeneous reference and target populations were strongly linearly correlated with the resulting imputation accuracies, irrespective of the distance measure, imputation accuracy measure, missingness and imputation software used. A possible exception was the African population.

Conclusion: Using GST or FST-related measures to predict the optimal reference panel is highly recommended for imputation frameworks that rely on a specific reference. A cut-off of GST < 0.01 is recommended to achieve good imputation results for high-frequency variants and small data sets. The linear relationship is less pronounced for low-frequency variants, for which we also observed a dependence of imputation accuracy on the number of polymorphic sites in the reference. We also show that the software-specific measures MaCH-Rsq and IMPUTE-info must be interpreted with caution if the genetic distance between target and reference population is high.

Electronic supplementary material: The online version of this article (doi:10.1186/s12863-015-0248-2) contains supplementary material, which is available to authorized users.
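To make the distance measure in the abstract more concrete, the following is a minimal sketch of a multi-locus GST in the sense of Nei, computed for two populations from per-SNP allele frequencies. It is an illustration only and not the authors' implementation: the function name `nei_gst`, the equal weighting of the two populations, and the omission of any sample-size correction are assumptions made here, and the estimator used in the paper may differ.

```python
from typing import Sequence


def nei_gst(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Multi-locus G_ST = (H_T - H_S) / H_T for two populations.

    freqs_a, freqs_b: allele frequencies of the same allele at the same
    biallelic SNPs in population A and population B.
    Assumption: both populations are weighted equally and no
    small-sample bias correction is applied.
    """
    if len(freqs_a) != len(freqs_b) or len(freqs_a) == 0:
        raise ValueError("need two non-empty frequency lists of equal length")

    h_s_sum = 0.0  # summed within-population expected heterozygosity
    h_t_sum = 0.0  # summed total (pooled) expected heterozygosity
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean of the two within-population heterozygosities 2p(1-p).
        h_s_sum += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Heterozygosity at the mean allele frequency of both populations.
        p_bar = (p_a + p_b) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)

    if h_t_sum == 0.0:
        # All SNPs are monomorphic for the same allele in both populations.
        return 0.0
    return (h_t_sum - h_s_sum) / h_t_sum
```

Identical allele frequencies give a value of 0 and alternately fixed alleles give 1, so smaller values indicate a genetically closer reference population.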

Highlights

Genotype imputation is a common technique applied in the context of genome-wide association (GWA) analysis.
We investigated the cause of this deviation and found that low-frequency variants (SNPs with minor allele frequency (MAF) ≤ 0.05) strongly influence FRST, while the G-statistic (GST) is robust.
We conclude that GST is a good predictor of imputation accuracy for all types of imputation frameworks used under the best-matching policy for selecting a reference panel.
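The best-matching policy mentioned above can be sketched as follows: choose the panel with the smallest GST to the target population and check it against the GST < 0.01 cut-off stated in the abstract. The function name `pick_reference` and the GST values in the usage note are hypothetical placeholders for illustration, not results from the paper.

```python
from typing import Mapping, Tuple

GST_CUTOFF = 0.01  # cut-off recommended in the abstract


def pick_reference(
    gst_by_panel: Mapping[str, float], cutoff: float = GST_CUTOFF
) -> Tuple[str, float, bool]:
    """Return (panel, gst, below_cutoff) for the closest reference panel.

    gst_by_panel maps a panel name to its G_ST distance from the target
    population. The panel with the smallest distance is selected.
    """
    if not gst_by_panel:
        raise ValueError("no reference panels given")
    panel = min(gst_by_panel, key=gst_by_panel.get)
    gst = gst_by_panel[panel]
    return panel, gst, gst < cutoff
```

For example, with made-up distances `{"CEU": 0.004, "CHB-JPT": 0.11, "MEX": 0.03, "YRI": 0.15}`, the function selects `"CEU"` and reports that it lies below the cut-off. If no panel falls below the cut-off, the flag is `False` and the closest panel is still returned, leaving the decision to the analyst.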

Summary

Introduction

Genotype imputation is a common technique applied in the context of genome-wide association (GWA) analysis. A set of densely genotyped samples is used as a reference to infer a large set of untyped or missing markers in the target population. Strategies for selecting the individuals to be sequenced have been suggested recently [5]. These strategies consider genetic similarities between the study population, the subsets to be sequenced and the reference panel. Genetic similarity between the target population and the reference dataset is crucial for high-quality results. Maximizing genetic similarity between the study sample and the candidate reference panels may be the most straightforward way to select the genetically best-matched reference. However, the impact of genetic similarity on imputation accuracy has not yet been studied in detail.



Journal: BMC Genetics
Publication Date: Jul 22, 2015
Citations: 67
License type: CC BY 4.0
