Abstract

Multiple imputation based on chained equations (MICE) is an alternative method for imputing missing genotypes that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data from the HBA2 gene which, owing to the experimental design of a malaria association study in Tanzania, has a high percentage of missing data and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates of the genetic effects and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 before imputation to 2.70 after imputation. In contrast, postimputation estimates of the genetic effects remained the same as those from the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased as a result of a good study design based on matching key socio-environmental factors.
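
To make the chained-equations idea concrete, the following is a minimal sketch and not the imputation model used in the study. It assumes toy continuous variables and ordinary linear regression for every column, whereas genotype data would call for categorical models; it also skips the draw of regression coefficients from their posterior that a proper MICE implementation performs. The function name `mice_impute` and the simulated data are hypothetical and serve illustration only.

```python
import numpy as np


def mice_impute(X, n_iter=10, rng=None):
    """Return one completed copy of X (NaN marks missing values).

    Simplified chained equations: each incomplete column is regressed
    on all other columns, and its missing entries are replaced by the
    regression prediction plus random noise.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initialise missing entries with the observed column means.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs = ~miss_j
            # Design matrix: intercept plus the current values of the other columns.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            beta, *_ = np.linalg.lstsq(A[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - A[obs] @ beta
            dof = max(int(obs.sum()) - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Add residual noise so imputations reflect uncertainty.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = A[miss_j] @ beta + noise
    return filled


# Simulated data (an assumption for illustration): y depends on x,
# and about 30% of the y values are set to missing.
sim = np.random.default_rng(42)
n = 200
x = sim.normal(size=n)
y = 2.0 * x + sim.normal(scale=0.5, size=n)
data_missing = np.column_stack([x, y])
data_missing[sim.random(n) < 0.3, 1] = np.nan

# Multiple imputation: several completed data sets, one per seed.
completed_sets = [mice_impute(data_missing, rng=seed) for seed in range(5)]
```

In a full analysis, the model of interest would be fitted to each completed data set and the resulting estimates combined across imputations; the sketch stops at producing the completed data sets.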

Highlights

  • Missing genotypes are common in genetic association studies but often discarded from the analysis

  • Since the advent of the HapMap project and the decrease in costs of genome-wide association studies (GWAS), several imputation methods have been proposed to deal with missing genotypes of typed or untyped single-nucleotide polymorphisms

  • Multiple imputation based on chained equations (MICE) has proven to be a useful imputation technique in different statistical applications (Clark & Altman, 2003; Ambler et al., 2007; van Buuren, 2007)

Introduction

Missing genotypes are common in genetic association studies but are often discarded from the analysis. This popular practice typically decreases statistical efficiency and power compared with an analysis in which missing data are appropriately taken into account. It may also introduce estimation bias when the missing data pattern is not completely random. Since the advent of the HapMap project and the decrease in costs of genome-wide association studies (GWAS), several imputation methods have been proposed to deal with missing genotypes of typed or untyped single-nucleotide polymorphisms.
