Abstract

Statistical power, the probability of correctly rejecting a false null hypothesis, is a limiting factor in genome-wide association studies (GWAS). Sample size is a major component of statistical power; it is easily reduced by missingness in phenotypic data, which restrains the ability to detect associated single-nucleotide polymorphisms (SNPs) with small effect sizes. Although some phenotypes are hard to collect because of cost and loss to follow-up, correlated phenotypes that are easily collected can be leveraged for association analysis. In this paper, we evaluate a phenotype imputation method that incorporates family structure and correlation between multiple phenotypes using GAW20 simulated data. The distribution of missing values is derived using information contained in the missing sample’s relatives and additional correlated phenotypes. We show that this imputation method can improve power in the association analysis compared with excluding observations with missing data, while achieving the correct Type I error rate. We also examine factors that may affect the imputation accuracy.

Highlights

  • Genome-wide association studies (GWAS) have uncovered thousands of single-nucleotide polymorphisms (SNPs) associated with complex traits [1]

  • A slightly inflated Type I error rate is seen in the incomplete data set when using the average difference outcome

  • We show that the imputed missing phenotypes can be determined under a multivariate normal (MVN) distribution using family structure, an observed second phenotype of the missing samples, and the 2 observed phenotypes of the missing sample’s family members
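
The conditioning step described in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: it only applies the standard conditional-mean formula for a multivariate normal vector, E[y_m | y_o] = mu_m + Sigma_mo Sigma_oo^{-1} (y_o - mu_o). The function name `conditional_mvn_impute`, the parent-child trio, the covariance model (a relationship matrix combined with genetic and environmental covariance matrices via Kronecker products), and all numeric values are assumptions chosen for illustration.

```python
import numpy as np


def conditional_mvn_impute(y, mu, sigma):
    """Replace NaN entries of y with their conditional mean under N(mu, sigma).

    Returns the filled vector and the conditional covariance of the
    imputed entries given the observed ones.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    miss = np.isnan(y)
    obs = ~miss
    if not miss.any():
        return y.copy(), np.zeros((0, 0))

    s_oo = sigma[np.ix_(obs, obs)]    # covariance among observed entries
    s_mo = sigma[np.ix_(miss, obs)]   # covariance between missing and observed
    s_mm = sigma[np.ix_(miss, miss)]  # covariance among missing entries

    # w = Sigma_mo Sigma_oo^{-1}, computed with a linear solve, not an inverse
    w = np.linalg.solve(s_oo, s_mo.T).T

    filled = y.copy()
    filled[miss] = mu[miss] + w @ (y[obs] - mu[obs])
    cond_cov = s_mm - w @ s_mo.T
    return filled, cond_cov


# Hypothetical trio (father, mother, child) with two phenotypes each.
# R is twice the kinship matrix; G and E are ASSUMED genetic and
# environmental covariance matrices between the two phenotypes.
R = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
G = np.array([[0.5, 0.3],
              [0.3, 0.5]])
E = np.array([[0.5, 0.1],
              [0.1, 0.5]])

# Entries are ordered individual by individual: (phenotype 1, phenotype 2)
sigma = np.kron(R, G) + np.kron(np.eye(3), E)
mu = np.zeros(6)

# The child's first phenotype is missing; everything else is observed.
y = np.array([1.2, 0.8, 0.4, 0.1, np.nan, 0.9])
filled, cond_cov = conditional_mvn_impute(y, mu, sigma)
print("imputed value:", filled[4])
print("conditional variance:", cond_cov[0, 0])
```

In this sketch the imputed value borrows from both sources named in the highlight: the child's own second phenotype (through the off-diagonal terms of G and E) and both phenotypes of the parents (through the off-diagonal terms of R). The conditional variance indicates how much uncertainty remains after conditioning.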

Summary

Introduction

Genome-wide association studies (GWAS) have uncovered thousands of single-nucleotide polymorphisms (SNPs) associated with complex traits [1]. The power of GWAS is limited by the number of individuals with data available for the trait of interest. Although tens of thousands of individuals typically contribute to a GWAS, statistical power can still be lost to missingness in phenotypic data. Some phenotypes are difficult to collect because of cost, loss to follow-up, or inaccessibility of the biological sample at the time of the study. In such cases, data collected on related phenotypes or from the missing sample’s relatives can be exploited. Current approaches to handling missing data include multiple imputation [2], intermediate phenotype analysis [3], and a recently published method, PhenIMP [4].

