Abstract

Background: Genome-wide association (GWA) studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. Population stratification can cause spurious associations if not adjusted for properly. The principal component analysis (PCA) method has been widely relied upon to adjust for population stratification in these large-scale studies. Recently, the linear mixed model (LMM) has also been proposed to account for family structure or cryptic relatedness. However, neither approach may be optimal for correcting sample structure in the presence of subject outliers.

Results: We propose to use robust PCA combined with k-medoids clustering to deal with population stratification. This approach can adjust for population stratification in both continuous and discrete populations with subject outliers, and it can be considered an extension of the PCA method and the multidimensional scaling (MDS) method. Through simulation studies, we compare the performance of our proposed methods with several widely used stratification methods, including PCA and MDS. We show that subject outliers can greatly influence the analysis results from several existing methods, while our proposed robust population stratification methods perform very well for both discrete and admixed populations with subject outliers. We illustrate the new method using data from a rheumatoid arthritis study.

Conclusions: We demonstrate that subject outliers can greatly influence the analysis results in GWA studies, and we propose robust methods for dealing with population stratification that outperform existing population stratification methods in the presence of subject outliers.

Highlights

  • We propose to combine the clustering method used in Li and Yu [8] with robust principal component analysis (PCA) as an improved approach for correcting for artifacts arising from population stratification

  • With the PCA method, when differences between cases and controls were moderate, the false positive rates for both random single-nucleotide polymorphisms (SNPs) and differentiated SNPs were close to the nominal level; when the differences were more extreme, the false positive rates for random SNPs remained under control, but those for differentiated SNPs were inflated

Summary

Introduction

Genome-wide association studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. The goal of these case-control studies is to identify SNPs associated with the outcome of interest, such as disease status or responder/non-responder status. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. The principal component analysis (PCA) method has been widely relied upon for this adjustment, and the linear mixed model (LMM) has been proposed to account for family structure or cryptic relatedness. However, neither approach may be optimal for correcting sample structure in the presence of subject outliers.
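
To make the stratification problem concrete, the following is a minimal simulation sketch, not the authors' method. It assumes two discrete populations that differ in both allele frequency and case rate, a non-causal test SNP, and an ordinary (non-robust) PCA adjustment in which the top principal component is regressed out of genotype and phenotype before a correlation-based trend statistic is computed. All names (`top_pcs`, `trend_stat`), sample sizes, and frequencies are illustrative assumptions; the robust PCA and k-medoids steps proposed in the paper are not implemented here.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Return the top-k principal component scores (one row per subject)."""
    x = genotypes - genotypes.mean(axis=0)
    x = x / x.std(axis=0)  # assumes no monomorphic SNP in the panel
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k]  # orthonormal columns, each with mean zero

def trend_stat(g, y, pcs=None):
    """Correlation-based trend statistic, roughly chi-square (1 df) under the null.

    If pcs is given, it is first regressed out of both genotype and phenotype.
    """
    g = g - g.mean()
    y = y - y.mean()
    k = 0
    if pcs is not None:
        k = pcs.shape[1]
        g = g - pcs @ (pcs.T @ g)
        y = y - pcs @ (pcs.T @ y)
    r = (g @ y) / np.sqrt((g @ g) * (y @ y))
    return (len(g) - k - 1) * r**2

rng = np.random.default_rng(0)
n_per_pop, n_snps = 300, 500          # illustrative sizes
pop = np.repeat([0, 1], n_per_pop)    # hidden ancestry label

# Panel of SNPs whose allele frequencies differ modestly between populations.
base = rng.uniform(0.2, 0.8, n_snps)
shift = rng.uniform(-0.15, 0.15, n_snps)
freqs = np.where(pop[:, None] == 0, base - shift, base + shift)
panel = rng.binomial(2, freqs).astype(float)

# Case status depends on ancestry only; no SNP is causal.
y = (rng.random(pop.size) < np.where(pop == 0, 0.2, 0.8)).astype(float)

# A strongly differentiated test SNP that has no effect on the outcome.
test_snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.8)).astype(float)

unadjusted = trend_stat(test_snp, y)
adjusted = trend_stat(test_snp, y, top_pcs(panel, 1))
print(f"unadjusted: {unadjusted:.1f}, PCA-adjusted: {adjusted:.1f}")
```

In this setup the unadjusted statistic is driven entirely by ancestry, so it should be far above the usual 1-df significance threshold, while the adjusted statistic should fall back toward the null range. The sketch contains no subject outliers, which are the situation the paper argues can degrade ordinary PCA.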
