Abstract

Given the difficulty and effort required to confirm candidate causal SNPs detected in genome-wide association studies (GWAS), there is no practical way to definitively filter false positives. Recent advances in algorithmics and statistics have enabled repeated exhaustive search for bivariate features in a practical amount of time using standard computational resources, allowing us to use cross-validation to evaluate their stability. We performed 10 trials of 2-fold cross-validation of exhaustive bivariate analysis on seven Wellcome Trust Case Control Consortium GWAS datasets, comparing the traditional test for association, the high-performance GBOOST method and the recently proposed GSS statistic (available at http://bioinformatics.research.nicta.com.au/software/gwis/). We use Spearman's correlation to measure the similarity between the folds of cross-validation. To compare incomplete lists of ranks, we propose an extension to Spearman's correlation. The extension allows us to consider a natural threshold for feature selection where the correlation is zero. This is the first reported cross-validation study of exhaustive bivariate GWAS feature selection. We found that stability between ranked lists from different cross-validation folds was higher for GSS in the majority of diseases. A thorough analysis of the correlation between SNP frequency and univariate score demonstrated that the test for association is highly confounded by main effects: SNPs with high univariate significance replicably dominate the ranked results. We show that removal of the univariately significant SNPs improves replicability but risks filtering out pairs involving SNPs with univariate effects.
We empirically confirmed that the stability of GSS and GBOOST was not affected by removal of univariately significant SNPs. These results suggest that the GSS and GBOOST tests are successfully targeting bivariate association with phenotype, and that GSS is able to reliably detect a larger set of SNP pairs than GBOOST in the majority of the data we analysed. However, the test for association was confounded by main effects.
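As a concrete reference point, the following is a minimal sketch of one common form of the traditional bivariate test for association: a Pearson chi-square statistic on the 2 × 9 contingency table of phenotype (control, case) against the nine joint genotypes of a SNP pair. That the study used exactly this form is an assumption, as is the genotype coding (0, 1 or 2 copies of the minor allele); the function name `pair_chi_square` is hypothetical. GBOOST and GSS are not reproduced here.

```python
from itertools import product


def pair_chi_square(genotypes_a, genotypes_b, phenotype):
    """Pearson chi-square statistic for a SNP pair on a 2 x 9 table.

    Rows: phenotype (0 = control, 1 = case). Columns: the nine joint
    genotypes (a, b), each SNP coded as 0, 1 or 2 minor-allele copies
    (an assumed coding, for illustration only).
    """
    genotype_cells = list(product(range(3), repeat=2))
    # Count individuals in each (phenotype, genotype_a, genotype_b) cell.
    counts = {(p, a, b): 0 for p in (0, 1) for a, b in genotype_cells}
    for a, b, p in zip(genotypes_a, genotypes_b, phenotype):
        counts[(p, a, b)] += 1

    n = len(phenotype)
    row_totals = {p: sum(counts[(p, a, b)] for a, b in genotype_cells) for p in (0, 1)}

    statistic = 0.0
    for a, b in genotype_cells:
        column_total = counts[(0, a, b)] + counts[(1, a, b)]
        for p in (0, 1):
            expected = row_totals[p] * column_total / n
            if expected > 0:  # empty rows/columns contribute nothing
                statistic += (counts[(p, a, b)] - expected) ** 2 / expected
    return statistic
```

With all nine genotype combinations observed, the statistic has 8 degrees of freedom, so a p-value could be obtained with, for example, `scipy.stats.chi2.sf(statistic, 8)`. Because this table mixes each SNP's marginal effect with the pair's interaction, a SNP with a strong univariate effect can yield a large statistic with almost any partner, which is consistent with the confounding by main effects reported above.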

Highlights

  • Genome-Wide Association Studies (GWAS) measure hundreds of thousands of SNPs from thousands of individuals with the aim of detecting statistical association between individuals’ phenotype and genotype

  • We investigated the stability of SNP pairs found using bivariate hypothesis testing

  • Stability was investigated by repeatedly splitting the GWAS datasets in half, evaluating and ranking all pairs in each half and estimating the correlation between rankings observed in both halves
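The stability procedure summarised in these highlights can be sketched as follows. This is a minimal illustration, not the authors' code: `score_pairs` is a hypothetical callable that scores every SNP pair in a set of samples and returns the scores in a fixed pair order, samples are split at random without stratifying by case/control status, and Spearman's ρ is computed over the full list with the classic no-ties formula. The authors' extension of Spearman's correlation to incomplete ranked lists is not reproduced.

```python
import random


def ranks(scores):
    """Rank items by descending score (rank 1 = highest score).

    Ties are broken by input order, so this assumes scores are untied.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(scores_1, scores_2):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); needs n >= 2."""
    n = len(scores_1)
    r1, r2 = ranks(scores_1), ranks(scores_2)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def split_half_stability(samples, score_pairs, trials=10, seed=0):
    """Repeated 2-fold split: score all pairs in each half, correlate rankings.

    score_pairs is a hypothetical callable: given a list of samples, it must
    return one score per SNP pair, always in the same pair order.
    """
    rng = random.Random(seed)
    rhos = []
    for _ in range(trials):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # random, unstratified split into two halves
        half = len(shuffled) // 2
        scores_1 = score_pairs(shuffled[:half])
        scores_2 = score_pairs(shuffled[half:])
        rhos.append(spearman(scores_1, scores_2))
    return rhos
```

For example, `split_half_stability(samples, score_pairs, trials=10)` mirrors the 10 trials of 2-fold cross-validation described in the abstract and returns one ρ per trial; values near 1 indicate that both halves rank the SNP pairs similarly.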


Summary

Introduction

Genome-Wide Association Studies (GWAS) measure hundreds of thousands of SNPs from thousands of individuals with the aim of detecting statistical association between individuals’ phenotype and genotype. SNPs are known to be useful markers for disease and are typically measured using microarray-based approaches [1]. Despite the application of numerous methods to GWAS, for most diseases there remains a gap between the level of association observed from the SNPs and the total level of genetic heritability known to exist; this is the problem of “missing heritability” [3]. One hypothesis is that combinatorial analysis of interactions between SNPs could explain more of the missing heritability of disease phenotypes [4]. Few studies have demonstrated interactions between SNPs that replicate across multiple datasets, let alone interactions that explain some portion of the missing heritability.

