Abstract

Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simulated data to compare the efficacy of filters with different sample sizes and for causal SNPs with different minor allele frequencies (MAFs) and effect sizes, focusing on the small effect sizes and MAFs likely to represent the majority of unidentified causal SNPs. In our analyses, of all the methods investigated, filtering on the ranked likelihoods consistently retains the true causal SNP with the highest probability for a given false positive rate. This was the case for all the local linkage disequilibrium patterns investigated. Our results indicate that when using this method to retain only the top 5% of SNPs, even a causal SNP with an odds ratio of 1.1 and MAF of 0.08 can be retained with a probability exceeding 0.9 using an overall sample size of 50,000.

Highlights

  • Genome-wide association studies (GWAS) and candidate gene studies have highlighted regions of the genome containing variants affecting disease susceptibility

  • Several studies rank single nucleotide polymorphisms (SNPs) by likelihood, and the usual practice is to retain the set of SNPs with likelihoods within a prespecified ratio of the highest likelihood. This method leads to variable numbers of SNPs being retained. We examine this relative likelihood (RL) filter as well as the alternative of retaining a prespecified proportion of all SNPs based on ranking by likelihood.

  • To explore the utility of this approach, this study considers the impact of effect size, sample size, minor allele frequency (MAF), mode of inheritance and filter threshold on the effectiveness of the filter proposed
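
The two likelihood-based filters described in these highlights can be contrasted with a minimal sketch. It assumes each SNP already has a log-likelihood from a single-SNP association model. The function names, the default ratio of 100 and the example SNP identifiers are hypothetical and are not taken from the paper.

```python
import math


def relative_likelihood_filter(log_liks, ratio=100.0):
    """Keep SNPs whose likelihood is within a factor `ratio` of the best one.

    log_liks maps SNP id -> log-likelihood. The default ratio is an
    illustrative assumption, not a value from the paper.
    """
    best = max(log_liks.values())
    cutoff = best - math.log(ratio)  # compare on the log scale
    return {snp for snp, ll in log_liks.items() if ll >= cutoff}


def ranked_proportion_filter(log_liks, proportion=0.05):
    """Keep a fixed proportion of SNPs, ranked by likelihood (highest first)."""
    ranked = sorted(log_liks, key=log_liks.get, reverse=True)
    n_keep = max(1, math.ceil(proportion * len(ranked)))  # keep at least one
    return set(ranked[:n_keep])


# Hypothetical log-likelihoods for four SNPs in one region.
example = {"rs1": -100.0, "rs2": -101.0, "rs3": -110.0, "rs4": -120.0}
rl_kept = relative_likelihood_filter(example, ratio=100.0)
top_kept = ranked_proportion_filter(example, proportion=0.25)
```

The RL filter returns a set whose size depends on how the likelihoods are spread, whereas the ranked filter always returns the same number of SNPs for a given region size and proportion.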

Summary

Introduction

Genome-wide association studies (GWAS) and candidate gene studies have highlighted regions of the genome containing variants affecting disease susceptibility. The next stage is fine-mapping of these regions to identify the variants most likely to be causal. This task is confounded by high correlation between variants in a small chromosomal region. Because of this correlation and sampling variation, the variant with the largest likelihood or smallest p-value in tests of association will not necessarily be the causal variant. Several statistical methods for analysing fine-mapped data have been published, but guidelines are needed to determine which of these will give the highest true positive rates (TPRs) and lowest false positive rates (FPRs), and in which scenarios.
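
To make the TPR and FPR comparison concrete, here is a minimal sketch of how a filter could be scored over simulated replicates. It assumes that TPR is the fraction of replicates in which the causal SNP is retained and that FPR is the mean fraction of non-causal SNPs retained. These working definitions, the function name and the toy data are assumptions for illustration and may differ from those used in the paper.

```python
def filter_rates(replicates):
    """Score a filter over simulated replicates.

    replicates: list of (retained, causal, all_snps) tuples, where
    `retained` is the set of SNP ids the filter kept, `causal` is the id of
    the simulated causal SNP, and `all_snps` lists every SNP in the region
    (assumed to contain at least one non-causal SNP).
    Returns (tpr, fpr) under the assumed definitions in the lead-in.
    """
    hits = 0
    fp_fractions = []
    for retained, causal, all_snps in replicates:
        hits += causal in retained  # True counts as 1
        non_causal = set(all_snps) - {causal}
        fp_fractions.append(len(retained & non_causal) / len(non_causal))
    tpr = hits / len(replicates)
    fpr = sum(fp_fractions) / len(fp_fractions)
    return tpr, fpr


# Two toy replicates over a three-SNP region with causal SNP "a".
toy = [
    ({"a", "b"}, "a", ["a", "b", "c"]),
    ({"b"}, "a", ["a", "b", "c"]),
]
tpr, fpr = filter_rates(toy)
```

Scoring each candidate filter this way at matched FPR is one way to compare how often each retains the causal SNP.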

