Abstract

Gene-based rare variant association studies (RVASs) have low power due to the infrequency of rare variants and the large multiple testing burden. To correct for multiple testing, traditional false discovery rate (FDR) procedures, which depend solely on P-values, are often used. Recently, Independent Hypothesis Weighting (IHW) was developed to improve detection power while maintaining FDR control by leveraging prior information for each hypothesis. Here, we present a framework that increases the power of gene-based RVASs by incorporating prior information using IHW. We first build supervised machine learning models that assign each gene a prediction score measuring its disease risk from multiple biological features, trained on high-confidence risk genes and local background genes selected near GWAS-significant loci. We then use the prediction scores as covariates to prioritize RVAS results via IHW. We demonstrate the effectiveness of this framework through applications to RVASs in schizophrenia and autism spectrum disorder. We found sizeable improvements in the number of significant associations compared to traditional FDR approaches, as well as independent evidence supporting the relevance of the genes identified by our framework but not by traditional FDR approaches, demonstrating the potential of our framework to improve the power of gene-based RVASs.

Highlights

  • Rare variant association studies (RVASs) enable the identification of disease-associated genes with clear functional support [1]

  • We propose to identify genes associated with schizophrenia (SCZ) from gene-based p-values in a recent RVAS [16] using predictions informed by a recent genome-wide association study (GWAS) [20]

  • As there is significant overlap of risk genes between SCZ and autism spectrum disorders (ASD) [16], we propose to use the same predictions as covariates to adjust gene-based p-values in a recently published ASD RVAS [21]

Summary

Introduction

Rare variant association studies (RVASs) enable the identification of disease-associated genes with clear functional support [1]. False discovery rate (FDR) [2,3] control has become a popular approach for detecting weak effects by limiting the expected false discovery proportion (FDP). While the Benjamini-Hochberg (BH) procedure is nearly optimal when all hypotheses are likely to be null [4], it loses power when tests are heterogeneous [5], which is often the case in modern applications like RVASs. Unlike the BH procedure, hypothesis-weighting FDR control procedures incorporate prior information to up-weight or down-weight hypotheses [6]. The idea is that allocating more of the FDR budget to hypotheses with a greater prior probability of being non-null has the potential to increase detection power [4,7].
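The contrast between unweighted and weighted FDR control can be illustrated with a minimal sketch of a weighted BH procedure, assuming fixed, pre-specified weights. This is not IHW itself, which learns the weights from a covariate in a data-driven way; the function name `weighted_bh` and the toy P-values and weights are hypothetical, chosen only for illustration.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure (illustrative sketch).

    Weights are assumed to be fixed in advance and non-negative; they are
    rescaled to average 1 so the overall FDR budget is unchanged.
    Returns a boolean array marking rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    m = p.size
    w = w * m / w.sum()  # normalize so the weights average to 1

    # Weighted P-values: a larger weight makes a hypothesis easier to reject.
    # Hypotheses with zero weight can never be rejected.
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(w > 0, p / w, np.inf)

    # Standard BH step-up rule applied to the weighted P-values.
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = q[order] <= thresholds

    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Toy example (hypothetical numbers): six gene-level P-values.
pvals = [0.001, 0.012, 0.02, 0.04, 0.3, 0.6]

# Uniform weights reduce the procedure to ordinary BH.
bh_rejections = weighted_bh(pvals, np.ones(6))

# Up-weight the fourth gene (e.g. a high prior score), down-weight the last two.
prior_weights = [1, 1, 1, 2, 0.5, 0.5]
weighted_rejections = weighted_bh(pvals, prior_weights)

print("BH rejections:      ", int(bh_rejections.sum()))
print("Weighted rejections:", int(weighted_rejections.sum()))
```

In this toy setting, ordinary BH rejects three hypotheses, while up-weighting the fourth gene lets it clear its threshold as well. The weights must not be chosen by looking at the P-values themselves; IHW addresses this by learning weights from a covariate that is independent of the P-values under the null.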

