Rare variant association studies (RVAS) of complex traits have emerged as a powerful approach to advance drug discovery and diagnostics. Missense pathogenicity predictions from AlphaMissense, which are based on structural context and protein language models, improve the differentiation between benign and deleterious variants. Constraint metrics, in contrast, allow researchers to pinpoint genomic regions under selective pressure that may not directly affect protein structure but are more likely to contain functionally important mutations. Loss-of-function (LoF) variants, which result in the complete or partial loss of protein function, are particularly informative because their downstream functional consequences are more straightforward to assess. In this study, we present a unified meta-regression approach that uses the probability of pathogenicity, the probability of constraint, and an indicator of whether a variant is a predicted loss-of-function or missense variant as features to model the observed effect sizes, and their uncertainty, obtained from single-variant genetic analysis. We applied the unified meta-regression model to 1,144 continuous phenotypes from UK Biobank using single-variant summary statistics obtained from Genebass, and we replicated our findings in the All of Us cohort. For each gene discovery, we make available a characterization of whether constrained sites are associated with the phenotype, whether pathogenic sites determined by structure-based predictions are associated with the phenotype, and whether the broader loss-of-function or missense variant annotation better explains the observed summary statistics. Our results are publicly available at the Global Biobank Engine (https://biobankengine.shinyapps.io/phenome-wide-unified-model/).
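To make the modeling idea concrete, the following is a minimal sketch of a meta-regression on single-variant summary statistics. It is not the authors' implementation: it assumes a simple fixed-effect, inverse-variance weighted form in which each observed effect size is modeled as beta_i ~ N(x_i · gamma, se_i^2), and the function name, feature layout, and toy data are all hypothetical.

```python
import numpy as np


def fit_meta_regression(beta, se, features):
    """Inverse-variance weighted least squares (assumed model form).

    beta     : per-variant effect sizes from single-variant analysis
    se       : their standard errors
    features : (n_variants, n_features) matrix of variant annotations
    Returns the coefficient estimates and their standard errors.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    X = np.asarray(features, dtype=float)
    w = 1.0 / se**2                      # inverse-variance weights
    xtwx = X.T @ (X * w[:, None])        # X^T W X
    xtwy = X.T @ (w * beta)              # X^T W beta
    cov = np.linalg.inv(xtwx)            # covariance of the estimates
    gamma = cov @ xtwy
    return gamma, np.sqrt(np.diag(cov))


# Toy data (entirely synthetic, for illustration only).
rng = np.random.default_rng(0)
n = 500
p_pathogenic = rng.uniform(size=n)       # e.g. a pathogenicity probability
p_constraint = rng.uniform(size=n)       # e.g. a constraint probability
is_lof = rng.integers(0, 2, size=n)      # 1 if predicted loss-of-function
is_missense = 1 - is_lof                 # 1 if missense
X = np.column_stack([p_pathogenic, p_constraint, is_lof, is_missense])

true_gamma = np.array([0.5, 0.2, -0.3, 0.0])   # hypothetical coefficients
se = rng.uniform(0.05, 0.2, size=n)
beta_noisy = X @ true_gamma + rng.normal(scale=se)

gamma_hat, gamma_se = fit_meta_regression(beta_noisy, se, X)
print(gamma_hat, gamma_se)
```

In this sketch the two indicator columns act as annotation-specific intercepts, so no separate intercept term is included; a coefficient that differs clearly from zero suggests that the corresponding annotation helps explain the observed effect sizes.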