Abstract

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network in which the input layer encodes SNP-level effects and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries that allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs replicates known associations with high- and low-density lipoprotein (HDL and LDL) cholesterol levels.
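To make the "partially connected" architecture concrete, here is a minimal sketch of a deterministic forward pass in which a 0/1 annotation mask decides which SNPs feed into which SNP-set node of the hidden layer. It is only an illustration under stated assumptions: the function names (`annotation_mask`, `forward`), the Leaky ReLU activation, and the toy numbers are hypothetical choices for this sketch, and it omits what makes BANNs Bayesian (priors on weights and connections, fitted by variational inference).

```python
from typing import Dict, List


def annotation_mask(num_snps: int, snp_sets: Dict[str, List[int]]) -> List[List[int]]:
    """J x G 0/1 mask: mask[j][g] = 1 if SNP j is annotated to SNP-set g.

    SNP-sets are ordered by name so the column order is deterministic.
    """
    names = sorted(snp_sets)
    mask = [[0] * len(names) for _ in range(num_snps)]
    for g, name in enumerate(names):
        for j in snp_sets[name]:
            mask[j][g] = 1
    return mask


def leaky_relu(z: float, slope: float = 0.01) -> float:
    # Hidden-layer activation assumed for this sketch.
    return z if z > 0 else slope * z


def forward(
    genotypes: List[List[int]],      # N x J, entries in {0, 1, 2}
    snp_weights: List[List[float]],  # J x G input-to-hidden weights
    mask: List[List[int]],           # J x G annotation mask
    set_bias: List[float],           # G hidden-layer biases
    set_weights: List[float],        # G hidden-to-output weights
    out_bias: float,
) -> List[float]:
    """Return one predicted phenotype value per individual."""
    num_snps, num_sets = len(mask), len(set_weights)
    preds = []
    for x in genotypes:
        hidden = []
        for g in range(num_sets):
            # Partial connectivity: the mask zeroes out every SNP not annotated to set g.
            z = set_bias[g] + sum(
                x[j] * snp_weights[j][g] * mask[j][g] for j in range(num_snps)
            )
            hidden.append(leaky_relu(z))
        preds.append(out_bias + sum(h * w for h, w in zip(hidden, set_weights)))
    return preds


# Toy example (hypothetical data): 4 SNPs, 2 SNP-sets.
mask = annotation_mask(4, {"geneA": [0, 1], "geneB": [2, 3]})
# The 9.0 entries link SNPs to sets they are not annotated to, so the mask ignores them.
weights = [[0.5, 9.0], [0.5, 9.0], [9.0, -1.0], [9.0, -1.0]]
preds = forward([[0, 1, 2, 0], [2, 2, 0, 1]], weights, mask, [0.0, 0.0], [1.0, 2.0], 0.1)
# preds is approximately [0.56, 2.08]
```

Because the mask multiplies every input-to-hidden weight, changing an off-annotation weight (the 9.0 entries) has no effect on the prediction; this is what keeps each hidden node interpretable as a single SNP-set.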

Highlights

  • Over the last two decades, a considerable amount of methodological research in statistical genetics has focused on developing and improving the utility of linear models [1,2,3,4,5,6,7,8,9,10,11,12,13]

  • Biologically Annotated Neural Networks (BANNs) are feedforward models with partially connected architectures that are inspired by the hierarchical nature of biological enrichment analyses in genome-wide association (GWA) studies (Fig 1)

  • The Biologically Annotated Neural Networks (BANNs) software takes in one of two data types: (i) individual-level data $\mathcal{D} = \{\mathbf{X}, \mathbf{y}\}$, where $\mathbf{X}$ is an $N \times J$ matrix of genotypes, with $J$ denoting the number of single nucleotide polymorphisms (SNPs) encoded as {0, 1, 2} copies of a reference allele at each locus, and $\mathbf{y}$ is an $N$-dimensional vector of quantitative traits (Fig 1A); or (ii) GWA summary statistics $\mathcal{D} = \{\mathbf{R}, \hat{\boldsymbol{\theta}}\}$, where $\mathbf{R}$ is a $J \times J$ empirical linkage disequilibrium (LD) matrix of pairwise correlations between SNPs and $\hat{\boldsymbol{\theta}}$ are marginal effect size estimates for each SNP computed using ordinary least squares (OLS) (S1 Fig)
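The link between the two input types can be illustrated by deriving summary statistics $\{\mathbf{R}, \hat{\boldsymbol{\theta}}\}$ from individual-level data $\{\mathbf{X}, \mathbf{y}\}$. The sketch below assumes centered genotypes and phenotype (so the intercept drops out of each one-SNP OLS fit) and that every SNP is polymorphic; the function names and toy data are hypothetical, and real pipelines would typically use a numerical library and handle missing genotypes.

```python
from math import sqrt
from typing import List, Tuple


def center(v: List[float]) -> List[float]:
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def summary_statistics(
    X: List[List[int]], y: List[float]
) -> Tuple[List[List[float]], List[float]]:
    """Return (R, theta_hat) from N x J genotypes X and an N-vector y.

    Assumes every SNP is polymorphic (non-zero variance).
    """
    n, num_snps = len(X), len(X[0])
    cols = [center([float(X[i][j]) for i in range(n)]) for j in range(num_snps)]
    yc = center(y)
    sxx = [sum(a * a for a in c) for c in cols]
    # Marginal OLS slope: regress y on SNP j alone; centering absorbs the intercept.
    theta_hat = [sum(a * b for a, b in zip(c, yc)) / s for c, s in zip(cols, sxx)]
    # Empirical LD: Pearson correlation between every pair of SNP columns.
    R = [
        [
            sum(a * b for a, b in zip(cols[j], cols[k])) / sqrt(sxx[j] * sxx[k])
            for k in range(num_snps)
        ]
        for j in range(num_snps)
    ]
    return R, theta_hat


# Toy example (hypothetical data): 4 individuals, 2 SNPs.
X = [[0, 0], [1, 1], [2, 2], [1, 0]]
y = [1.0, 2.0, 3.0, 2.0]
R, theta_hat = summary_statistics(X, y)
# theta_hat is approximately [1.0, 0.727]; R[0][1] is approximately 0.853
```

Each entry of $\hat{\boldsymbol{\theta}}$ ignores the other SNPs, which is why the LD matrix $\mathbf{R}$ must accompany it: correlated SNPs can show similar marginal effects even when only one of them is causal.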


Summary

Introduction

Over the last two decades, a considerable amount of methodological research in statistical genetics has focused on developing and improving the utility of linear models [1,2,3,4,5,6,7,8,9,10,11,12,13]. The flexibility and interpretability of linear models make them a widely used tool in genome-wide association (GWA) studies, where the goal is to test for statistical associations between individual single nucleotide polymorphisms (SNPs) and a phenotype of interest. In these cases, traditional variable selection approaches provide a set of P-values or posterior inclusion probabilities (PIPs) which lend statistical evidence on how important each variant is for explaining the overall genetic architecture of a trait. This univariate SNP-level mapping approach can be underpowered for "polygenic" traits, which are generated by many mutations of small effect [14,15,16,17,18,19]. SNP-set methods, which aggregate the effects of multiple SNPs, offer one alternative; however, the performance of standard SNP-set methods can be hampered by strict additive modeling assumptions, and the most powerful of these statistical approaches rely on algorithms that are computationally inefficient and unreliable for large-scale data sets [28].
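The univariate SNP-level mapping described above can be sketched as one simple linear regression per SNP, each producing a P-value. This is an illustration under stated assumptions, not any specific published method: it uses a large-sample normal approximation to the t statistic, includes no covariates, does not compute PIPs, and the function name and toy data are hypothetical.

```python
from math import erfc, sqrt
from typing import List


def marginal_pvalues(X: List[List[int]], y: List[float]) -> List[float]:
    """Two-sided P-value for each SNP from a one-SNP-at-a-time linear regression.

    Uses a large-sample normal approximation to the t statistic.
    """
    n, num_snps = len(y), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    syy = sum(v * v for v in yc)
    pvals = []
    for j in range(num_snps):
        x = [float(row[j]) for row in X]
        x_mean = sum(x) / n
        xc = [v - x_mean for v in x]
        sxx = sum(v * v for v in xc)
        sxy = sum(a * b for a, b in zip(xc, yc))
        beta = sxy / sxx                        # marginal effect size
        rss = syy - beta * sxy                  # residual sum of squares
        se = sqrt(rss / (n - 2) / sxx)          # standard error of beta
        z = beta / se
        pvals.append(erfc(abs(z) / sqrt(2.0)))  # two-sided normal tail area
    return pvals


# Toy example (hypothetical data): SNP 0 tracks the trait closely, SNP 1 does not.
X = [[0, 1], [1, 0], [2, 1], [1, 0], [0, 1], [2, 0], [1, 1], [2, 0]]
y = [0.1, 1.2, 1.9, 0.8, -0.1, 2.2, 1.1, 2.0]
pvals = marginal_pvalues(X, y)
```

Each SNP is tested in isolation, so a trait driven by many small effects spread across a gene may yield no individually significant SNP; this is the power limitation that motivates aggregating SNPs into SNP-sets.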
