Multi-scale inference of genetic trait architecture using biologically annotated neural networks.

Pinar Demetci, Lorin Crawford, Wei Cheng, Xiang Zhou, Sohini Ramachandran, Gregory Darnell

doi:10.1371/journal.pgen.1009754

Abstract

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.
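The partially connected architecture described above can be illustrated with a small forward pass. This is only a sketch under stated assumptions, not the BANNs software: the function names, the leaky ReLU activation, and the dense-matrix-times-mask formulation are illustrative choices, and the sketch uses fixed weights rather than the prior distributions and variational inference that the article describes.

```python
import numpy as np

def build_annotation_mask(snp_to_sets, n_sets):
    """Return a J x G binary mask; entry (j, g) is 1 when SNP j is annotated to SNP-set g.

    snp_to_sets is a hypothetical input format: one list of SNP-set indices per SNP.
    """
    mask = np.zeros((len(snp_to_sets), n_sets))
    for j, sets in enumerate(snp_to_sets):
        mask[j, sets] = 1.0
    return mask

def leaky_relu(z, slope=0.01):
    # Activation choice is an assumption made for this sketch.
    return np.where(z > 0, z, slope * z)

def forward(X, W_in, b_hidden, w_out, b_out, mask):
    """Forward pass of a partially connected feedforward network.

    X        : N x J genotype matrix (input layer, one node per SNP)
    W_in     : J x G SNP-level weights; masked so that a SNP only feeds
               the hidden nodes of the SNP-sets it is annotated to
    b_hidden : length-G hidden biases
    w_out    : length-G SNP-set-level weights
    b_out    : scalar output bias
    """
    hidden = leaky_relu(X @ (W_in * mask) + b_hidden)  # N x G, one node per SNP-set
    return hidden @ w_out + b_out                      # length-N predicted trait

# Toy example: three SNPs, the first two annotated to set 0 and the third to set 1.
X = np.array([[0.0, 1.0, 2.0],
              [2.0, 0.0, 1.0]])
mask = build_annotation_mask([[0], [0], [1]], n_sets=2)
y_hat = forward(X, np.ones((3, 2)), np.zeros(2), np.ones(2), 0.0, mask)
```

Because the mask zeroes every connection that the annotations do not support, each hidden node depends only on the SNPs in its own SNP-set, which is what makes the two layers readable as SNP-level and SNP-set-level effects.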

Highlights

Over the last two decades, a considerable amount of methodological research in statistical genetics has focused on developing and improving the utility of linear models [1,2,3,4,5,6,7,8,9,10,11,12,13].
Biologically Annotated Neural Networks (BANNs) are feedforward models with partially connected architectures that are inspired by the hierarchical nature of biological enrichment analyses in genome-wide association (GWA) studies (Fig 1).
The BANNs software takes in one of two data types: (i) individual-level data D = {X, y}, where X is an N × J matrix of genotypes with J denoting the number of single nucleotide polymorphisms (SNPs) encoded as {0, 1, 2} copies of a reference allele at each locus and y is an N-dimensional vector of quantitative traits (Fig 1A); or (ii) GWA summary statistics D = {R, θ̂}, where R is a J × J empirical linkage disequilibrium (LD) matrix of pairwise correlations between SNPs and θ̂ are marginal effect size estimates for each SNP computed using ordinary least squares (OLS) (S1 Fig).
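The relationship between the two input types can be sketched by deriving summary statistics from individual-level data. The following is a minimal illustration, not part of the BANNs software: the function name is hypothetical, and mean-centering the genotypes and trait before the per-SNP regression is an assumption of this sketch (it also assumes no SNP is constant across individuals).

```python
import numpy as np

def summary_statistics(X, y):
    """Compute (R, theta_hat) from individual-level data (X, y).

    X : N x J genotype matrix with entries in {0, 1, 2}
    y : length-N vector of quantitative trait values
    """
    Xc = X - X.mean(axis=0)  # center each SNP column
    yc = y - y.mean()        # center the trait
    # Marginal OLS slope for SNP j, fitted one SNP at a time:
    # theta_hat_j = (x_j' y) / (x_j' x_j)
    theta_hat = (Xc.T @ yc) / np.sum(Xc ** 2, axis=0)
    # Empirical LD matrix: pairwise Pearson correlations between SNP columns.
    R = np.corrcoef(X, rowvar=False)
    return R, theta_hat

# Toy data: four individuals, two SNPs; the trait is exactly 2 * SNP_0 + 1.
X = np.array([[0.0, 0.0],
              [1.0, 2.0],
              [2.0, 1.0],
              [1.0, 1.0]])
y = 2.0 * X[:, 0] + 1.0
R, theta_hat = summary_statistics(X, y)
```

Each marginal estimate ignores all other SNPs, so correlated variants share signal; the LD matrix R is what carries the information needed to account for that correlation.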

Summary

Introduction

Over the last two decades, a considerable amount of methodological research in statistical genetics has focused on developing and improving the utility of linear models [1,2,3,4,5,6,7,8,9,10,11,12,13]. The flexibility and interpretability of linear models make them a widely used tool in genome-wide association (GWA) studies, where the goal is to test for statistical associations between individual single nucleotide polymorphisms (SNPs) and a phenotype of interest. In these cases, traditional variable selection approaches provide a set of P-values or posterior inclusion probabilities (PIPs) which lend statistical evidence on how important each variant is for explaining the overall genetic architecture of a trait. This univariate SNP-level mapping approach can be underpowered for “polygenic” traits which are generated by many mutations of small effect [14,15,16,17,18,19]. The performance of standard SNP-set methods can be hampered by strict additive modeling assumptions, and the most powerful of these statistical approaches rely on algorithms that are computationally inefficient and unreliable for large-scale sets of data [28].


Journal: PLOS Genetics	Publication Date: Aug 19, 2021
Citations: 18	License type: CC BY 4.0
