Abstract

Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific applications: (1) feature sparsity, the fraction of features deemed relevant, and (2) the signal-to-noise ratio, quantified, for instance, as the proportion of variance explained. We show how to encode both types of domain knowledge into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters that encodes knowledge about feature sparsity, and a Stein gradient optimization to tune the hyperparameters so that the distribution induced on the model's proportion of variance explained matches the prior distribution. We show empirically that the new prior improves prediction accuracy compared to existing neural network priors on publicly available datasets and in a genetics application where signals are weak and sparse, often outperforming even computationally intensive cross-validation for hyperparameter tuning.
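To make the induced distribution over the proportion of variance explained (PVE) concrete, the following minimal sketch (Python/NumPy) estimates it by Monte Carlo under an ARD-style Gaussian scale-mixture prior: sample weights from the prior, push the inputs through the network, and compare the signal variance to the noise variance. The one-hidden-layer architecture, half-Cauchy local scales, and all function names are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior_pve(X, sigma_noise=1.0, n_hidden=50, n_draws=200):
    """Monte Carlo estimate of the prior distribution over the PVE induced by an
    ARD Gaussian scale-mixture prior on a one-hidden-layer network (illustrative
    setup, not the paper's exact model)."""
    n, p = X.shape
    pves = []
    for _ in range(n_draws):
        # Local (feature-specific) scales: a half-Cauchy scale mixture, one per input feature.
        local_scale = np.abs(rng.standard_cauchy(size=p))
        # First-layer weights drawn with per-feature (ARD) scales.
        W1 = rng.normal(0.0, 1.0, size=(p, n_hidden)) * local_scale[:, None]
        b1 = rng.normal(0.0, 1.0, size=n_hidden)
        w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
        f = np.tanh(X @ W1 + b1) @ w2          # noiseless network output
        var_f = f.var()
        pves.append(var_f / (var_f + sigma_noise**2))
    return np.array(pves)

X = rng.normal(size=(100, 20))
print(np.quantile(sample_prior_pve(X), [0.1, 0.5, 0.9]))
```

Under heavy-tailed local scales, the sampled PVE values tend to pile up near 1, which is the kind of mismatch with prior knowledge that tuning the hyperparameters is meant to correct.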

Highlights

  • Neural networks (NNs) have achieved state-of-the-art performance on a wide range of supervised learning tasks with a high signal-to-noise ratio (S/N), such as computer vision (Krizhevsky et al., 2012) and natural language processing (Devlin et al., 2018).

  • In the Supplementary, we develop a novel Monte Carlo approach to model the log-linear relationship between the global scale of the mean-field Gaussian prior and the prediction variance of the Bayesian neural network (BNN), avoiding a computationally expensive grid search; we use this to set the variance according to a point estimate of the proportion of variance explained (PVE), but find that the resulting non-hierarchical Gaussian prior is not flexible enough (a minimal illustration is sketched after this list).

  • When the true PVE is unavailable as prior knowledge, a less informative prior over the PVE (e.g., U[0, 1] in HMF+PVE) still places sufficient probability density on the true PVE, in contrast with HMF, whose induced prior PVE is highly concentrated at 1 and places almost zero probability density on the true PVE.
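The second highlight's link between the global prior scale and the prediction variance can be illustrated as follows. This is a minimal sketch assuming a one-hidden-layer BNN with a single shared global scale; it fits the log-linear relationship from a handful of Monte Carlo evaluations (a small grid plus a log-log regression, so it does not reproduce the Supplementary's grid-search-free procedure) and then inverts the fit to choose the scale matching a point estimate of the PVE. All names and the architecture are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def prediction_variance(X, global_scale, n_hidden=50, n_draws=200):
    """Monte Carlo estimate of the prior prediction variance of a one-hidden-layer
    BNN whose weights all share a single global scale (mean-field Gaussian prior)."""
    n, p = X.shape
    outs = []
    for _ in range(n_draws):
        W1 = rng.normal(0.0, global_scale, size=(p, n_hidden))
        w2 = rng.normal(0.0, global_scale / np.sqrt(n_hidden), size=n_hidden)
        outs.append(np.tanh(X @ W1) @ w2)
    return np.var(np.stack(outs))

def scale_for_target_pve(X, target_pve, sigma_noise=1.0, grid=np.logspace(-2, 1, 8)):
    """Fit log(prediction variance) ~ a + b * log(scale) on a few scales and invert it
    to find the global scale whose prior prediction variance matches the variance
    implied by a point estimate of the PVE."""
    log_var = np.log([prediction_variance(X, s) for s in grid])
    b, a = np.polyfit(np.log(grid), log_var, 1)     # slope b, intercept a
    target_var = target_pve / (1.0 - target_pve) * sigma_noise**2
    return np.exp((np.log(target_var) - a) / b)

X = rng.normal(size=(100, 20))
print(scale_for_target_pve(X, target_pve=0.3))
```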


Summary

Introduction

Neural networks (NNs) have achieved state-of-the-art performance on a wide range of supervised learning tasks with a high signal-to-noise ratio (S/N), such as computer vision (Krizhevsky et al., 2012) and natural language processing (Devlin et al., 2018). We ask how to encode domain knowledge into the prior over Bayesian neural network (BNN) weights, which are high-dimensional and uninterpretable. We propose determining the hyper-priors according to two types of domain knowledge often available in scientific applications: ballpark figures on feature sparsity and the signal-to-noise ratio. We propose a novel informative hyper-prior over the feature inclusion indicators τ_i^(l), called the informative spike-and-slab, which can directly model any distribution on the number of relevant features (Figure 1a). The distribution of PVE assumed by a BNN is induced by the prior on the model's weights, which in turn is affected by all the hyper-parameters. Hyper-parameters that do not affect feature sparsity, e.g. λ_i^(l), can be used to encode domain knowledge about the PVE.
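As a concrete illustration of the informative spike-and-slab idea, the sketch below (Python/NumPy) first draws the number of relevant features from a user-specified distribution and then allocates the inclusion indicators uniformly at random, so a ballpark figure on feature sparsity can be encoded directly. The two-stage construction, function names, and the binomial choice are assumptions made for illustration, not the paper's exact prior.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_inclusion_indicators(p, prior_on_num_relevant, n_draws=1000):
    """Informative spike-and-slab sketch: draw the number of relevant features k
    from a user-specified prior, then pick which k features are included uniformly
    at random (illustrative, not the paper's exact construction)."""
    draws = np.zeros((n_draws, p), dtype=bool)
    for d in range(n_draws):
        k = prior_on_num_relevant(rng)              # e.g. "roughly 5 of 100 features"
        idx = rng.choice(p, size=k, replace=False)  # which features get the slab
        draws[d, idx] = True
    return draws

# Encode the ballpark figure "roughly 5% of 100 features are relevant".
prior_k = lambda rng: rng.binomial(n=100, p=0.05)
taus = sample_inclusion_indicators(p=100, prior_on_num_relevant=prior_k)
print(taus.sum(axis=1).mean())   # average number of included features, close to 5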

Proportion of Variance Explained
Bayesian neural networks
Stein Gradient Estimator
Prior knowledge about sparsity
Prior on the number of relevant features
Feature allocation
Prior knowledge on the PVE
PVE for Bayesian neural networks
Optimizing hyper-parameters according to prior PVE
Learning BNNs with variational inference
Related literature
Experiments
Synthetic data
Results
Public real-world UCI datasets
Web traffic time series prediction
Metabolite prediction using genetic data
Conclusion