Approximate Bayesian neural networks in genomic prediction

Patrik Waldmann

doi:10.1186/s12711-018-0439-1

Abstract

Background: Genome-wide marker data are used both in phenotypic genome-wide association studies (GWAS) and genome-wide prediction (GWP). Typically, such studies include high-dimensional data with thousands to millions of single nucleotide polymorphisms (SNPs) recorded in hundreds to a few thousand individuals. Different machine-learning approaches have been used effectively in GWAS and GWP, but the use of neural networks (NN) and deep learning is still scarce. This study presents an NN model for genomic SNP data.

Results: We show, using both simulated and real pig data, that regularization can be obtained using weight decay and dropout, which results in an approximate Bayesian neural network (ABNN) model that can be used to obtain model-averaged posterior predictions. The ABNN model is implemented in mxnet and shown to yield better prediction accuracy than genomic best linear unbiased prediction and Bayesian LASSO. The mean squared error was reduced by at least 6.5% in the simulated data and by at least 1% in the real data. Moreover, by comparing NN of different complexities, our results confirm that a shallow model with one layer, one neuron, one-hot encoding and a linear activation function performs better than more complex models.

Conclusions: The ABNN model provides a computationally efficient approach with good prediction performance, in which the weight components can also provide information on the importance of the SNPs. Hence, ABNN is suitable for both GWP and GWAS.
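To make the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a single linear neuron on one-hot encoded genotypes, trained with input dropout and L2 weight decay, and then averaged over random dropout masks at prediction time. It is not the author's mxnet implementation: the function names, the 0/1/2 genotype coding, the toy data, and all hyperparameter values (`keep_prob`, `weight_decay`, `lr`, `epochs`, `n_samples`) are assumptions chosen for illustration only.

```python
import numpy as np


def one_hot_genotypes(genotypes):
    """Encode an (n, p) integer matrix of 0/1/2 codes as an (n, 3p) 0/1 matrix."""
    n, p = genotypes.shape
    encoded = np.zeros((n, p, 3))
    rows, cols = np.indices((n, p))
    encoded[rows, cols, genotypes] = 1.0
    return encoded.reshape(n, 3 * p)


def fit_linear_neuron(X, y, rng, keep_prob=0.8, weight_decay=1e-2,
                      lr=0.02, epochs=500):
    """Full-batch gradient descent on squared error for one linear neuron.

    A fresh dropout mask is drawn on the inputs in every epoch, and an L2
    penalty (weight decay) is added to the gradient of the weights.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = y.mean()
    for _ in range(epochs):
        # Inverted dropout: rescale kept inputs so their expectation is unchanged.
        mask = rng.binomial(1, keep_prob, size=X.shape) / keep_prob
        Xd = X * mask
        resid = Xd @ w + b - y
        w -= lr * (Xd.T @ resid / n + weight_decay * w)
        b -= lr * resid.mean()
    return w, b


def mc_dropout_predict(X, w, b, rng, keep_prob=0.8, n_samples=100):
    """Average predictions over dropout masks that stay active at test time."""
    preds = np.empty((n_samples, X.shape[0]))
    for s in range(n_samples):
        mask = rng.binomial(1, keep_prob, size=X.shape) / keep_prob
        preds[s] = (X * mask) @ w + b
    return preds.mean(axis=0), preds.std(axis=0)


# Toy data (hypothetical): 200 individuals, 50 SNPs, 5 of them causal.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 50))
effects = np.array([1.0, -1.0, 0.8, -0.8, 0.5])
y = genotypes[:, :5] @ effects + rng.normal(0.0, 0.5, size=200)

X = one_hot_genotypes(genotypes)
w, b = fit_linear_neuron(X, y, rng)
pred_mean, pred_sd = mc_dropout_predict(X, w, b, rng)
```

In this sketch, `pred_mean` plays the role of a model-averaged prediction and `pred_sd` gives a rough per-individual spread across dropout masks; the magnitudes of the entries of `w` could be inspected as a crude indicator of SNP importance.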

Highlights

Genome-wide marker data are used both in phenotypic genome-wide association studies (GWAS) and genome-wide prediction (GWP)
Simulated data: the Monte Carlo Markov chains of the genomic best linear unbiased prediction (GBLUP) and Bayesian LASSO (BLASSO) analyses were run for 60,000 iterations, and a burn-in of 10,000 and thinning of 10 resulted in a final sample of 5000 iterations
We showed that regularization of genome-wide data can be obtained by a combination of weight decay and dropout in neural networks (NN), and that this provides an efficient method, which can be used both for GWP and GWAS
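The chain settings quoted in the highlights above can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not code from the study; it assumes that thinning keeps every tenth draw after the burn-in has been discarded.

```python
def retained_samples(n_iterations, burn_in, thin):
    """Count draws kept after dropping the burn-in and keeping every `thin`-th draw."""
    if thin < 1 or not 0 <= burn_in <= n_iterations:
        raise ValueError("need thin >= 1 and 0 <= burn_in <= n_iterations")
    return len(range(burn_in, n_iterations, thin))


# Settings reported for the GBLUP and BLASSO chains on the simulated data.
print(retained_samples(60_000, 10_000, 10))  # 5000
```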

Summary

Introduction

Genome-wide marker data are used both in phenotypic genome-wide association studies (GWAS) and genome-wide prediction (GWP). Such studies include high-dimensional data with thousands to millions of single nucleotide polymorphisms (SNPs) recorded in hundreds to a few thousand individuals. The concept of GWP was introduced by Meuwissen et al. [5] and refers to the idea that regression coefficients of genomic markers, often SNPs, can be used to predict phenotypes of individuals. Among the most flexible machine-learning methods are deep artificial neural networks, which have recently received considerable attention because of their outstanding prediction properties [9].

Objectives

Methods

Results

Discussion

Conclusion