Genome-Wide Prediction of Complex Traits in Two Outcrossing Plant Species Through Deep Learning and Bayesian Regularized Neural Network.

Carlos Maldonado, Rodrigo Iván Contreras-Soto, Freddy Mora-Poblete, Sunny Ahmar, Carlos Alberto Scapim, Antônio Teixeira Do Amaral Júnior, Jen-Tsung Chen

doi:10.3389/fpls.2020.593897

Abstract

Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. To this end, two Machine Learning (ML) methods were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN), each in combination with different hyperparameters. These ML methods were also compared with Genomic Best Linear Unbiased Prediction (GBLUP) and different Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models using Rectified Linear Units as the activation function had higher predictive ability values, which ranged from 0.27 (pilodyn penetration of 6-year-old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the larger mini-batch size (100%) gave a significantly higher predictive ability for wood-related traits than the smaller mini-batch size (10%). In the BRNN method, by contrast, the one- and two-layer architectures that used only the pureline function gave better predictions, with values ranging from 0.21 (pilodyn penetration) to 0.71 (flowering traits). A significant increase in predictive ability was observed for DL in comparison with the other methods of genomic prediction (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.

Highlights

Predictive ability values followed by a common letter are not significantly different according to the Tukey–Kramer test at a significance level of 0.01.
The Deep Learning (DL) model had a higher predictive ability (PA) than Genomic Best Linear Unbiased Prediction (GBLUP), linear Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression (BRR), and Bayesian Lasso (BL)], and non-linear Bayesian regression models [RKHS and Bayesian Regularized Neural Network (BRNN)] in the prediction of several complex traits in both breeding populations.
The results suggest that architectures with the Rectified Linear Units (ReLU) activation function and a large mini-batch size performed best for the genomic prediction of complex traits in maize and eucalypt.

Summary

Introduction

Artificial neural networks (ANNs) are computational methods of interest in Machine Learning (ML) research that have proved to be a powerful tool in several studies of genomic prediction (Drummond et al., 2003; Gianola et al., 2011; González-Recio and Forni, 2011; González-Recio et al., 2014; Leung et al., 2015; Glória et al., 2016; Romagnoni et al., 2019; Yin et al., 2019; Grinberg et al., 2020), owing to their ability to deal with a wide variety of high-dimensional problems in a computationally flexible manner (González-Recio et al., 2014; Ranganathan et al., 2018). In this regard, Gianola et al. (2011) pointed out that ANNs may be useful for the prediction of complex traits when the number of unknown variables is much larger than the number of samples (high-dimensional genomic information), since they can capture non-linearities adaptively. The use of non-linear functions is a powerful alternative to linear regression because it offers more flexible curve fitting, which seeks to minimize the standard error of the estimate and thereby increase the prediction accuracy (Abebe et al., 2018).
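The point about non-linearity can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the study: two hypothetical markers coded 0/1, a purely epistatic (XOR-like) phenotype, and a two-unit ReLU hidden layer whose weights are set by hand rather than trained. It compares an ordinary least-squares fit with this small non-linear network.

```python
import numpy as np

# Hypothetical toy data (not from the study): two biallelic markers coded 0/1
# and a phenotype driven only by their interaction (XOR-like epistasis).
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])


def relu(z):
    """Rectified Linear Unit: max(z, 0) element-wise."""
    return np.maximum(z, 0.0)


def linear_fit_predict(X, y):
    """Ordinary least squares with an intercept; returns fitted values."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def tiny_relu_net(X):
    """One hidden layer with two ReLU units; weights are hand-set, not trained."""
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])        # shape (inputs, hidden units)
    b1 = np.array([0.0, -1.0])         # hidden-layer biases
    w2 = np.array([1.0, -2.0])         # output weights
    hidden = relu(X @ W1 + b1)         # [relu(x1 + x2), relu(x1 + x2 - 1)]
    return hidden @ w2


linear_pred = linear_fit_predict(X, y)
net_pred = tiny_relu_net(X)

linear_mse = float(np.mean((y - linear_pred) ** 2))
net_mse = float(np.mean((y - net_pred) ** 2))

print("linear predictions:", np.round(linear_pred, 3), "MSE:", round(linear_mse, 3))
print("ReLU net predictions:", np.round(net_pred, 3), "MSE:", round(net_mse, 3))
```

In this toy case the best linear fit predicts 0.5 for every genotype (mean squared error 0.25), whereas the ReLU network reproduces the interaction exactly. Real genomic prediction involves thousands of markers and trained weights, so the sketch shows only the mechanism, not expected performance.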

Methods

Discussion

Conclusion