Abstract
Artificial neural networks (ANNs) are a class of machine learning models that can learn non-linear relationships and model high-dimensional data, and they have been applied to a variety of genomic problems. ANNs also have potential for genomic prediction because they can capture the intricate relationships between genetic variants and phenotypes. However, little effort has so far been made to investigate the performance and feasibility of ANNs for genomic prediction in pigs. In this study, we evaluated the predictive performance of TensorFlow’s ANN models with one-layer, two-layer, and three-layer structures (with zero, one, and two hidden layers, respectively) and compared them with five linear methods, namely GBLUP, LDAK, BayesR, SLEMM, and scikit-learn’s ridge regression. We used data on six quantitative traits: off-test body weight (WT), off-test back fat thickness (BF), off-test loin muscle depth (MS), number of piglets born alive (NBA), number of piglets born dead (NBD), and number of piglets weaned (NW). We also assessed the computational efficiency of ANNs on both CPU and GPU. The benchmarking was based on cross-validation of 26,190 genotyped pigs. We employed Hyperband tuning to optimize the hyper-parameters and to select the best model among the one-layer, two-layer, and three-layer structures. The results showed that the one-layer structure, which is equivalent to ridge regression, performed best among the ANN structures, with accuracy comparable to that of GBLUP. Even with their optimal hyper-parameters, the two-layer and three-layer ANNs were less accurate than GBLUP. Of the five linear methods, BayesR and SLEMM performed similarly and best, followed by LDAK, scikit-learn’s ridge regression, and GBLUP. SLEMM was also the fastest, completing training with 21k individuals and 30k SNPs in 2.6 minutes.
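The statement that a one-layer ANN (no hidden layer) is equivalent to ridge regression can be checked numerically. The sketch below is an illustration under stated assumptions, not the study's TensorFlow setup: it uses plain NumPy, simulated 0/1/2 genotype codes, centered genotypes and phenotypes (so no intercept is needed), and a single linear output unit trained by full-batch gradient descent with an L2 penalty. The helper names `ridge_closed_form` and `one_layer_ann`, the penalty value, and all data are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge regression: w = (X'X + lam*I)^{-1} X'y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def one_layer_ann(X, y, lam, lr=0.1, epochs=5000):
    """One linear dense unit (no hidden layer, no bias) trained by full-batch
    gradient descent on (||y - Xw||^2 + lam * ||w||^2) / n."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        # Gradient of the penalized mean squared error with respect to w.
        grad = (2.0 / n) * (-X.T @ (y - X @ w) + lam * w)
        w -= lr * grad
    return w

# Hypothetical data: 200 animals, 20 SNPs coded 0/1/2, centered.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 20)).astype(float)
X -= X.mean(axis=0)
y = X @ rng.normal(size=20) + rng.normal(size=200)
y -= y.mean()

lam = 10.0
w_ridge = ridge_closed_form(X, y, lam)
w_ann = one_layer_ann(X, y, lam)
print(np.max(np.abs(w_ridge - w_ann)))  # should be close to zero
```

Both routines minimize the same convex objective, so gradient descent converges to the closed-form ridge solution. In Keras terms (an assumption about how such a model would be configured), a `Dense(1)` layer with `kernel_regularizer=tf.keras.regularizers.L2(l2)` and mean-squared-error loss minimizes mean((y - Xw)^2) + l2 * ||w||^2, which corresponds to `lam = n * l2` here when the data are centered so the unpenalized bias stays near zero.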
Compared with CPUs, GPUs offered comparable computational speed for the one-layer ANN but substantially improved computational efficiency for multi-layer ANNs. Using the optimal hyper-parameters of the two-layer ANN for BF, we found that a GPU gave a five-fold increase in processing speed over a conventional CPU, although it was still slower than GBLUP. In addition, hyper-parameter tuning (particularly of L2 regularization and the number of dense units in hidden layers) was critical for improving genomic prediction accuracy in pigs. In conclusion, ANNs with up to three layers did not improve genomic prediction over routine linear methods for pig quantitative traits.
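To make the tuning step more concrete, the sketch below computes the bracket schedule of the original Hyperband algorithm (Li et al.), the tuning strategy named above. Each bracket starts many hyper-parameter configurations (for example, candidate L2 penalties and numbers of dense units) on a small training budget, then repeatedly keeps the best 1/eta of them while multiplying the budget by eta. It assumes the budget is measured in training epochs with a maximum of `max_resource`; the function name is hypothetical, and tuning libraries such as Keras Tuner use related but not necessarily identical schedules.

```python
def hyperband_schedule(max_resource, eta=3):
    """Return, for each Hyperband bracket, a list of (n_configs, budget) rounds
    of successive halving. Budgets are in units of the resource (e.g. epochs)."""
    # s_max = floor(log_eta(max_resource)), computed with integers to avoid
    # floating-point rounding in the logarithm.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Configurations sampled in this bracket: ceil((s_max + 1) * eta^s / (s + 1)).
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        # Initial budget per configuration in this bracket.
        r = max_resource / eta ** s
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i  # configurations kept in round i
            r_i = r * eta ** i   # budget given to each of them
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets

for bracket in hyperband_schedule(81, eta=3):
    print(bracket)
```

For `max_resource=81` and `eta=3`, the most aggressive bracket is `[(81, 1.0), (27, 3.0), (9, 9.0), (3, 27.0), (1, 81.0)]`: 81 configurations trained for 1 epoch each, narrowed down to a single configuration trained for 81 epochs. The last bracket, `[(5, 81.0)]`, is plain random search at full budget, which hedges against configurations (such as large networks) that only look good after long training.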