Deep learning methods improve genomic prediction of wheat breeding.

Abelardo Montesinos-López, Susanna Dreisigacker, Paolo Vitale, Guillermo Gerard, Huihui Li, Sofía Ramos-Pulido, Jose Crossa, Morten Lillemo, Govidan Velu, Moisés Chavira Flores, Osval A Montesinos-López, Leonardo Crespo-Herrera, Carolina Saint Pierre, Zerihun Tadesse Tarekegn, Paulino Pérez-Rodríguez

doi:10.3389/fpls.2024.1324090

Abstract

In plant breeding, various machine learning models have been developed and studied to evaluate the genomic prediction (GP) accuracy of unseen phenotypes. Deep learning has shown promise. However, most studies of deep learning in plant breeding have been limited to small datasets, and only a few have explored its application to moderate-sized datasets. In this study, we aimed to address this limitation by using a moderately large dataset. We examined the performance of a deep learning (DL) model and compared it with the widely used and powerful genomic best linear unbiased prediction (GBLUP) model. The goal was to assess GP accuracy under a five-fold cross-validation strategy and when predicting complete environments. The results revealed that the DL model outperformed the GBLUP model in GP accuracy for two of the five traits under five-fold cross-validation and performed similarly for the other three traits. This indicates the superiority of the DL model in predicting these two traits. Furthermore, when predicting complete environments using the leave-one-environment-out (LOEO) approach, the DL model demonstrated competitive performance. It is worth noting that the DL model employed in this study extends a previously proposed multi-modal DL model, which had been applied primarily to image data and only with small datasets. By using a moderately large dataset, we were able to evaluate the performance and potential of the DL model in a more information-rich and challenging plant breeding scenario.
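The two validation strategies named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the function names are hypothetical, Pearson correlation is used as a common stand-in for GP accuracy, and a simple ridge regression on markers replaces both the GBLUP and DL models of the study.

```python
import numpy as np

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for random k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, folds[k]

def loeo_splits(env_labels):
    """Yield (env, train_idx, test_idx), holding out one whole environment at a time."""
    env_labels = np.asarray(env_labels)
    for env in np.unique(env_labels):
        test_mask = env_labels == env
        yield str(env), np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in predictor (ridge regression); NOT the study's GBLUP or DL model."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ beta + y_mean

def pearson(a, b):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic toy data (assumption): 200 records, 50 markers coded 0/1/2, 4 environments.
rng = np.random.default_rng(42)
n_records, n_markers = 200, 50
X = rng.integers(0, 3, size=(n_records, n_markers)).astype(float)
env = np.repeat(["E1", "E2", "E3", "E4"], n_records // 4)
y = X @ rng.normal(size=n_markers) + rng.normal(size=n_records)

# Five-fold cross-validation: each record is predicted exactly once.
cv_scores = [pearson(y[te], ridge_fit_predict(X[tr], y[tr], X[te]))
             for tr, te in five_fold_splits(n_records)]

# LOEO: each environment is predicted from all the others.
loeo_scores = {e: pearson(y[te], ridge_fit_predict(X[tr], y[tr], X[te]))
               for e, tr, te in loeo_splits(env)}

print("5-fold mean accuracy:", np.mean(cv_scores))
print("LOEO accuracy per environment:", loeo_scores)
```

The difference between the two schemes lies only in how the test set is chosen: five-fold cross-validation holds out random records, while LOEO holds out every record from one environment, which is usually the harder prediction task.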

Full Text