Abstract

Machine learning methods such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) have emerged as promising methods for genomic prediction (GP). In this context, we assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as “genomic images.” In the regression task, the machine learning models were compared with GBLUP. In the classification task, MLP and CNN were compared with each other. For classification, the traits (plant height and grain yield) were discretized to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets. An automatic hyperparameter search was performed for MLP and CNN, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN achieved results competitive with GBLUP. We also offer new insights into automated machine learning for genomic prediction and its implications for plant breeding.
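
The discretization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the thresholds used, so the selected fractions below (0.5 for the balanced case, 0.1 for the unbalanced case), the function name `discretize_by_selection`, and the plant height values are all hypothetical and serve only to show the idea: hybrids in the top fraction of the phenotype distribution are labeled 1 (selected), and the rest are labeled 0.

```python
def discretize_by_selection(values, selected_fraction):
    """Label the top `selected_fraction` of phenotypes as 1, the rest as 0.

    Hypothetical helper: the thresholds used in the study are not given
    in the abstract. Tied values at the threshold are all labeled 1.
    """
    if not 0 < selected_fraction < 1:
        raise ValueError("selected_fraction must be between 0 and 1")
    ranked = sorted(values, reverse=True)
    # Number of individuals to select, at least one.
    n_selected = max(1, round(len(values) * selected_fraction))
    threshold = ranked[n_selected - 1]
    return [1 if v >= threshold else 0 for v in values]


# Made-up plant heights (cm) for ten hybrids, for illustration only.
heights = [210, 195, 230, 205, 188, 221, 199, 240, 215, 202]

# Moderate selection intensity: half of the hybrids are "selected".
balanced = discretize_by_selection(heights, 0.5)

# Extreme selection intensity: only the top 10% are "selected".
unbalanced = discretize_by_selection(heights, 0.1)

print(balanced)
print(unbalanced)
```

With a small selected fraction the positive class becomes rare, which is why accuracy alone can be misleading on the unbalanced datasets and several metrics are worth reporting.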

Highlights

  • Genomic prediction (GP) arose as a breeding tool capable of enabling a considerable increase in the rates of genetic gain

  • Several statistical machine learning methods have been adopted for genomic prediction (GP) because they can improve genome-enabled prediction accuracy: they learn models or patterns from data that can then be used for analysis, interpretation, prediction, and decision-making

  • One reason why so many types of statistical machine learning methods have been implemented in GP is that no universal best prediction model can be used under all circumstances

Summary

Introduction

Genomic prediction (GP) arose as a breeding tool capable of enabling a considerable increase in the rates of genetic gain. In this context, three decades of scientific research have shown that the accuracy of this statistical approach may depend on a series of factors, including the quality and pre-processing of the phenotypic data (Galli et al., 2018), the platform used to obtain genomic information and how it is processed (Granato et al., 2018; Sousa et al., 2019), the population mating design (Fritsche-Neto et al., 2018), the intrinsic genetic architecture of the trait (Alves et al., 2019), the genetic structure of the population (Lyra et al., 2018), and how the genotype-by-environment interaction is dealt with. One reason why so many types of statistical machine learning methods have been implemented in GP is that no universal best prediction model can be used under all circumstances.
