Comparing Genomic Prediction Models by Means of Cross Validation.

Matías F Schrauf, Sebastián Munilla, Gustavo De Los Campos

doi:10.3389/fpls.2021.734512

Abstract

In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made with the aim of optimizing predictive accuracy. Here we discuss and illustrate, using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data, either by REML estimation or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.

Highlights

Genomic models relate genotypic variation as present in dense marker panels to phenotypic variation in a given population. These models were first introduced in breeding (Meuwissen et al., 2001) as a change of paradigm with respect to traditional marker-assisted selection. They are currently used to accelerate genetic gain in many plant breeding programs, with the focus placed on improving predictive ability while remaining agnostic to the causative nature of the genotype-phenotype relation.
When the focus is placed on predictions, as is usual with genomic models, most of these decisions are made with the aim of optimizing predictive accuracy.

Summary

INTRODUCTION

Genomic models relate genotypic variation as present in dense marker panels to phenotypic variation in a given population. The present work illustrates how the different performance assessments and comparisons can be made with cross validations, with a focus placed on both identifying differences of practical relevance and the decision making required for model selection and hyper-parameter tuning. We emphasize the importance of conducting paired cross validations to achieve higher statistical power, and propose the use of equivalence margins to identify the differences in accuracy which are relevant in practice. With these goals in mind, the present work is organized as follows: we first assess the predictive ability of G-BLUP (VanRaden, 2008), probably the best-known genomic model, in a well-studied dataset, where we discuss the general aspects of cross validation. For these lines we used four contrasting traits: the germination count, the number of leaves, the days to tassel, and plant height.
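The idea of a paired cross validation can be illustrated with a minimal sketch: both candidate models are evaluated on exactly the same folds, so that the per-fold differences in accuracy, rather than the two sets of accuracies separately, become the object of inference. Everything in the code below is an assumption made for illustration and is not taken from the paper: the data are simulated, ridge regression on markers stands in for a genomic model, the two candidates differ only in a hypothetical penalty value, the sign-flip test is a generic paired test and not one of the tests the authors introduce, and the margin of 0.01 is an arbitrary placeholder for an equivalence margin.

```python
import itertools
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on markers (an assumed stand-in for a genomic model)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Dual form: solve an n x n system, cheaper when markers outnumber lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - y_mean)
    beta = Xc.T @ alpha
    return y_mean + (X_test - x_mean) @ beta

def paired_kfold(X, y, lam_a, lam_b, k=10, seed=0):
    """Per-fold accuracy differences (model A minus model B) on shared folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    diffs = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        accs = []
        for lam in (lam_a, lam_b):  # both models see the same partition
            pred = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], lam)
            accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
        diffs.append(accs[0] - accs[1])
    return np.array(diffs)

def sign_flip_pvalue(diffs):
    """Exact two-sided sign-flip test on the mean of the paired differences."""
    observed = abs(diffs.mean())
    signs = itertools.product((-1.0, 1.0), repeat=len(diffs))
    null = [abs(np.mean(diffs * np.array(s))) for s in signs]
    return float(np.mean(np.array(null) >= observed - 1e-12))

# Simulated genotypes (0/1/2 allele counts) and a polygenic trait.
rng = np.random.default_rng(1)
n_lines, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
effects = rng.normal(0.0, 0.1, n_markers)
y = X @ effects + rng.normal(0.0, 1.0, n_lines)

diffs = paired_kfold(X, y, lam_a=100.0, lam_b=1e4)
p_value = sign_flip_pvalue(diffs)
margin = 0.01  # hypothetical equivalence margin on the accuracy scale

print("mean accuracy difference:", diffs.mean())
print("sign-flip p-value:", p_value)
print("mean difference exceeds margin:", abs(diffs.mean()) > margin)
```

Because fold-wise results share training data, the differences are not strictly independent, so the p-value here should be read as a rough guide. The point of the sketch is the pairing itself: variation between folds that affects both models equally cancels out in the differences.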

The Genomic Models

The Datasets

Cross Validations for Model

Paired Cross Validations for Model

Software

Model Predictive Ability Assessment

Model Selection

General Model Comparison

Final Remarks

DATA AVAILABILITY STATEMENT


Journal: Frontiers in Plant Science
Publication Date: Nov 19, 2021
Citations: 7
License type: CC BY 4.0
