Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.)

Lance F Merrick, Xianming Chen, Arron H Carter, Dennis N Lozada

doi:10.3389/fgene.2022.835781

Open Access

https://doi.org/10.3389/fgene.2022.835781

Abstract

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in 4 years (2016–2018 and 2020) and a diversity panel phenotyped in 4 years (2013–2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using ridge regression best linear unbiased prediction and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Furthermore, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.

Highlights

Genomic selection (GS) is poised to increase genetic gain and reduce cycle time for complex agronomic traits that are difficult to phenotype and analyze (Meuwissen et al., 2001).
The means of the diverse association mapping panel (DP) were higher than those of the breeding line (BL) trials, with lower coefficients of variation (CV).
The varying results for the classification and transformation methods showed the need to choose the prediction model carefully based on the phenotype distribution.

Summary

Introduction

Genomic selection (GS) is poised to increase genetic gain and reduce cycle time for complex agronomic traits that are difficult to phenotype and analyze (Meuwissen et al., 2001). With the advent of high-throughput genotyping, it is feasible to develop and implement GS models for categorical/ordinal phenotypes that are common in most breeding programs and often difficult to analyze. Most GS models are linear regression models that assume continuous and normally distributed phenotypes (Montesinos-López et al., 2015c). When faced with data that do not follow the assumptions of a linear model, researchers have several options. They may ignore the lack of normality, transform the phenotypes, use generalized linear models (GLMs), or use machine learning (ML) algorithms and classification models. Most GS models treat disease resistance as continuous values and utilize regression models and transformations for prediction, whereas only a few studies have used classification methods (Ornella et al., 2012; Ornella et al., 2014; Rutkoski et al., 2014; Muleta et al., 2017).
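To make two of these options concrete, the following is a minimal Python sketch, not the authors' pipeline. It fits a plain ridge regression on marker data (used here as a simple stand-in for ridge regression BLUP, with a hand-picked penalty instead of one estimated from variance components) to a square root transformed, right-skewed phenotype, and then collapses severity into two classes. The marker matrix, simulated phenotype, penalty `lam`, and `CUTOFF` value are all illustrative assumptions and do not come from the paper. Thresholding a regression prediction is also only a rough proxy for the support vector machine and ordinal Bayesian classifiers the study evaluated.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (intercept, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: beta = Xc' (Xc Xc' + lam I)^-1 yc, cheaper when markers >> lines.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), yc)
    beta = Xc.T @ alpha
    return y_mean - x_mean @ beta, beta


def ridge_predict(X, intercept, beta):
    return intercept + X @ beta


# Simulated stand-in data (assumption): 200 lines, 500 markers coded -1/0/1.
rng = np.random.default_rng(0)
n_lines, n_markers = 200, 500
X = rng.integers(-1, 2, size=(n_lines, n_markers)).astype(float)
marker_effects = rng.normal(0.0, 0.05, n_markers)
latent = X @ marker_effects + rng.normal(0.0, 1.0, n_lines)
# Right-skewed, severity-like phenotype bounded to a 0-100 % scale.
severity = np.clip(5.0 * np.exp(latent), 0.0, 100.0)

# Option 1: transform the phenotype, then fit a linear model.
train, test = np.arange(150), np.arange(150, 200)
y_sqrt = np.sqrt(severity)
intercept, beta = ridge_fit(X[train], y_sqrt[train], lam=float(n_markers))
pred_sqrt = ridge_predict(X[test], intercept, beta)

# Prediction accuracy as the Pearson correlation of predicted vs. observed.
accuracy = np.corrcoef(pred_sqrt, y_sqrt[test])[0, 1]

# Option 2 (rough proxy): collapse severity into two classes.
CUTOFF = 10.0  # hypothetical severity threshold, not taken from the paper
observed_class = severity[test] > CUTOFF
# Back-transform predictions; clip at 0 so squaring cannot flip a negative value.
predicted_class = np.clip(pred_sqrt, 0.0, None) ** 2 > CUTOFF
class_accuracy = np.mean(observed_class == predicted_class)

print(f"correlation on sqrt scale: {accuracy:.2f}")
print(f"2-class accuracy: {class_accuracy:.2f}")
```

The two metrics are not directly comparable: the correlation measures how well lines are ranked, while class accuracy depends heavily on where the cutoff falls relative to the phenotype distribution.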

Journal: Frontiers in Genetics
Publication Date: Feb 23, 2022
Citations: 8
License type: CC BY 4.0
