Abstract

Improving the prediction accuracy of a complex trait of interest is key to performing genomic selection (GS) in crop breeding. For a complex trait measured in multiple environments, this paper proposes a two-stage method to fit a linear model that jointly models the genetic effects and the genotype × environment interaction (G × E) effects. In the first stage, the least absolute shrinkage and selection operator (LASSO) penalized method is used to identify quantitative trait loci (QTL). In the second stage, the ordinary least squares (OLS) approach is used to reestimate the QTL effects. As a case study, this approach was used to improve the prediction accuracies of flowering time (FT), oil content (OC), and seed yield per plant (SY) in Brassica napus (B. napus). The results showed that including the G × E effects reduced the mean squared error (MSE) significantly. Numerous QTL were environment-specific and had minor effects. On average, the two-stage method, named OLS post-LASSO, gave the highest prediction accuracies (correlations of 0.8789, 0.9045, and 0.5507 for FT, OC, and SY, respectively). It was followed by the marker × environment interaction (M × E) genomic best linear unbiased prediction (GBLUP) model (correlations of 0.8347, 0.8205, and 0.4005 for FT, OC, and SY, respectively), the LASSO method (correlations of 0.7583, 0.7755, and 0.2718 for FT, OC, and SY, respectively), and the stratified GBLUP model (correlations of 0.6789, 0.6361, and 0.2860 for FT, OC, and SY, respectively). The two-stage method clearly improved the prediction accuracy, and this study provides methods and a reference for improving GS in breeding.

Highlights

  • In the last three decades, the development of molecular marker technology has provided numerous molecular markers for the most important species [1]

  • A growing number of studies have shown that incorporating G × E effects into the genomic selection (GS) model can substantially increase the prediction accuracy of complex traits. Therefore, in this study, based on the representative TNDH population, we evaluate the performance of a two-stage approach via a linear model that jointly models the genetic effects and G × E effects. The objective of the present study is to improve the prediction accuracy of complex traits for B. napus

  • Details of phenotypic and genotypic data and how the TNDH population was developed can be found in Luo et al. [27]. These 182 TNDH lines, the 2041 markers, and the phenotypic data for three complex traits (SY, oil content (OC), and flowering time (FT)) across all the ten environments were used in the present study


Summary

Introduction

In the last three decades, the development of molecular marker technology has provided numerous molecular markers for the most important species [1]. Regarding the use of molecular markers in the selection of a genetic trait, marker-assisted selection (MAS) [2] became a valuable tool in animal and plant breeding in the 1990s and works well for traits with a simple genetic architecture. With high-density molecular markers, the number of markers (p) can vastly exceed the sample size (n), which is referred to as a “large p small n” problem. To deal with the “large p small n” problem, one can impose some constraints on the linear model, resulting in penalized estimation methods, such as ridge regression (RR) [5] and LASSO [6]. RR performs parameter shrinkage only, while LASSO offers both parameter shrinkage and variable selection simultaneously.
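The difference between shrinkage alone and shrinkage with selection can be seen in a small sketch. It assumes the simplest textbook setting, an orthonormal design (XᵀX = I), in which both penalized estimators have closed forms in terms of the OLS coefficients. The function names, the toy coefficients, and the penalty value are hypothetical and are not taken from the paper. The last step illustrates the idea behind an OLS refit after LASSO selection: under the same orthonormal assumption, the selected markers simply get their unshrunk OLS estimates back.

```python
import math

def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: every coefficient is scaled by 1/(1+lam)."""
    return [b / (1.0 + lam) for b in beta_ols]

def lasso_shrink(beta_ols, lam):
    """LASSO under an orthonormal design: soft-thresholding at lam."""
    out = []
    for b in beta_ols:
        mag = max(abs(b) - lam, 0.0)
        # Coefficients with |b| <= lam are set exactly to zero (variable selection).
        out.append(math.copysign(mag, b) if mag > 0 else 0.0)
    return out

# Hypothetical OLS marker effects and penalty (illustrative values only).
beta_ols = [2.0, -0.5, 0.25, -1.5]
lam = 1.0

ridge = ridge_shrink(beta_ols, lam)   # all four effects shrunk, none zero
lasso = lasso_shrink(beta_ols, lam)   # small effects dropped, large ones shrunk

# Stage 1: markers kept by LASSO.
selected = [j for j, b in enumerate(lasso) if b != 0.0]

# Stage 2: OLS refit on the selected markers only. With orthonormal columns,
# this equals the original OLS estimates for those markers (an assumption of
# this toy setting, not a general property).
refit = [beta_ols[j] if j in selected else 0.0 for j in range(len(beta_ols))]

print("ridge:", ridge)
print("lasso:", lasso)
print("selected:", selected)
print("OLS refit:", refit)
```

With real marker data the design is not orthonormal, so both stages would need a numerical solver, but the qualitative behavior is the same: ridge keeps every marker, LASSO drops some, and the refit removes the shrinkage bias on the markers that remain.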
