Nonparametric Method for Genomics-Based Prediction of Performance of Quantitative Traits Involving Epistasis in Plant Breeding

Xiaochun Sun, Rita H Mumm, Ping Ma

doi:10.1371/journal.pone.0050604

Open Access

Journal: PLoS ONE
Publication Date: Nov 30, 2012
Citations: 16
License type: CC BY 4.0

Affiliation: University of Illinois Urbana-Champaign

Abstract

Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We therefore propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits ranging from no or low epistasis (pRKHS-NE) to high epistasis (pRKHS-E). Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers, with the optimal marker set determined by cross-validation. Principal components are computed from the reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated against currently popular methods for practicing GS, specifically RR-BLUP, BayesA, and BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
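The pipeline described in the abstract (screen markers, compute principal components from the reduced marker matrix, then fit a nonparametric model on those components) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: markers are screened by absolute correlation with the phenotype, and a Gaussian-kernel ridge regression stands in for the smoothing spline ANOVA model. The function names, the simulated data, and the values of `n_markers`, `n_components`, `bandwidth`, and `lam` are all hypothetical.

```python
import numpy as np

def fit_spc(X, y, n_markers, n_components):
    """Screen markers by |correlation| with y, then run PCA on the kept markers."""
    mu = X.mean(axis=0)
    Xc = X - mu
    yc = y - y.mean()
    # Assumed screening statistic: absolute marker-phenotype correlation.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    score = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    keep = np.argsort(score)[::-1][:n_markers]
    # PCA of the reduced marker matrix via SVD; rows of Vt are the loadings.
    _, _, Vt = np.linalg.svd(Xc[:, keep], full_matrices=False)
    return {"mu": mu, "keep": keep, "loadings": Vt[:n_components].T}

def spc_scores(model, X):
    """Project (possibly new) genotypes onto the supervised principal components."""
    return (X - model["mu"])[:, model["keep"]] @ model["loadings"]

def gaussian_kernel(A, B, bandwidth):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def fit_predict(Z_train, y_train, Z_test, bandwidth=1.0, lam=1.0):
    """Kernel ridge regression on the SPC scores (stand-in for SS-ANOVA)."""
    K = gaussian_kernel(Z_train, Z_train, bandwidth)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)),
                            y_train - y_train.mean())
    return gaussian_kernel(Z_test, Z_train, bandwidth) @ alpha + y_train.mean()

# Simulated genotypes coded 0/1/2, with one additive and one epistatic term.
rng = np.random.default_rng(0)
n, p = 120, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=n)

model = fit_spc(X[:100], y[:100], n_markers=50, n_components=5)
Z_train = spc_scores(model, X[:100])
Z_test = spc_scores(model, X[100:])
y_hat = fit_predict(Z_train, y[:100], Z_test,
                    bandwidth=2.0 * Z_train.std(), lam=1.0)
```

In practice `n_markers` would be chosen by cross-validation, as the abstract states; the sketch fixes it to keep the example short. The screening and the loadings are estimated on the training lines only, so the held-out lines do not influence marker selection.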

Highlights

The estimation of breeding values to facilitate choice of parents is a central problem in plant breeding
More markers were involved in grain yield (GY), i.e. 1000 and 900 markers for pRKHS-E and pRKHS-NE, respectively, suggesting that more genes, and perhaps more epistasis, were involved
This study demonstrates the advantages of using nonparametric methods to estimate breeding value and to predict phenotypic performance, especially for traits involving epistatic gene action

Summary

Introduction

The estimation of breeding values to facilitate choice of parents is a central problem in plant breeding. For evaluating and identifying outstanding progeny, modern genotyping technologies make it possible to predict the performance of new lines from their molecular marker or DNA sequence profiles. Fernando and Grossman [1] first demonstrated the utility of molecular marker data for estimating breeding values in livestock species, using data that involved very few markers. With advances in genotyping and sequencing technologies, densely spaced genome-wide SNP (single nucleotide polymorphism) data, involving tens or hundreds of thousands of markers, are now available for a number of crops. These genome-wide markers can be used as ‘predictors’ to achieve high accuracy in estimating breeding values. However, problems such as high dimensionality and multicollinearity emerge when the number of predictors is very large and exceeds the number of records.
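A small numerical illustration of the dimensionality problem, offered as a sketch on simulated data rather than anything taken from the study: when markers outnumber records, the marker matrix cannot have full column rank, so ordinary least squares has no unique solution. Adding a ridge penalty, the idea underlying RR-BLUP, makes the system solvable. The sample sizes and the penalty `lam` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                      # fewer records than markers
X = rng.integers(0, 3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                 # center each marker column
y = rng.normal(size=n)

# rank(X'X) equals rank(X), which cannot exceed n, so the p x p matrix
# X'X is singular and the normal equations have no unique solution.
rank = np.linalg.matrix_rank(X)

# A ridge penalty makes X'X + lam * I positive definite, hence invertible.
lam = 10.0                          # arbitrary illustrative value
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Shrinkage of this kind addresses solvability but still treats marker effects as additive, which is the limitation the nonparametric approach is meant to relax.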

Methods

Results

Conclusion