Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations

Haixiao Hu, Jean-Luc Jannink, Trevor H Yeats, Giovanny Covarrubias-Pazaran, Daniel E Runcie, Mark E Sorrells, James Tanaka, Lucía Gutiérrez, Owen A Hoekenga, Xuying Zheng, Malachy T Campbell, Corey Broeckling, Michael A Gore, Melanie Caffe-Treml, Linxing Yao, Kevin P Smith

doi:10.1007/s00122-021-03946-4

Abstract

Key message: Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and in distantly related populations, in addition to single-environment prediction.

Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most existing studies were based on historical datasets from one environment; they were therefore unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both their counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when the genetic covariance was modeled with an unstructured covariance model. We also demonstrated that omics data from one population can be used to prioritize loci and thereby improve genomic prediction in a distantly related population, using a two-kernel linear model that accommodates both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models, which assume that each variant is equally likely to affect the trait, and that it can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.

Highlights

Oat (Avena sativa L.) ranks sixth in world cereal production and has increasingly been consumed as a human food (USDA 2019)
Percent change in prediction accuracy over GBLUP ranged from 0.1% (Days to Heading, G + T model) to 70.3% (C18:0, G + M model) with a median of 21.5%, and most of the differences in prediction accuracy between omics models and GBLUP were statistically significant
The question that we explored was whether multi-omics models (M and G + M) could improve prediction accuracy compared to corresponding multi-trait models based on SNPs alone (G model)

Summary

Introduction

Oat (Avena sativa L.) ranks sixth in world cereal production and has increasingly been consumed as a human food (USDA 2019). Recent advances in high-throughput sequencing and metabolite profiling technologies enable quantification of gene expression and metabolite abundance for hundreds of samples with high precision and at reasonable cost (Alseekh and Fernie 2018; Moll et al 2014). These advances in technology provide an opportunity to integrate different omics data and improve predictions for phenotypes of interest. Xu et al (2017) and Wang et al (2019) suggested that best linear unbiased prediction was the most efficient method compared to other commonly used genomic prediction and non-linear machine learning methods. Most of those studies were based on historical datasets with a limited number of metabolite features, and each level of omics data was collected from a different project. Prediction of breeding values of distantly related individuals is needed in many, and perhaps the most promising, applications of genomic selection in both plant and animal breeding programs (Lorenz and Smith 2015; Meuwissen 2009; Moghaddar et al 2019).
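To make the kernel-based BLUP idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It builds one relationship kernel per data layer (SNPs for G, metabolites for M), adds them with fixed weights, and predicts held-out lines. Everything here is an illustrative assumption: the function names `relationship_kernel` and `kernel_blup`, the simulated data, and above all the fixed `weights` and `noise_ratio`, which stand in for variance components that would normally be estimated from the data (for example by REML).

```python
import numpy as np


def relationship_kernel(features):
    """Linear kernel between lines from a (lines x features) matrix.

    Columns are centered and the kernel is scaled so that its mean
    diagonal is 1, which puts different omics layers on a common scale.
    """
    centered = features - features.mean(axis=0)
    kernel = centered @ centered.T
    return kernel / np.mean(np.diag(kernel))


def kernel_blup(kernels, weights, y, train, test, noise_ratio=1.0):
    """BLUP of test-line values from a weighted sum of kernels.

    ASSUMPTION: `weights` and `noise_ratio` are fixed by hand here; they
    play the role of variance components relative to one another.
    """
    K = sum(w * k for w, k in zip(weights, kernels))
    V = K[np.ix_(train, train)] + noise_ratio * np.eye(len(train))
    ones = np.ones(len(train))
    v_inv_y = np.linalg.solve(V, y[train])
    v_inv_1 = np.linalg.solve(V, ones)
    mu = (ones @ v_inv_y) / (ones @ v_inv_1)  # GLS estimate of the intercept
    alpha = v_inv_y - mu * v_inv_1            # V^-1 (y - mu)
    return mu + K[np.ix_(test, train)] @ alpha


# Simulated data, purely hypothetical: 100 lines, 300 SNPs, 40 metabolites.
rng = np.random.default_rng(0)
n_lines, n_snps, n_metab = 100, 300, 40
snps = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)
genetic = snps @ rng.normal(size=n_snps)
metab = np.outer(genetic, rng.normal(size=n_metab)) + rng.normal(
    scale=genetic.std(), size=(n_lines, n_metab)
)
y = genetic + rng.normal(scale=genetic.std(), size=n_lines)

order = rng.permutation(n_lines)
train, test = order[:80], order[80:]

G = relationship_kernel(snps)
M = relationship_kernel(metab)

pred_g = kernel_blup([G], [1.0], y, train, test)             # G model
pred_gm = kernel_blup([G, M], [0.5, 0.5], y, train, test)    # G + M model

# Prediction accuracy taken here as the Pearson correlation between
# predicted and observed values of the held-out lines.
for name, pred in [("G", pred_g), ("G + M", pred_gm)]:
    print(name, np.corrcoef(pred, y[test])[0, 1])
```

Under the same assumptions, a two-kernel model of the kind mentioned in the abstract could be sketched by passing two SNP kernels to `kernel_blup`, one built from a prioritized subset of loci and one from the remaining loci, so that the two groups are allowed different variances.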

Methods

Results

Conclusion