Genomic Prediction Accuracy of Seven Breeding Selection Traits Improved by QTL Identification in Flax.

Sylvie Cloutier, Chunfang Zheng, Frank M You, Samuel Lan, Madison Mccausland, Scott D Duguid, Helen M Booker, Kyle Hauck

doi:10.3390/ijms21051577

Abstract

Molecular markers are one of the major factors affecting genomic prediction accuracy and the cost of genomic selection (GS). Previous studies have indicated that the use of quantitative trait loci (QTL) as markers in GS significantly increases prediction accuracy compared with genome-wide random single nucleotide polymorphism (SNP) markers. To optimize the selection of QTL markers in GS, a set of 260 lines from bi-parental populations with 17,277 genome-wide SNPs was used to evaluate the prediction accuracy for seed yield (YLD), days to maturity (DTM), iodine value (IOD), protein (PRO), oil (OIL), linoleic acid (LIO), and linolenic acid (LIN) contents. These seven traits were phenotyped over four years at two locations. Identification of quantitative trait nucleotides (QTNs) for the seven traits was performed using three types of statistical models for genome-wide association study: two SNP-based single-locus (SS), seven SNP-based multi-locus (SM), and one haplotype-block-based multi-locus (BM) models. The identified QTNs were then grouped into QTL based on haplotype blocks. For all seven traits, 133, 355, and 1208 unique QTL were identified by SS, SM, and BM, respectively. A total of 1420 unique QTL were obtained by SS+SM+BM, ranging from 254 (OIL, LIO) to 361 (YLD) for individual traits, whereas a total of 427 unique QTL were obtained by SS+SM, ranging from 56 (YLD) to 128 (LIO). SS models alone did not identify sufficient QTL for GS. The highest prediction accuracies were obtained using single-trait QTL identified by SS+SM+BM for OIL (0.929 ± 0.016), PRO (0.893 ± 0.023), YLD (0.892 ± 0.030), and DTM (0.730 ± 0.062), and by SS+SM for LIN (0.837 ± 0.053), LIO (0.835 ± 0.049), and IOD (0.835 ± 0.041). In terms of the number of QTL markers and prediction accuracy, SS+SM outperformed other models or combinations thereof. The use of all SNPs or QTL of all seven traits significantly reduced the prediction accuracy of traits.
The results further validated that QTL outperformed high-density genome-wide random markers and demonstrated that the combined use of single- and multi-locus models can effectively identify a comprehensive set of QTL that improve prediction accuracy. However, further studies are warranted on the detection and removal of redundant or false-positive QTL, in order to maximize prediction accuracy and minimize the number of QTL markers in GS.
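
The grouping of QTNs into QTL by haplotype block, and the counting of unique QTL across model classes, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption for illustration: the chromosome names, block coordinates, QTN positions, and the helper names `qtn_to_qtl` and `unique_qtl` are not taken from the study. It also assumes that a QTN falling outside every block is treated as its own single-marker QTL, which the abstract does not specify.

```python
from bisect import bisect_right

# Hypothetical haplotype blocks per chromosome as (start_bp, end_bp),
# sorted by start position. Coordinates are illustrative only.
BLOCKS = {
    "Lu1": [(1_000, 5_000), (8_000, 12_000)],
    "Lu2": [(2_000, 6_000)],
}


def qtn_to_qtl(chrom, pos):
    """Map a QTN to a QTL identifier: the haplotype block containing it,
    or the QTN position itself when it lies in no block (an assumption)."""
    blocks = BLOCKS.get(chrom, [])
    starts = [start for start, _ in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i >= 0 and blocks[i][0] <= pos <= blocks[i][1]:
        return (chrom, blocks[i][0], blocks[i][1])
    return (chrom, pos, pos)


def unique_qtl(qtns):
    """Collapse a list of (chromosome, position) QTNs into a set of QTL."""
    return {qtn_to_qtl(chrom, pos) for chrom, pos in qtns}


# Hypothetical QTNs for one trait, one list per model class.
ss = [("Lu1", 1_500)]
sm = [("Lu1", 4_200), ("Lu2", 2_500)]
bm = [("Lu1", 9_000), ("Lu2", 7_000)]

# Set union removes QTL detected by more than one model class.
qtl_ss_sm = unique_qtl(ss) | unique_qtl(sm)
qtl_all = qtl_ss_sm | unique_qtl(bm)
print(len(qtl_ss_sm), len(qtl_all))
```

Counting unique QTL as a set union is consistent with the reported totals being smaller than the sums of the per-class counts (427 for SS+SM versus 133 + 355 = 488), presumably because some QTL were detected by more than one model class.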

Highlights

Genomic selection (GS) is a form of marker-assisted selection (MAS) that predicts genomic estimated breeding values (GEBVs) of test individuals through the use of genome-wide markers [1,2].
Seven breeding selection traits in flax, namely seed yield (YLD), days to maturity (DTM), iodine value (IOD), protein content (PRO), oil content (OIL), linoleic acid content (LIO), and linolenic acid content (LIN), were measured in 260 lines from bi-parental populations grown in the field for four years at two locations (Figure 1).
We adopted a set of genomic and phenotypic data, comprising 260 lines derived from bi-parental populations, 17,277 genome-wide random single nucleotide polymorphisms (SNPs), and phenotypes of seven major breeding selection traits in flax evaluated over four years at two locations, to find optimal markers that maximize prediction accuracy and minimize genotyping cost in breeding selection for these important traits.

Summary

Introduction

Genomic selection (GS) is a form of marker-assisted selection (MAS) that predicts genomic estimated breeding values (GEBVs) of test individuals through the use of genome-wide markers [1,2]. Using QTL associated with traits of interest, instead of a full set of random SNPs, in a GS model greatly reduces the number of markers, which in turn reduces the cost of genotyping large breeding populations. Our previous study on pasmo resistance in flax showed that using 500 QTL identified through single-locus and multi-locus genome-wide association study (GWAS) models [9] from a flax core collection (a germplasm population) [10,11] was highly effective for GS and generated a prediction accuracy as high as 0.92, compared with 0.67 when using 52,347 random SNPs [5].
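
As a rough illustration of how a prediction accuracy such as those quoted above might be summarized, the Python sketch below assumes that accuracy is the Pearson correlation between observed and predicted trait values in held-out lines, averaged over cross-validation folds. This is a common convention in GS, but the text above does not state the definition or the validation scheme used. The function names `pearson_r` and `summarize` and all data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(folds):
    """Return the mean and sample standard deviation of per-fold accuracies.

    Each fold is an (observed, predicted) pair of value lists for the
    lines held out in that fold.
    """
    rs = [pearson_r(obs, pred) for obs, pred in folds]
    return mean(rs), stdev(rs)


# Made-up observed and predicted values for three validation folds.
folds = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    ([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.0]),
    ([1.0, 3.0, 2.0, 5.0], [1.2, 2.4, 2.6, 4.5]),
]

acc, sd = summarize(folds)
print(f"{acc:.3f} ± {sd:.3f}")
```

Under this assumed definition, comparing a QTL marker set with a random SNP set amounts to running the same folds with each marker set and comparing the two summaries.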

Objectives

Methods

Results

Discussion

Conclusion