Genomic Prediction Based on SNP Functional Annotation Using Imputed Whole-Genome Sequence Data in Korean Hanwoo Cattle.

Bryan Irvine M Lopez,Narae An,Jae-Don Oh,Jong-Eun Park,Dajeong Lim,Seunghwan Lee,Woncheoul Park,Han-Ha Chai,Dong-Hyun Shin,Krishnamoorthy Srikanth

doi:10.3389/fgene.2020.603822

Abstract

Whole-genome sequence (WGS) data are increasingly being applied to genomic prediction, offering higher predictive ability by including causal mutations or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium with causal mutations affecting the trait. This study aimed to improve the predictive performance of the customized Hanwoo 50 k SNP panel for four carcass traits in the commercial Hanwoo population by adding highly predictive variants from sequence data. A total of 16,892 Hanwoo cattle with phenotypes (i.e., backfat thickness, carcass weight, longissimus muscle area, and marbling score), 50 k genotypes, and imputed WGS genotypes were used. We partitioned the imputed WGS data according to functional annotation [intergenic (IGR), intron (ITR), regulatory (REG), synonymous (SYN), and non-synonymous (NSY)] to characterize the genomic regions that deliver higher predictive power for the traits investigated. Animals were assigned to two groups: the discovery set (7324 animals), used for predictive variant detection, and the cross-validation set, used for genomic prediction. Genome-wide association studies were performed for each trait on every genomic region and on the entire WGS data for the pre-selection of variants. Each set of pre-selected SNPs, at different densities (1000, 3000, 5000, or 10,000), was added to the 50 k genotypes separately, and the predictive performance of each set of genotypes was assessed using genomic best linear unbiased prediction (GBLUP). Results showed that the predictive performance of the customized Hanwoo 50 k SNP panel can be improved by the addition of pre-selected variants from the WGS data; in particular, 3000 variants per trait were sufficient to improve the prediction accuracy for all traits.
When 12,000 pre-selected variants (3000 variants from each trait) were added to the 50 k genotypes, the prediction accuracies increased by 9.9, 9.2, 6.4, and 4.7% for backfat thickness, carcass weight, longissimus muscle area, and marbling score, respectively, compared to the regular 50 k SNP panel. In terms of prediction bias, regression coefficients for all sets of genotypes in all traits were close to 1, indicating unbiased prediction. Selecting variants based on functional annotation did not show a clear advantage over using the whole genome. Nonetheless, pre-selected SNPs from the IGR gave the highest improvement in prediction accuracy among genomic regions, and the values were close to those obtained using the WGS data for all traits. We concluded that the additional gain in prediction accuracy from pre-selected variants appears to be trait-dependent, and that using WGS data remained more accurate than using a specific genomic region.
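
The workflow described in the abstract (pre-select variants by GWAS in a discovery set, add them to the panel, then evaluate GBLUP accuracy and bias in a separate set) can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and not the authors' pipeline: the toy genotypes are independent SNPs without LD, the GWAS is a plain single-marker correlation scan, the genomic relationship matrix follows VanRaden's first method, the heritability is treated as known rather than estimated, accuracy is taken as the plain correlation between predicted values and phenotypes, and a single train/test split stands in for cross-validation. All function and variable names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0                  # allele frequencies
    Z = M - 2.0 * p                           # centered genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2):
    """GBLUP predictions for test animals, assuming a known heritability h2."""
    lam = (1.0 - h2) / h2                     # residual-to-genetic variance ratio
    Gtt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(Gtt + lam * np.eye(len(train)), y[train] - y[train].mean())
    return G[np.ix_(test, train)] @ alpha

def top_gwas_snps(M, y, k):
    """Indices of the k SNPs with the largest squared correlation with y."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    r = (Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-r**2)[:k]

def evaluate(Mv, yv, cols, train, test, h2):
    """Accuracy (correlation) and bias (slope of phenotype on prediction)."""
    gebv = gblup_predict(vanraden_g(Mv[:, cols]), yv, train, test, h2)
    acc = np.corrcoef(gebv, yv[test])[0, 1]
    slope = np.polyfit(gebv, yv[test], 1)[0]
    return acc, slope

rng = np.random.default_rng(0)
n, m, n_panel, n_qtl, k, h2 = 900, 2000, 200, 20, 100, 0.5

# Toy data: independent SNPs coded 0/1/2 (real genotypes would carry LD).
freq = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)
panel = np.arange(n_panel)                    # stand-in for the 50 k panel
candidates = np.arange(n_panel, m)            # stand-in for off-panel WGS variants
qtl = rng.choice(candidates, size=n_qtl, replace=False)
g = M[:, qtl] @ rng.normal(size=n_qtl)        # simulated genetic values
y = g + rng.normal(scale=np.sqrt(g.var() * (1.0 - h2) / h2), size=n)

# Discovery animals are used only for variant pre-selection.
discovery = np.arange(300)
scan = top_gwas_snps(M[np.ix_(discovery, candidates)], y[discovery], k)
selected = candidates[scan]

# Remaining animals are used for prediction (one split instead of full CV).
Mv, yv = M[300:], y[300:]
train, test = np.arange(400), np.arange(400, 600)

results = {
    "panel": evaluate(Mv, yv, panel, train, test, h2),
    "panel+selected": evaluate(Mv, yv, np.concatenate([panel, selected]), train, test, h2),
}
for name, (acc, slope) in results.items():
    print(f"{name}: accuracy={acc:.3f}, regression slope={slope:.3f}")
```

A regression slope near 1 corresponds to the unbiased prediction described in the abstract. Keeping the discovery animals out of the prediction set mirrors the study design, where variant detection and genomic prediction use separate groups of animals.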

Highlights

The use of whole-genome sequence (WGS) data in genomic prediction is expected to be advantageous, since such data include all or most of the causal mutations, or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium (LD) with the causal mutations affecting the traits.
As an alternative to a simple increase in marker density, some studies suggested that the prediction accuracy could be improved by adding significant quantitative trait loci (QTL) or variants that were selected based on genome-wide association studies (GWAS) using WGS data (Brondum et al., 2015; Veerkamp et al., 2016; Moghaddar et al., 2019).
While we observed that the heritability estimates for backfat thickness (BFT), longissimus muscle area (LMA), and marbling score (MS) showed no notable change when more pre-selected SNPs were added from the WGS data, a small decrease was noted for carcass weight (CWT).

Summary

Introduction

The use of whole-genome sequence (WGS) data in genomic prediction is expected to be advantageous, since such data include all or most of the causal mutations, or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium (LD) with the causal mutations affecting the traits. This was confirmed in a simulation study (Meuwissen and Goddard, 2010), but in real data, the use of the entire WGS data was shown to lead to no or only small improvements in prediction accuracy. As an alternative to a simple increase in marker density, some studies suggested that the prediction accuracy could be improved by adding significant quantitative trait loci (QTL) or variants that were selected based on genome-wide association studies (GWAS) using WGS data (Brondum et al., 2015; Veerkamp et al., 2016; Moghaddar et al., 2019).

Objectives

Methods

Results

Conclusion