Abstract

Single-step genomic BLUP (ssGBLUP) allows simultaneous evaluation of populations composed of genotyped and non-genotyped animals. As a GBLUP-based method, ssGBLUP outputs breeding values. However, interest may lie in estimating SNP effects and in how genome segments are associated with the phenotype of interest. In that case, SNP effects can be obtained from a linear transformation of breeding values, and p-values can be used as a measure of estimation certainty. P-values are derived from the prediction error variance of SNP effects, which relies on the prediction error (co)variance matrix of breeding values obtained from the inverse of the left-hand side of the mixed model equations (LHS). However, inverting the LHS becomes infeasible with large genotyped populations. Therefore, alternative strategies for obtaining p-values as genotyped populations grow need to be evaluated. In this study, we aimed to estimate marker effect p-values when the algorithm for proven and young (APY) was used to approximate the prediction error (co)variance for animals in a large genotyped population. We estimated p-values of SNP marker effects for post-weaning gain (PWG) in an Angus cattle population. The dataset was composed of around 844K phenotyped animals, 450K genotyped animals, and 1.8M pedigree records. Analyses were split into two sets. In the first set, a reduced set of 50K genotyped animals was used, so that p-values obtained from a direct inverse of the LHS using the regular G⁻¹ (Exact_G) or the APY G⁻¹ (Exact_GAPY) could be compared with p-values obtained using the APY G⁻¹ together with an approximation of the prediction error variance that does not require inverting the LHS (Approx_GAPY). In the second set of analyses, p-values with Approx_GAPY were estimated with the full set of 450K genotyped animals (Approx_GAPY_450K), and computational requirements were recorded. All analyses were performed in three replicates.
In the first set of analyses, genome-wide association with Exact_G uncovered two significant regions on chromosomes 7 and 20. Across all replicates, the same regions were also identified with Exact_GAPY and Approx_GAPY, indicating that exact and approximated p-values give similar resolution. Results from Approx_GAPY_450K showed that, with the complete genotyped set, two new regions on chromosomes 6 and 14 were identified in addition to the two regions on chromosomes 7 and 20. On average, the entire procedure for obtaining p-values with 450K genotyped animals had an elapsed wall-clock time of 24 h and a maximum memory usage of 87.6 GB. Altogether, our results suggest that, with APY and the approximation of the prediction error variance, current computational boundaries for obtaining marker effect p-values for large genotyped populations can be lifted.
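To make the chain from breeding values to SNP p-values concrete, the following is a minimal sketch on simulated toy data. It is not the software or the exact formulation used in the study. It assumes a plain GBLUP model with an overall mean and all animals genotyped (no pedigree, no APY), known variance components, and a genomic relationship matrix blended with the identity so that it can be inverted. The function name `snp_pvalues`, its parameters, and the blending weight are hypothetical. The LHS is inverted directly here, which is the step that stops being feasible as the genotyped population grows.

```python
import math
import numpy as np


def snp_pvalues(Z, y, var_u, var_e, blend=0.05):
    """Toy GBLUP: back-solve SNP effects from breeding values and test them.

    Z is an (animals x SNPs) matrix of 0/1/2 genotypes, y the phenotypes.
    var_u and var_e are assumed known; blend is an assumed blending weight.
    """
    n, m = Z.shape
    freq = Z.mean(axis=0) / 2.0                # observed allele frequencies
    Zc = Z - 2.0 * freq                        # centered genotypes
    k = 2.0 * np.sum(freq * (1.0 - freq))      # scaling constant of G
    G = (1.0 - blend) * (Zc @ Zc.T) / k + blend * np.eye(n)
    G_inv = np.linalg.inv(G)

    # Mixed model equations for y = 1*mu + u + e.
    X = np.ones((n, 1))
    lam = var_e / var_u
    lhs = np.block([[X.T @ X, X.T], [X, np.eye(n) + lam * G_inv]])
    rhs = np.concatenate([X.T @ y, y])
    C = np.linalg.inv(lhs)                     # direct inverse of the LHS
    u_hat = (C @ rhs)[1:]                      # predicted breeding values

    # Prediction error (co)variance of u, then Var(u_hat) = Var(u) - PEV.
    pev_u = C[1:, 1:] * var_e
    var_u_hat = G * var_u - pev_u

    # SNP effects as a linear transformation of breeding values: a_hat = T u_hat.
    T = (1.0 - blend) / k * (Zc.T @ G_inv)
    a_hat = T @ u_hat
    # Only the diagonal of T Var(u_hat) T' is needed.
    var_a_hat = np.einsum("ij,jk,ik->i", T, var_u_hat, T)
    var_a_hat = np.maximum(var_a_hat, 1e-300)  # guard against rounding to <= 0

    # Two-sided p-value from a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    z = a_hat / np.sqrt(var_a_hat)
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return a_hat, p_values


# Simulated example: 60 animals, 200 SNPs, one SNP with a real effect.
rng = np.random.default_rng(1)
n_animals, n_snps = 60, 200
true_freq = rng.uniform(0.2, 0.8, size=n_snps)
Z = rng.binomial(2, true_freq, size=(n_animals, n_snps)).astype(float)
y = 10.0 + 1.5 * Z[:, 0] + rng.normal(0.0, 1.0, size=n_animals)

a_hat, p_values = snp_pvalues(Z, y, var_u=1.0, var_e=1.0)
print("smallest p-value:", p_values.min(), "at SNP index", int(p_values.argmin()))
```

In this sketch, the dense inverse `C` is what an approximation of the prediction error variance would replace; the remaining steps are matrix products that depend only on the genotypes and the predicted breeding values.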