Abstract

Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations are predictivity (the correlation between pre-adjusted phenotypes and EBV divided by the square root of the heritability) and the linear regression (LR) method. Both methods compare predictions obtained with the whole dataset and with a partial dataset from which the information related to a set of validation individuals has been removed. Confidence intervals (CI) for predictivity and the LR method can be obtained by k-fold validation or bootstrapping. Analytical or frequentist CI are not available for predictivity or the LR method. Such CI would remove the need to run several validations and could also help test the quality of bootstrap intervals. This study aimed to derive analytical CI for predictivity and for the statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The CI for the bias, dispersion, and reliability depend on the (co)variances of the EBV across the individuals in the validation set. The CI for the ratio of accuracies and for predictivity were obtained through the Fisher transformation. We showed the adequacy of the analytical CI using simulation. The analytical CI were closer to the simulated ones than the bootstrap CI, which tended to be narrower than the simulated ones. With the method proposed in this study, the sampling variation of predictivity and of the statistics in the LR method can be estimated for any dataset without replication or bootstrapping. All the formulas derived in this study were implemented in Validationf90, a new program in the BLUPF90 suite.