Abstract
The use of R-squared as a measure of predictive ability in regression is common practice in animal nutrition studies. Generally, studies calculate the R-squared of a prediction equation from the same data used to estimate it, which does not measure prediction. The objective was to evaluate the predictive ability of prediction equations using independent datasets. Data were simulated based on previously reported prediction equations for ADG in nursery pigs: ADG = 152 + 195(SID_Lysine%). A completely randomized design with SID_Lysine% levels of 1.2, 1.3, and 1.4% was used. Data were simulated for three CVs, representing low, moderate, and high variation in ADG, and for small numbers of replicates per treatment (RepTreat; 4, 6, 8, and 10). Data were simulated and analyzed 300 times for each scenario. In each scenario, prediction equations were estimated when a significant effect of SID_Lysine% was followed by a significant orthogonal contrast for the linear effect of SID_Lysine% (P < 0.05), and their R-squared values were calculated (R2_training). Each prediction equation was then used to predict the other simulated datasets, and the resulting R-squared values were averaged for each prediction equation (R2_prediction). The relationship between R2_training and R2_prediction was assessed across scenarios through Spearman’s rank correlation and regression of R2_prediction on R2_training. Average R2_training and R2_prediction were similar: 94.5% and 93.9%, 66.0% and 63.5%, and 46.1% and 42.7% for low, moderate, and high CV, respectively. The rank correlation between R2_training and R2_prediction increased with CV: 0.23, 0.67, and 0.77 for low, moderate, and high CV, respectively. In the regression of R2_prediction on R2_training, bias increased as CV increased: for prediction equations with greater R2_training, R2_training overestimated R2_prediction, whereas for those with lower R2_training, it underestimated R2_prediction.
These results indicate that with small sample sizes (≤ 10 replicates per treatment), as CV increases, the R-squared of models with high R-squared overestimates their predictive ability, and these models do not predict as well as their R-squared suggests.
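The simulation and evaluation workflow summarized in this abstract can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' code. Several choices are hypothetical because the abstract does not report them: the CV value used in the example, setting the residual SD to CV times the overall mean ADG, computing R-squared on independent datasets as 1 − SSE/SST (which can be negative), fitting the training equation to individual observations, and the function names `passes_selection` and `run_scenario`.

```python
import numpy as np
from scipy import stats

# Treatment levels and reported equation from the abstract: ADG = 152 + 195 * SID_Lysine%.
LEVELS = np.array([1.2, 1.3, 1.4])
INTERCEPT, SLOPE = 152.0, 195.0


def passes_selection(x, y, alpha=0.05):
    """True if both the one-way ANOVA F-test and the linear contrast (-1, 0, 1) have P < alpha."""
    groups = [y[x == level] for level in LEVELS]
    p_anova = stats.f_oneway(*groups).pvalue
    n_per_group = len(groups[0])
    df_error = len(y) - len(groups)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_error
    coef = np.array([-1.0, 0.0, 1.0])             # linear orthogonal contrast for 3 equally spaced levels
    means = np.array([g.mean() for g in groups])
    t_stat = coef @ means / np.sqrt(mse * (coef ** 2).sum() / n_per_group)
    p_linear = 2 * stats.t.sf(abs(t_stat), df_error)
    return p_anova < alpha and p_linear < alpha


def run_scenario(rep_treat, cv, n_sim=300, seed=0):
    """Simulate n_sim datasets; return R2_training, R2_prediction, Spearman rho, and the regression."""
    rng = np.random.default_rng(seed)
    x = np.repeat(LEVELS, rep_treat)              # same completely randomized layout in every dataset
    mu = INTERCEPT + SLOPE * x
    # Assumption: residual SD = CV x overall mean ADG.
    Y = mu + rng.normal(0.0, cv * mu.mean(), size=(n_sim, x.size))
    sst = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    r2_train, r2_pred = [], []
    for i in range(n_sim):
        if not passes_selection(x, Y[i]):
            continue
        slope, intercept = np.polyfit(x, Y[i], 1)  # least-squares prediction equation
        sse = ((Y - (intercept + slope * x)) ** 2).sum(axis=1)
        r2_all = 1.0 - sse / sst                   # R^2 of this equation on every dataset
        r2_train.append(r2_all[i])                 # fit to its own (training) data
        r2_pred.append(np.delete(r2_all, i).mean())  # average over the other datasets

    r2_train, r2_pred = np.array(r2_train), np.array(r2_pred)
    rho, _ = stats.spearmanr(r2_train, r2_pred)
    fit = stats.linregress(r2_train, r2_pred)     # regression of R2_prediction on R2_training
    return r2_train, r2_pred, rho, fit


r2_train, r2_pred, rho, fit = run_scenario(rep_treat=6, cv=0.05)
print(f"{r2_train.size} equations kept; mean R2_training = {r2_train.mean():.3f}, "
      f"mean R2_prediction = {r2_pred.mean():.3f}, Spearman rho = {rho:.2f}, slope = {fit.slope:.2f}")
```

Because every simulated dataset shares the same treatment layout, each equation's predictions are computed once and scored against all datasets in a single vectorized step. Repeating `run_scenario` over each RepTreat and CV combination would give the kind of cross-scenario comparison described above.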