Sample size and Shapiro-Wilk test: An analysis for soybean grain yield

Rafael Rodrigues De Souza, Marcos Toebe, Anderson Chuquel Mello, Karina Chertok Bittencourt

doi:10.1016/j.eja.2022.126666

Abstract

Ignoring a violation of the error normality assumption is a common mistake that can cause inconsistencies in the results of the analysis of variance. To avoid this, normality tests such as the Shapiro-Wilk test can be used to verify whether the assumption is met. However, the accuracy of the test depends on a representative number of observations sampled in trials, because the calculated W statistic is sensitive to sample size. Therefore, the aims of this study were to analyze the response of the Shapiro-Wilk test to sample size for soybean grain yield and to define the sample size that optimizes the W statistic estimate, making it more efficient and reliable. For this, resampling with replacement was applied to trials that measured soybean grain yield per plant. The trials were performed in two locations in the state of Rio Grande do Sul, Brazil, and in each, three sowing dates were assessed, using a total of 30 commercial soybean cultivars. Grain yield per plant was assessed by weighing grains at maturity and correcting the value to 13 % moisture. Subsequently, thirty-one sample scenarios were simulated per experimental unit, performing an analysis of variance and applying the Shapiro-Wilk test at 5 % error probability to the experimental unit errors in each pre-defined sampling scenario. In addition, four methodologies for determining the maximum curvature point to define sample size were compared, namely the general, spline, perpendicular distance, and linear plateau methods. The 95 % confidence interval width of the W statistic decreased exponentially with sample size for trials that either meet or violate the error normality assumption. Small sample sizes per experimental unit led to biased estimates of the W statistic, which was either under- or overestimated. The perpendicular distance and linear plateau methods were the most adequate to define the sufficient sample size for the W statistic, and at least 17 plants per experimental unit should be used to assess error normality for soybean grain yield reliably.
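
The perpendicular distance idea mentioned above can be illustrated with a minimal sketch. It assumes the usual geometric definition (the curve point farthest from the straight line joining the first and last points) and runs on a synthetic, exponentially decreasing curve. The function name `perpendicular_distance_point`, the sample sizes, and the curve parameters are all hypothetical and are not taken from the study; the authors' actual implementation may differ.

```python
import math


def perpendicular_distance_point(x, y):
    """Return (x value, distance) of the curve point farthest from the
    chord joining the first and last points of the curve."""
    x0, y0 = x[0], y[0]
    x1, y1 = x[-1], y[-1]
    chord_length = math.hypot(x1 - x0, y1 - y0)
    best_x, best_d = x[0], 0.0
    for xi, yi in zip(x, y):
        # Standard point-to-line distance for the line through the endpoints.
        d = abs((y1 - y0) * xi - (x1 - x0) * yi + x1 * y0 - y1 * x0) / chord_length
        if d > best_d:
            best_x, best_d = xi, d
    return best_x, best_d


# Hypothetical sampling scenarios: 31 sample sizes (plants per experimental unit).
sizes = list(range(3, 34))

# Synthetic confidence interval widths following an exponential decay.
# The coefficients are invented for illustration only.
widths = [0.02 + 0.30 * math.exp(-0.25 * n) for n in sizes]

knee_size, knee_distance = perpendicular_distance_point(sizes, widths)
print(f"Suggested sample size on the synthetic curve: {knee_size}")
```

Because the distance to a fixed line is proportional to the vertical offset from that line, the selected point does not change when either axis is rescaled. The value printed here reflects only the invented curve and says nothing about the 17 plants reported in the abstract.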

Full Text