Abstract

The split-plot design originated in agricultural science, where experimental units, known as subplots, are nested within groups known as whole plots. It assigns one intervention at the whole-plot level and another at the subplot level, providing a convenient way to accommodate hard-to-change factors. By design, subplots within the same whole plot receive the same level of the whole-plot intervention, which induces a group structure on the final treatment assignments. A common strategy is to run an ordinary least squares (OLS) regression of the outcome on the treatment indicators, coupled with robust standard errors clustered at the whole-plot level. This approach does not give consistent estimators of the treatment effects of interest when the whole-plot sizes vary. Another common strategy is to fit a linear mixed-effects model of the outcome with normal random effects and errors. This approach is purely model-based and can be sensitive to violations of its parametric assumptions. In contrast, design-based inference assumes no outcome model and relies solely on the controllable randomization mechanism determined by the physical experiment. We first extend the existing design-based inference based on the Horvitz–Thompson estimator to the Hajek estimator, and establish finite-population central limit theorems for both under split-plot randomization. We then reconcile these results with those under the model-based approach, and propose two regression strategies that reproduce the Hajek and Horvitz–Thompson estimators, respectively: (i) the weighted least squares (WLS) fit of the unit-level data with inverse probability weights and (ii) the OLS fit of the aggregate data based on whole-plot total outcomes.
This, together with the asymptotic conservativeness of the corresponding cluster-robust covariance estimators for the true design-based covariances, which we establish along the way, justifies the validity of the regression estimators for design-based inference. Given the flexibility of the regression formulation for covariate adjustment, we further extend the theory to the case with covariates and demonstrate the efficiency gain from regression-based covariate adjustment through both asymptotic theory and simulation. Importantly, all our results are either numerical or design-based, and hold regardless of how well the regression equations represent the true data-generating process.
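The two regression strategies rest on numerical identities that can be checked directly. The following is a minimal sketch in Python with NumPy, assuming two binary factors: the whole-plot factor A is completely randomized over whole plots, and the subplot factor B is completely randomized within each whole plot with a fixed number of treated subplots. All names, the whole-plot sizes, and the scaling of the aggregate outcome, (W/N)(M_w/m_wb) times the whole-plot total, are illustrative assumptions and may differ from the paper's notation. The sketch covers point estimates only, not the cluster-robust covariances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical split-plot layout: W whole plots with deliberately varying sizes M_w.
W = 6
M = np.array([2, 3, 4, 4, 5, 6])   # whole-plot sizes
N = M.sum()                        # total number of subplots (units)
W1 = 3                             # whole plots receiving A = 1
m1 = M // 2                        # subplots per whole plot receiving B = 1

# Stage 1: completely randomize the whole-plot factor A across whole plots.
A = np.zeros(W, dtype=int)
A[rng.choice(W, size=W1, replace=False)] = 1

# Stage 2: within each whole plot, completely randomize the subplot factor B.
wp = np.repeat(np.arange(W), M)    # whole-plot label of each unit
B = np.zeros(N, dtype=int)
for w in range(W):
    idx = np.flatnonzero(wp == w)
    B[rng.choice(idx, size=m1[w], replace=False)] = 1

Aunit = A[wp]
Y = rng.normal(size=N) + M[wp]     # arbitrary outcomes; no outcome model is assumed

# Probability that each unit receives its realized treatment (a, b).
pA = np.where(Aunit == 1, W1 / W, (W - W1) / W)
m_b = np.where(B == 1, m1[wp], (M - m1)[wp])
p = pA * (m_b / M[wp])

# Indicator columns for the four treatment combinations (a, b).
levels = [(0, 0), (0, 1), (1, 0), (1, 1)]
D = np.column_stack([(Aunit == a) & (B == b) for a, b in levels]).astype(float)

# Horvitz-Thompson and Hajek estimators of the four mean potential outcomes.
ht = np.array([np.sum(d * Y / p) / N for d in D.T])
haj = np.array([np.sum(d * Y / p) / np.sum(d / p) for d in D.T])

# (i) WLS of Y on the indicators (no intercept) with weights 1 / p.
sw = np.sqrt(1 / p)
beta_wls, *_ = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)

# (ii) OLS on aggregate data: one row per (whole plot, subplot level b), with a
# scaled whole-plot total as the outcome (scaling chosen so the identity holds).
rows, Ytil = [], []
for w in range(W):
    for b in (0, 1):
        in_cell = (wp == w) & (B == b)
        m_wb = m1[w] if b == 1 else M[w] - m1[w]
        Ytil.append((W / N) * (M[w] / m_wb) * Y[in_cell].sum())
        rows.append([float(A[w] == a and b == bb) for a, bb in levels])
X = np.array(rows)
beta_ols, *_ = np.linalg.lstsq(X, np.array(Ytil), rcond=None)

print(np.allclose(beta_wls, haj), np.allclose(beta_ols, ht))
```

Both identities are algebraic: a no-intercept regression on a full set of cell indicators returns (weighted) cell means. They therefore hold for any realized outcomes, which mirrors the abstract's point that the numerical results do not depend on the regression being a correct model.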
