The primary objective of the study was to assess the impact of missing values on the analysis of binary repeated measures data with an additional hierarchical structure. One motivating example for the present study was provided by records of high somatic cell counts in milk samples, obtained by approximately monthly sampling throughout the lactations of cows in dairy herds. The simulated data were generated from random effects models with autocorrelated subject-level random effects (ρ = 1, 0.9 or 0.5). In general, the simulation settings were chosen to reflect a real somatic cell count dataset (scc40), except that the within-cow time series length was set to eight time points for each cow. The estimation procedures considered were Ordinary Logistic Regression (OLR), Alternating Logistic Regression (ALR), Weighted Generalized Estimating Equations (WGEE), Penalized Quasi-Likelihood (PQL), Maximum Likelihood via numerical integration (ML) and Bayesian Markov chain Monte Carlo (MCMC). Several scenarios of simulated incomplete datasets were considered. One scenario (the scc40 scenario) corresponded to the combination of missingness patterns present in the scc40 dataset. The remaining scenarios involved only drop-outs and corresponded to moderate or high percentages of values that were either missing at random (MAR) or not missing at random (NMAR). In the scc40 scenario, all estimation procedures except OLR performed well and produced estimates with small relative bias (generally less than 5%) for levels of missingness roughly corresponding to those in the scc40 data. In the MAR scenarios, some bias was found for the ALR, WGEE and PQL procedures, whereas the likelihood-based procedures were largely unaffected by the missing values. In the NMAR scenarios, all procedures showed similar and strong bias in the time coefficient; however, the fixed effects estimates at the subject and cluster levels were relatively unaffected.
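To make the simulation design more concrete, here is a minimal Python sketch of one way such data could be generated. It assumes a logistic model with a herd-level (cluster) random effect and a stationary AR(1) cow-level random effect, so that ρ = 1 reduces to a constant cow effect. It also assumes eight time points per cow and monotone drop-out. The function names `expit` and `simulate`, the binary covariates, all parameter values and the specific drop-out rule are hypothetical illustrations and are not the settings used in the study. Under this rule, MAR drop-out depends on the previous (observed) response, while NMAR drop-out depends on the current response, which is then not recorded.

```python
import math
import random


def expit(x):
    """Inverse logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_herds=20, cows_per_herd=10, n_times=8, rho=0.9,
             beta=(-1.0, 0.1, 0.5, 0.3), sd_herd=0.5, sd_cow=1.0,
             dropout="MAR", p_drop=(0.05, 0.25), seed=1):
    """Simulate herd/cow/time binary data with monotone drop-out (hypothetical sketch).

    Returns a list of (herd, cow, t, y) tuples; y is None once the cow has dropped out.
    beta = (intercept, time, cow-level covariate, herd-level covariate).
    p_drop[k] is the per-visit drop-out probability when the driving response equals k.
    All defaults are illustrative, not the study's scc40-based settings.
    """
    if dropout not in ("MAR", "NMAR"):
        raise ValueError("dropout must be 'MAR' or 'NMAR'")
    rng = random.Random(seed)
    b0, b_time, b_cow, b_herd = beta
    rows = []
    cow_id = 0
    for h in range(n_herds):
        z_h = 1 if rng.random() < 0.5 else 0      # hypothetical herd-level covariate
        u_h = rng.gauss(0.0, sd_herd)             # herd (cluster) random effect
        for _ in range(cows_per_herd):
            x_c = 1 if rng.random() < 0.5 else 0  # hypothetical cow-level covariate
            v = rng.gauss(0.0, sd_cow)            # cow-level random effect at t = 0
            dropped = False
            prev_y = 0
            for t in range(n_times):
                if t > 0:
                    # Stationary AR(1) update keeps the variance at sd_cow**2;
                    # rho = 1 gives a constant cow-level random intercept.
                    v = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, sd_cow)
                eta = b0 + b_time * t + b_cow * x_c + b_herd * z_h + u_h + v
                y = 1 if rng.random() < expit(eta) else 0
                if not dropped and t > 0:
                    # MAR: drop-out depends on the last observed response.
                    # NMAR: drop-out depends on the current response, which is then lost.
                    driver = prev_y if dropout == "MAR" else y
                    dropped = rng.random() < p_drop[driver]
                rows.append((h, cow_id, t, None if dropped else y))
                prev_y = y
            cow_id += 1
    return rows


if __name__ == "__main__":
    for mech in ("MAR", "NMAR"):
        data = simulate(rho=0.9, dropout=mech)
        missing = sum(y is None for _, _, _, y in data) / len(data)
        print(mech, f"proportion missing: {missing:.2f}")
```

Because both mechanisms are driven by the same latent responses, the same seed produces identical complete data under MAR and NMAR. Only the pattern of drop-outs differs, which makes the two scenarios directly comparable.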
Journal of Statistical Research 2023, Vol 57, No.1-2, pp.35-67