Misspecification and flexible random effect distributions in logistic mixed effects models applied to panel survey data

Louise Marquart-Wilson

doi:10.14264/uql.2016.1083

Abstract

Logistic mixed models for binary longitudinal panel data typically assume normally distributed random effects and appropriately account for correlated data, unobserved heterogeneity and missing data due to attrition. However, this normality assumption may be too restrictive to capture unobserved heterogeneity. The motivating case study is a longitudinal analysis of women's employment participation using data from the Household Income and Labour Force Dynamics in Australia (HILDA) survey. Multimodality of the random effects was identified, potentially due to an underlying mover-stayer scenario. This study focuses on logistic mixed models applied to the HILDA case study and to simulation studies motivated by the case study, and aims to investigate: (1) the robustness of random intercept logistic models to the assumed normal random effects distribution when the true distribution is multimodal; (2) whether relaxing the parametric assumption on the random effects distribution provides a practical way to reduce the impact of distributional misspecification; and (3) the impact of misspecification, and the performance of logistic mixed models, in the presence of missing data due to attrition. Random intercept logistic models applied to the case study demonstrate that the assumed normal distribution may not adequately capture the underlying heterogeneity arising from a potential mover-stayer scenario. An asymmetric three-component mixture of normal distributions provided a more appropriate fit, potentially representing three sub-populations: those with an extremely low, moderate, or extremely high propensity to be constantly employed. Two simulation studies motivated by the HILDA study considered a three-component mixture of normal distributions for the random intercept. The inferential impact of incorrectly assuming a normal distribution depended on the severity of the departure of the true distribution from normality. 
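As an illustration of the simulation design, the sketch below generates binary panel data from a random intercept logistic model, logit P(y_it = 1 | b_i) = beta0 + beta1 * x_it + b_i, where b_i is drawn from a three-component mixture of normals. The mixture weights, means and standard deviations, the single standard-normal covariate x_it, and the function names are hypothetical choices for illustration only; they are not the values or code used in the thesis. Unequal weights or means would give the asymmetric, mover-stayer-like shape described above.

```python
import math
import random

# Hypothetical three-component mixture for the random intercept b_i,
# given as (weight, mean, sd). Illustrative values, not HILDA estimates.
MIXTURE = [(0.2, -4.0, 0.5), (0.6, 0.0, 1.0), (0.2, 4.0, 0.5)]


def mixture_moments(mixture):
    """Return the mean and variance of a finite mixture of normals."""
    mean = sum(w * mu for w, mu, _ in mixture)
    second = sum(w * (sd ** 2 + mu ** 2) for w, mu, sd in mixture)
    return mean, second - mean ** 2


def draw_intercept(rng, mixture):
    """Pick a component with probability equal to its weight, then sample it."""
    u = rng.random()
    cum = 0.0
    for w, mu, sd in mixture:
        cum += w
        if u < cum:
            return rng.gauss(mu, sd)
    _, mu, sd = mixture[-1]  # guard against floating-point round-off in cum
    return rng.gauss(mu, sd)


def simulate_panel(n_people, n_waves, beta0, beta1, mixture, seed=0):
    """Simulate rows (person, wave, x_it, y_it) with
    y_it ~ Bernoulli(expit(beta0 + beta1 * x_it + b_i))."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_people):
        b_i = draw_intercept(rng, mixture)  # one intercept per person
        for t in range(n_waves):
            x_it = rng.gauss(0.0, 1.0)  # hypothetical time-varying covariate
            eta = beta0 + beta1 * x_it + b_i
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((i, t, x_it, int(rng.random() < p)))
    return rows
```

Here `mixture_moments(MIXTURE)` gives a true mean of 0 and a true variance of about 7.1, the benchmark against which the random effect variance estimated by a model assuming a normal random intercept would be compared when measuring bias.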
In the first study, which simulated a potential mover-stayer scenario, misspecification produced biased estimates of the intercept constant and the random effect variance. More severely asymmetric and skewed multimodal distributions produced larger bias. The second study considered a range of true symmetric multimodal distributions with increasingly severe departures from normality. The random intercept logistic model assuming normality was robust to minor deviations. However, for larger departures characterised by three distinct modes, misspecification produced biased parameter estimates and poor coverage rates for the intercept constant, time-invariant explanatory variables, and time-varying explanatory variables exhibiting minimal within-individual variability. In both simulation studies, estimates of the random effect variance were extremely sensitive to distributional misspecification, resulting in biased parameter estimates, poor coverage rates and inaccurate standard errors. Non-parametric estimation techniques, which leave the distribution completely unspecified, reduced the risks associated with misspecifying the random effects distribution. A novel application of the Vertex Exchange Method (VEM) was used to estimate the random effects distribution non-parametrically in logistic mixed models. The VEM was computationally intensive, yet performed well in capturing the univariate and bivariate random effects distributions when applied to the HILDA case study. VEM was the only method that converged when applied to the random intercept and random slope logistic mixed model. Inferential conclusions for the fixed effect parameters differed depending on the approach used, highlighting the practical value of sensitivity analyses for identifying potential distributional misspecification of the random effects. 
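The idea behind the VEM can be sketched for the random intercept case: the intercept distribution is treated as a discrete distribution on a fixed grid, and the marginal likelihood is maximised over the grid weights. Each iteration moves mass from the grid point with the smallest directional derivative (among points carrying weight) to the point with the largest, with the step length chosen by a line search; iteration stops once no grid point has a directional derivative above a tolerance. This is a simplified sketch under stated assumptions: a fixed grid, the intercept constant absorbed into the grid, and a single covariate slope `beta1` held fixed rather than re-estimated. The function names are hypothetical, and this is not the implementation used in the thesis.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def person_likelihoods(data, grid, beta1):
    """f[i][j] = P(y_i | intercept = grid[j]); data is [(ys, xs), ...]."""
    f = []
    for ys, xs in data:
        row = []
        for g in grid:
            lik = 1.0
            for y, x in zip(ys, xs):
                p = expit(g + beta1 * x)
                lik *= p if y == 1 else 1.0 - p
            row.append(lik)
        f.append(row)
    return f


def vem(f, max_iter=5000, tol=1e-6):
    """Vertex exchange: shift mass between two grid points per iteration."""
    m = len(f[0])
    w = [1.0 / m] * m  # start from uniform weights on the grid
    for _ in range(max_iter):
        mix = [sum(wj * fij for wj, fij in zip(w, fi)) for fi in f]
        # Directional derivative of the log-likelihood towards each vertex.
        d = [sum(fi[j] / li for fi, li in zip(f, mix)) - len(f) for j in range(m)]
        j_up = max(range(m), key=lambda j: d[j])
        if d[j_up] < tol:
            break  # no grid point improves the likelihood beyond tol
        j_dn = min((j for j in range(m) if w[j] > 0), key=lambda j: d[j])
        delta = w[j_dn]
        diff = [fi[j_up] - fi[j_dn] for fi in f]

        def slope(a):
            return sum(delta * di / (li + a * delta * di) for di, li in zip(diff, mix))

        # The log-likelihood is concave in the step a, so bisect on its slope.
        if slope(1.0) >= 0:
            a = 1.0
        else:
            lo, hi = 0.0, 1.0
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if slope(mid) > 0:
                    lo = mid
                else:
                    hi = mid
            a = lo
        w[j_up] += a * delta
        w[j_dn] -= a * delta
    return w
```

For example, `vem(person_likelihoods(data, [k / 2 for k in range(-10, 11)], beta1=0.5))` estimates weights on a grid from -5 to 5 in steps of 0.5. Every iteration touches every person at every grid point, which gives some sense of why the method is computationally intensive.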
Distributional misspecification of the random intercept in the presence of missing data from attrition gave parameter estimates similar to those from the complete case analysis, suggesting that the attrition is missing at random (MAR). The two simulation studies show that, at an attrition rate similar to the 29.5% observed in HILDA, MAR attrition had minimal inferential impact beyond that of misspecifying the random intercept distribution. As this negligible impact may partly be explained by the consistency of logistic mixed models under MAR missingness and by the large sample size, consideration of other missingness mechanisms and rates could be valuable. Flexible and non-parametric approaches applied to settings with attrition performed similarly to the complete case scenario. Appropriate statistical analysis of longitudinal panel data is fundamental for researchers and policy makers to formulate and evaluate policy initiatives in the health and social sciences. Hence, the appropriate use and understanding of statistical models is crucial. This study provides novel insight into the impact of assuming normality for the random effects in logistic mixed models applied to panel data where an underlying sub-population structure is suspected. For substantial departures characterised by multimodality with distinct modes, inference for the fixed effect parameters, typically the parameters of interest, can be affected. Misspecification in the presence of MAR attrition had negligible additional inferential impact. More flexible distributions for the random effects are a practical way to help reduce the impact of violating distributional assumptions, and to identify potential misspecification when used within a sensitivity analysis framework. VEM provided sufficient flexibility to capture the multimodality of the random intercepts and the complexity of the bivariate random effects in panel survey settings, including attrition. 
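MAR attrition of the kind considered in the simulation studies can be illustrated with a monotone dropout mechanism in which the probability of leaving the panel at wave t depends only on the response observed at wave t - 1. The logistic dropout model, its parameters `gamma0` and `gamma1`, and the function name are hypothetical assumptions for illustration, not the mechanism used in the thesis or fitted to HILDA.

```python
import math
import random


def apply_mar_attrition(panel, gamma0, gamma1, seed=0):
    """Monotone dropout whose hazard depends only on the last observed y (MAR).

    panel: list of per-person response histories, e.g. [[1, 1, 0, ...], ...].
    Returns the truncated histories and the share of people who dropped out.
    """
    rng = random.Random(seed)
    observed, dropped = [], 0
    for ys in panel:
        kept = [ys[0]]  # everyone is observed at the first wave
        for t in range(1, len(ys)):
            # Dropout logit uses y at wave t - 1, which is always observed.
            hazard = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * ys[t - 1])))
            if rng.random() < hazard:
                dropped += 1
                break  # monotone: once a person leaves, they never return
            kept.append(ys[t])
        observed.append(kept)
    return observed, dropped / len(panel)
```

In a simulation, `gamma0` and `gamma1` could be tuned until the realised attrition rate is close to a target such as 29.5%. Because dropout here depends only on observed responses, the missingness is MAR, so likelihood-based fits that ignore the dropout process remain valid when the model is otherwise correctly specified.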
The performance of the VEM in flexibly modelling random effects should encourage its use in health and social science applications.
