Abstract

Background

Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively.

Methods

Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size, conditional on covariates, is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handling CBC and ICS, which involve adjusting for the cluster mean of the exposure and the cluster size, respectively, can improve the accuracy of predictions.

Results

Both CBC and ICS can be viewed as violations of the assumptions of the standard GLMM: the random effects are correlated with the exposure under CBC and with the cluster size under ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived using standard GLMM and GEE, ignoring CBC/ICS, was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions, by explaining part of the between-cluster variability. The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC.

Conclusion

Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions.

Highlights

  • Clustered data arise in research when patients are clustered within larger units

  • In this work we explore whether ignoring Confounding by Cluster (CBC) or Informative Cluster Size (ICS) affects the predictive ability of risk models, and investigate whether methods that have been proposed for handling these complications can improve the predictive accuracy of risk models

  • We explored whether accounting for CBC/ICS can improve the accuracy of marginal predictions in comparison to the Basic model and address the miscalibration issues

Introduction

Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. Patients may be clustered within health institutions, or be treated by different surgeons. In these situations, the within-cluster outcomes tend to be correlated, i.e. the outcomes for patients within a centre are more similar to each other than to those of patients from other centres, even after accounting for patient-specific characteristics. For example, when investigating the effect of a combined procedure on the risk of post-angioplasty complications, Confounding by Cluster (CBC) may arise when the proportion of patients who receive a combined procedure differs between centres and is related to differences in the proportion of complications between centres. Whilst this scenario is not uncommon, it is often overlooked when developing a risk model using data that are clustered within larger units.
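
To make the multicentre scenario concrete, the following is a minimal sketch of how data subject to CBC might be generated, and how the two cluster-level covariates discussed in the abstract (the cluster mean of the exposure and the cluster size) could be derived before fitting a GEE or GLMM. It is not the authors' simulation design: the function names `simulate_cbc` and `add_cluster_covariates`, the parameter values, and the choice to link the exposure prevalence to the centre random effect through a logistic function are all assumptions made for illustration.

```python
import math
import random


def simulate_cbc(n_clusters=50, seed=0, beta=0.5, gamma=1.0):
    """Simulate binary outcomes under CBC (illustrative parameters only).

    The centre random effect u drives both the outcome risk and the
    proportion of exposed patients, so exposure and u are correlated.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        u = rng.gauss(0.0, 1.0)  # centre-level random effect
        # Assumed mechanism: exposure prevalence in the centre depends on u.
        prevalence = 1.0 / (1.0 + math.exp(-gamma * u))
        size = rng.randint(10, 40)  # cluster size, unrelated to u (no ICS here)
        for _ in range(size):
            x = 1 if rng.random() < prevalence else 0
            # Conditional (cluster-specific) risk on the logit scale.
            risk = 1.0 / (1.0 + math.exp(-(-1.0 + beta * x + u)))
            y = 1 if rng.random() < risk else 0
            rows.append({"cluster": cluster, "x": x, "y": y})
    return rows


def add_cluster_covariates(rows):
    """Attach the cluster mean of the exposure and the cluster size to each row."""
    totals, counts = {}, {}
    for r in rows:
        totals[r["cluster"]] = totals.get(r["cluster"], 0) + r["x"]
        counts[r["cluster"]] = counts.get(r["cluster"], 0) + 1
    return [
        {
            **r,
            "x_cluster_mean": totals[r["cluster"]] / counts[r["cluster"]],
            "cluster_size": counts[r["cluster"]],
        }
        for r in rows
    ]


data = add_cluster_covariates(simulate_cbc())
```

In a sketch like this, `x_cluster_mean` would be entered as an additional covariate to address CBC, and `cluster_size` would play the same role for ICS; the model fitting itself is left to whichever GEE or GLMM software is in use.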
