Review of methods for handling confounding by cluster and informative cluster size in clustered data.

Shaun Seaman, Andrew Copas, Menelaos Pavlou

doi:10.1002/sim.6277

Abstract

Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland.

Highlights

Clustered data commonly arise in epidemiology, for example, patients clustered within hospitals, pupils within schools, and teeth within patients
The methods considered are independence estimating equations, the maximum likelihood (ML) estimate from a random-intercept logistic regression model, the conditional ML estimate from the same model, the poor man’s method, and modelling the expectation of the random intercept as a linear function of mean deprivation in the cluster and cluster size
We have reviewed methods that have been proposed for population-average or cluster-specific inference in the presence of confounding by cluster (CBC) or informative cluster size (ICS)

Summary

Introduction

Clustered data commonly arise in epidemiology, for example, patients clustered within hospitals, pupils within schools, and teeth within patients. Standard generalised linear mixed models (GLMMs) assume that the random effect u associated with a cluster is independent of the covariate values X of the members of that cluster. Violation of this assumption has been called CBC because, even if there is no confounding within clusters, association of u with X means that there may be confounding in the population as a whole. In a cohort study involving M waves, an individual is a cluster, a set of measurements on that individual at a particular wave is a member, and N is the number of waves attended before dropout. In this case, interest may be in the association between the outcome Y and X in ‘complete clusters’, that is, the clusters composed of both the N observed members (before dropout) and the M − N missing members (after dropout), with inference about this association achieved by making some assumption about the missing data, for example, missing at random.
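
The way CBC can distort a pooled analysis can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a linear (rather than logistic) model for simplicity, and the function names, parameter values (beta, gamma) and data-generating mechanism are all hypothetical. The random intercept u is made to depend on a cluster-level component of X, so u and X are associated. An analysis that ignores clustering then mixes the within-cluster effect with the between-cluster association, whereas an estimator that uses only within-cluster variation (here, cluster-mean centring) is not affected by u.

```python
import numpy as np

def simulate_cbc(n_clusters=2000, cluster_size=5, beta=1.0, gamma=2.0, seed=0):
    """Simulate clustered data in which the random intercept u depends on X (CBC).

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Cluster-level component of the covariate.
    x_cluster = rng.normal(size=n_clusters)
    # Random intercept associated with the cluster-level part of X: this is CBC.
    u = gamma * x_cluster + rng.normal(size=n_clusters)
    # Cluster label for every member (equal cluster sizes, so no ICS here).
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    # Member-level covariate and outcome; beta is the within-cluster effect.
    x = x_cluster[cluster] + rng.normal(size=cluster.size)
    y = beta * x + u[cluster] + rng.normal(size=cluster.size)
    return cluster, x, y

def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def cluster_means(values, cluster):
    """Return, for each member, the mean of `values` in that member's cluster."""
    sums = np.bincount(cluster, weights=values)
    counts = np.bincount(cluster)
    return (sums / counts)[cluster]

cluster, x, y = simulate_cbc()

# Pooled analysis ignoring clusters: confounded by the cluster-level intercept.
pooled = slope(x, y)

# Within-cluster analysis: centre X and Y on their cluster means, which removes u.
within = slope(x - cluster_means(x, cluster), y - cluster_means(y, cluster))

print(f"pooled slope: {pooled:.2f}")
print(f"within-cluster slope: {within:.2f}")
```

Under these assumed values, the pooled slope converges to beta + gamma × var(cluster-level X) / var(X) = 1 + 2 × 1/2 = 2 in large samples, while the within-cluster slope converges to beta = 1. The gap between the two is the confounding induced by the association between u and X.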

Informative cluster size and confounding by cluster

Model and assumptions

Interpretation of model parameters

Population-average inference

Method

Methods

Considerations in choosing a method

Example

Method log OR SE

Findings

Discussion

Journal: Statistics in Medicine
Publication Date: Aug 4, 2014
Citations: 66
License type: CC BY 3.0
