An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes

Abstract

Background: In binary logistic regression, data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth’s logistic regression (FL), which was originally proposed to reduce the bias in coefficient estimates. The question of convergence becomes more involved when analyzing clustered data, as frequently encountered in clinical research (e.g. data collected in several study centers, or individuals contributing multiple observations), using marginal logistic regression models fitted by generalized estimating equations (GEE). From our experience we suspect that separable data are a sufficient, but not a necessary, condition for non-convergence of GEE. Thus, we expect that generalizations of approaches that can handle separable uncorrelated data may reduce but not fully remove the non-convergence issues of GEE.

Methods: We investigate one recently proposed and two new extensions of FL to GEE. With ‘penalized GEE’ the GEE are treated as score equations, i.e. as derivatives of a log-likelihood set to zero, which are then modified as in FL. We introduce two approaches motivated by the equivalence of FL and maximum likelihood estimation with iteratively augmented data. Specifically, we consider fully iterated and single-step versions of this ‘augmented GEE’ approach. We compare the three approaches with respect to convergence behavior, practical applicability and performance using simulated data and a real data example.

Results: Our simulations indicate that all three extensions of FL to GEE substantially improve convergence compared to ordinary GEE, while showing a similar or even better performance in terms of accuracy of coefficient estimates and predictions. Penalized GEE often slightly outperforms the augmented GEE approaches, but this comes at the cost of a higher burden of implementation.

Conclusions: When fitting marginal logistic regression models using GEE on sparse data, we recommend applying penalized GEE if one has access to a suitable software implementation and single-step augmented GEE otherwise.
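
The equivalence between FL and maximum likelihood on iteratively augmented data, mentioned in the Methods, can be made concrete with the standard form of Firth's modified score for logistic regression. The notation below is an assumption of this note, not taken from the article, and the block only covers the uncorrelated (non-GEE) case; how the article carries the idea over to GEE is not spelled out on this page.

```latex
% Assumed notation: x_i = covariate vector, y_i \in \{0,1\},
% \pi_i = \operatorname{expit}(x_i^\top \beta), W = \operatorname{diag}\{\pi_i(1-\pi_i)\},
% h_i = i-th diagonal element of H = W^{1/2} X (X^\top W X)^{-1} X^\top W^{1/2}.
\begin{align*}
U^*(\beta)
  &= \sum_{i=1}^{n} \Bigl[\, y_i - \pi_i + h_i\bigl(\tfrac{1}{2} - \pi_i\bigr) \Bigr] x_i \\
  &= \sum_{i=1}^{n} \Bigl[\, 1\cdot(y_i - \pi_i)
       + \tfrac{h_i}{2}\,(y_i - \pi_i)
       + \tfrac{h_i}{2}\,\bigl((1 - y_i) - \pi_i\bigr) \Bigr] x_i ,
\qquad U^*(\hat\beta) = 0 .
\end{align*}
```

Read term by term, the second line is an ordinary weighted logistic score on an augmented data set: each original observation with weight 1, a copy with the same outcome and weight h_i/2, and a copy with the flipped outcome and weight h_i/2. Because the h_i depend on the current coefficient values, the weights have to be recomputed between fits, which is presumably what "iteratively augmented" refers to.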

Highlights

  • When modeling a binary outcome with a set of explanatory variables using logistic regression, one frequently encounters the problem of separation

  • One possibility to obtain finite estimates of regression coefficients even in the case of separation is to resort to Firth’s logistic regression (FL), which was originally proposed to reduce the bias in coefficient estimates compared to maximum likelihood estimation [2]

  • Single-step augmented generalized estimating equations (GEE): third, we investigated a simpler version of the augmented GEE algorithm which can be implemented whenever fitting algorithms for FL and weighted GEE are available


Summary

Introduction

When modeling a binary outcome with a set of explanatory variables using logistic regression, one frequently encounters the problem of separation. Data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. The question of convergence becomes more involved when analyzing clustered data, as frequently encountered in clinical research (e.g. data collected in several study centers, or individuals contributing multiple observations), using marginal logistic regression models fitted by generalized estimating equations (GEE). Extensions of approaches that can deal with separation in uncorrelated data may not fully remove all non-convergence issues in the setting of GEE.
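
To illustrate separation and the effect of Firth's correction in the simplest uncorrelated case, here is a minimal sketch in pure Python for a model with an intercept and one covariate. The function name `fit_logistic`, the toy data, and the iteration limits are all hypothetical choices made for this illustration; the code is not the article's implementation and does not cover GEE.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, firth=True, max_iter=100, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Fisher scoring.

    With firth=True the score is replaced by Firth's modified score
    sum_i [y_i - pi_i + h_i*(0.5 - pi_i)] * (1, x_i).
    Returns (b0, b1, converged).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        pi = [expit(b0 + b1 * xi) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information X'WX for the design columns (1, x).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Inverse of the symmetric 2x2 information matrix.
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det
        u0 = u1 = 0.0
        for xi, yi, p, wi in zip(x, y, pi, w):
            resid = yi - p
            if firth:
                # h_i: diagonal element of the hat matrix for observation i.
                h = wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi)
                resid += h * (0.5 - p)
            u0 += resid
            u1 += resid * xi
        # Scoring step: (X'WX)^{-1} times the (modified) score.
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False

# Toy data (invented): the outcome is 1 exactly when x > 0, so the data are separable.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

if __name__ == "__main__":
    print("Firth:", fit_logistic(x, y, firth=True))
    print("ML after 15 scoring steps:", fit_logistic(x, y, firth=False, max_iter=15))
```

On these toy data the plain maximum likelihood slope keeps growing with every scoring step and the convergence flag stays false, whereas the Firth-corrected fit settles on a finite slope. For real analyses an established implementation would be preferable to this sketch.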

References (showing 9 of 23 papers)

  • David C Woods + 1 more. Blocked Designs for Experiments With Correlated Non-Normal Response. Technometrics, May 1, 2011. doi:10.1198/tech.2011.09197. Cited by 37.
  • Tim P Morris + 2 more. Using simulation studies to evaluate statistical methods. Statistics in Medicine, Jan 16, 2019. doi:10.1002/sim.8086. Cited by 924. Open access.
  • Ioannis Kosmidis + 1 more. Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika, Aug 4, 2020. doi:10.1093/biomet/asaa052. Cited by 123. Open access.
  • Robin Ristl. Mmmgee: Simultaneous Inference for Multiple Linear Contrasts in GEE Models. Feb 23, 2018. doi:10.32614/cran.package.mmmgee. Cited by 6.
  • Edwin M Westbrook. Characterization of two crystal forms of neutrophil cationic protein NP2, a naturally occurring broad-spectrum antimicrobial agent from leukocytes. Journal of Molecular Biology, Sep 1, 1984. doi:10.1016/0022-2836(84)90252-3. Cited by 84.
  • A Albert + 1 more. On the Existence of Maximum Likelihood Estimates in Logistic Regression Models. Biometrika, Apr 1, 1984. doi:10.2307/2336390. Cited by 18.
  • David Firth. Bias reduction of maximum likelihood estimates. Biometrika, Jan 1, 1993. doi:10.1093/biomet/80.1.27. Cited by 3762.
  • Ioannis Kosmidis + 2 more. Detectseparation: Detect and Check for Separation and Infinite Maximum Likelihood Estimates. Mar 25, 2020. doi:10.32614/cran.package.detectseparation. Cited by 7.
  • Ulrich Halekoh + 2 more. The R Package geepack for Generalized Estimating Equations. Journal of Statistical Software, Jan 1, 2006. doi:10.18637/jss.v015.i02. Cited by 1271. Open access.

Similar Papers
  • Research Article
  • Cite Count Icon 26
  • 10.1080/00031305.2000.10474544
A Note on Marginal Linear Regression with Correlated Response Data
  • Aug 1, 2000
  • The American Statistician
  • Wei Pan + 2 more

Correlated response data often arise in longitudinal and familial studies. The marginal regression model and its associated generalized estimating equation (GEE) method are becoming more and more popular in handling such data. Pepe and Anderson pointed out that there is an important yet implicit assumption behind the marginal model and GEE. If the assumption is violated and a nondiagonal working correlation matrix is used in GEE, biased estimates of regression coefficients may result. On the other hand, if a diagonal correlation matrix is used, irrespective of whether the assumption is violated, the resulting estimates are (nearly) unbiased. A straightforward interpretation of this phenomenon is lacking, in part due to the unavailability of a closed form for the resulting GEE estimates. In this note, we show how the bias may arise in the context of linear regression, where the GEE estimates of regression coefficients are the ordinary or generalized least squares (LS) estimates. Also we explain why the generalized LS estimator may be biased, in contrast to the well-known result that it is usually unbiased. In addition, we discuss the bias properties of the sandwich variance estimator of the ordinary LS estimate.

  • Research Article
  • Cite Count Icon 2
  • 10.1007/s10985-006-9013-1
Marginal regression models with a time to event outcome and discrete multiple source predictors
  • Aug 2, 2006
  • Lifetime Data Analysis
  • Heather J Litman + 3 more

Information from multiple informants is frequently used to assess psychopathology. We consider marginal regression models with multiple informants as discrete predictors and a time to event outcome. We fit these models to data from the Stirling County Study; specifically, the models predict mortality from self report of psychiatric disorders and also predict mortality from physician report of psychiatric disorders. Previously, Horton et al. found little relationship between self and physician reports of psychopathology, but that the relationship of self report of psychopathology with mortality was similar to that of physician report of psychopathology with mortality. Generalized estimating equations (GEE) have been used to fit marginal models with multiple informant covariates; here we develop a maximum likelihood (ML) approach and show how it relates to the GEE approach. In a simple setting using a saturated model, the ML approach can be constructed to provide estimates that match those found using GEE. We extend the ML technique to consider multiple informant predictors with missingness and compare the method to using inverse probability weighted (IPW) GEE. Our simulation study illustrates that IPW GEE loses little efficiency compared with ML in the presence of monotone missingness. Our example data has non-monotone missingness; in this case, ML offers a modest decrease in variance compared with IPW GEE, particularly for estimating covariates in the marginal models. In more general settings, e.g., categorical predictors and piecewise exponential models, the likelihood parameters from the ML technique do not have the same interpretation as the GEE. Thus, the GEE is recommended to fit marginal models for its flexibility, ease of interpretation and comparable efficiency to ML in the presence of missing data.

  • Research Article
  • Cite Count Icon 20
  • 10.1289/ehp.1102453
Statistical Methods to Study Timing of Vulnerability with Sparsely Sampled Data on Environmental Toxicants
  • Dec 8, 2010
  • Environmental Health Perspectives
  • Brisa Ney Sánchez + 3 more

Statistical Methods to Study Timing of Vulnerability with Sparsely Sampled Data on Environmental Toxicants

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 2
  • 10.1186/s12913-022-08250-5
Determinants of continuing mental health service use among older persons diagnosed with depressive disorders in general hospitals: latent class analysis and GEE
  • Jul 11, 2022
  • BMC health services research
  • Thida Mulalint + 3 more

BackgroundPrevalence of depression in older persons was a leading cause of disability. This group has the lowest access to service and retention in care compared to other age groups. This study aimed to explore continuing mental health service use and examined the predictive power of the mental health service delivery system and individual factors on mental health service use among older persons diagnosed with depressive disorders.MethodsWe employed an analytic cross-sectional study design of individual and organizational variables in 12 general hospitals selected using multi-stratified sampling. There were 3 clusters comprising community hospitals, advanced and standard hospitals, and university hospitals. Participants in each group were 150 persons selected by purposive sampling. We included older persons with a first or recurring diagnosis of a depressive disorder in the last 6 to 12 months of the data collection date. Data at the individual level included socio-demographic characteristics, Charlson Comorbidity Index, Attitude toward Depression and its treatment, and perceived social support. Data at the organizational level had hospital level, nurse competency, nurse-patient ratio, and appointment reminders. Descriptive statistics, Pearson chi-square test, latent class analysis (LCA), and marginal logistic regression model using generalized estimating equation (GEE) were used to analyze the data.ResultsThe continuing mental health service use among older persons diagnosed with depressive disorders was 54%. The latent class analysis of four variables in the mental health services delivery organization yielded distinct and interpretable findings in two groups: high and low resource organization. 
The marginal logistic multivariable regression model using GEE found that organizational group and attitude toward depression and its treatment were significantly associated with mental health service use (p-value = 0.046; p-value = 0.003).ConclusionsThe findings suggest that improving continuing mental health services use in older persons diagnosed with depressive disorders should emphasize specialty resources of the mental health services delivery system and attitude toward depression and its treatment.

  • Research Article
  • 10.7916/d8kd24wh
Flexible models and methods for longitudinal and multilevel functional data
  • Jan 1, 2012
  • Huaihou Chen

Flexible models and methods for longitudinal and multilevel functional data

  • Book Chapter
  • Cite Count Icon 2
  • 10.1016/b978-0-12-801342-7.00009-5
Chapter 9 - Generalized estimating equations (GEEs) models
  • Sep 4, 2015
  • Methods and Applications of Longitudinal Data Analysis
  • Xian Liu

Chapter 9 - Generalized estimating equations (GEEs) models

  • Research Article
  • Cite Count Icon 35
  • 10.1111/2041-210x.12623
Marginal or conditional regression models for correlated non‐normal data?
  • Aug 30, 2016
  • Methods in Ecology and Evolution
  • Stefanie Muff + 2 more

SummaryCorrelated data are ubiquitous in ecological and evolutionary research, and appropriate statistical analysis requires that these correlations are taken into account. For regressions with correlated, non‐normal outcomes, two main approaches are used: conditional and marginal modelling. The former leads to generalized linear mixed models (GLMMs), while the latter are estimated using generalized estimating equations (GEEs), or marginalized multilevel regression models. Differences, advantages and drawbacks of conditional and marginal models have been discussed extensively in the statistical and applied literature, and there is some agreement that the choice of the model must depend on the question under study. Yet, there still appears to be a lot of confusion and disagreement over when to choose which model.We start with a review of conditional and marginal models, and the differences in the interpretation of the resulting parameter estimates. We highlight that the two types of models propagate different linear relations between the covariates and the response. Moreover, while conditional models explicitly account for heterogeneity among clustered observations, marginal models yield averages over such heterogeneities and are therefore often interpreted as population‐averaged models.We point out theoretically and with an example that when modelling non‐normal outcomes no unambiguous definition of a marginal model generally exists. Instead, marginal model parameters are marginal only with respect to unaccounted differences among clusters and thus depend on the fixed effects in the model. Therefore, marginal model parameters should not be loosely interpreted as population‐averaged parameters. In addition, we explain how marginal modelling is mathematically analogous to deliberately omitting covariates with explanatory power, and to deliberately introducing a Berkson measurement error into covariates. 
We also reiterate that marginal modelling is related to a well‐known statistical phenomenon, the Simpson's paradox.In most cases, therefore, we regard the conditional model as the more powerful choice to explain how covariates are associated with a non‐normal response. Still, marginal models can be useful, given that the scientific question explicitly requires such a model formulation.

  • Research Article
  • Cite Count Icon 2
  • 10.7465/jkdi.2013.24.4.877
Generalized methods of moments in marginal models for longitudinal data with time-dependent covariates
  • Jul 31, 2013
  • Journal of the Korean Data and Information Science Society
  • Gyo-Young Cho + 1 more

The quadratic inference functions (QIF) method proposed by Qu et al. (2000) and the generalized method of moments (GMM) for marginal regression analysis of longitudinal data with time-dependent covariates proposed by Lai and Small (2007) both are the methods based on generalized method of moment (GMM) introduced by Hansen (1982) and both use generalized estimating equations (GEE). Lai and Small (2007) divided time-dependent covariates into three types such as: Type I, Type II and Type III. In this paper, we compared these methods in the case of Type II and Type III in which full covariates conditional mean assumption (FCCM) is violated and interested in whether they can improve the results of GEE with independence working correlation. We show that in the marginal regression model with Type II time-dependent covariates, GMM Type II of Lai and Small (2007) provides more ecient result than QIF and for the Type III time-dependent covariates, QIF with independence working correlation and GMM Type III methods provide the same results. Our simulation study showed the same results.

  • Research Article
  • 10.1002/sim.70074
A Variance Estimator for Marginal Cox Regression Models Fit to Non-Nested Multilevel Data.
  • Apr 1, 2025
  • Statistics in medicine
  • Peter C Austin

In health services research, researchers often use clustered data to estimate the independent association between individual outcomes and cluster-level covariates after adjusting for individual-level characteristics. Marginal generalized linear models estimated using generalized estimating equation (GEE) methods or hierarchical (or multilevel) regression models can be used when there is a single source of clustering (e.g., patients nested within hospitals). Hierarchical regression models can also be used when there are multiple sources of clustering (e.g., patients nested within surgeons who in turn are nested within hospitals). Methods for estimating marginal regression models are less well-developed when there are multiple sources of non-nested clustering (e.g., patients are clustered both within hospitals and within in neighborhoods, but neither neighborhoods or hospitals are nested in the other). Miglioretti and Heagerty developed a GEE-type variance estimator for use when fitting marginal generalized linear models to non-nested multilevel data. We propose a variance estimator for a marginal Cox regression model fit to non-nested multilevel data that combined their approach with Lin and Wei's robust variance estimator for the Cox model. We evaluated the performance of the proposed variance estimator using an extensive set of Monte Carlo simulations. We illustrated the use of the variance estimator in a case study consisting of patients hospitalized with an acute myocardial infarction who were clustered within hospitals and who were also clustered in neighborhoods. In summary, a variance estimator motivated by that proposed by Miglioretti and Heagerty can be used with marginal Cox regression models fit to non-nested multilevel data.

  • Book Chapter
  • Cite Count Icon 1
  • 10.1016/s0169-7161(07)27025-7
25 Estimation of Marginal Regression Models with Multiple Source Predictors
  • Jan 1, 2007
  • Handbook of Statistics
  • Heather J Litman + 3 more

25 Estimation of Marginal Regression Models with Multiple Source Predictors

  • Research Article
  • Cite Count Icon 6
  • 10.1002/sim.9744
A comparison of bias-adjusted generalized estimating equations for sparse binary data in small-sample longitudinal studies.
  • Apr 16, 2023
  • Statistics in Medicine
  • Masahiko Gosho + 3 more

Using a generalized estimating equation (GEE) can lead to a bias in regression coefficients for a small sample or sparse data. The bias-corrected GEE (BCGEE) and penalized GEE (PGEE) were proposed to resolve the small-sample bias. Moreover, the standard sandwich covariance estimator leads to a bias of standard error for small samples; several modified covariance estimators have been proposed to address this issue. We review the modified GEEs and modified covariance estimators, and evaluate their performance in sparse binary data from small-sample longitudinal studies. The simulation results showed that GEE and BCGEE often failed to achieve convergence, whereas the convergence proportion for PGEE was quite high. The bias for the regression coefficients was generally in the ascending order of PGEE BCGEE GEE. However, PGEE and BCGEE did not sufficiently remove the bias involving 20-30 subjects with unequal exposure levels with a 5% response rate. The coverage probability (CP) of the confidence interval for BCGEE was relatively poor compared with GEE and PGEE. The CP with the sandwich covariance estimator deteriorated regardless of the GEE methods under the small sample size and low response rate, whereas the CP with the modified covariance estimators-such as Morel's method-was relatively acceptable. PGEE will be the reasonable way for analyzing sparse binary data in small-sample studies. Instead of using the standard sandwich covariance estimator, one should always apply the modified covariance estimators for analyzing these data.

  • Research Article
  • Cite Count Icon 3
  • 10.7465/jkdi.2013.24.3.651
Quadratic inference functions in marginal models for longitudinal data with time-varying stochastic covariates
  • May 31, 2013
  • Journal of the Korean Data and Information Science Society
  • Gyo-Young Cho + 1 more

For the marginal model and generalized estimating equations (GEE) method there is important full covariates conditional mean (FCCM) assumption which is pointed out by Pepe and Anderson (1994). With longitudinal data with time-varying stochastic covariates, this assumption may not necessarily hold. If this assumption is violated, the biased estimates of regression coefficients may result. But if a diagonal working correlation matrix is used, irrespective of whether the assumption is violated, the resulting estimates are (nearly) unbiased (Pan et al., 2000).The quadratic inference functions (QIF) method proposed by Qu et al. (2000) is the method based on generalized method of moment (GMM) using GEE. The QIF yields a substantial improvement in efficiency for the estimator of <TEX>${\beta}$</TEX> when the working correlation is misspecified, and equal efficiency to the GEE when the working correlation is correct (Qu et al., 2000).In this paper, we interest in whether the QIF can improve the results of the GEE method in the case of FCCM is violated. We show that the QIF with exchangeable and AR(1) working correlation matrix cannot be consistent and asymptotically normal in this case. Also it may not be efficient than GEE with independence working correlation. Our simulation studies verify the result.

  • Research Article
  • Cite Count Icon 116
  • 10.2307/2533434
An Application of Maximum Likelihood and Generalized Estimating Equations to the Analysis of Ordinal Data from a Longitudinal Study with Cases Missing at Random
  • Dec 1, 1994
  • Biometrics
  • M G Kenward + 2 more

Data are analysed from a longitudinal psychiatric study in which there are no dropouts that do not occur completely at random. A marginal proportional odds model is fitted that relates the response (severity of side effects) to various covariates. Two methods of estimation are used: generalized estimating equations (GEE) and maximum likelihood (ML). Both the complete set of data and the data from only those subjects completing the study are analysed. For the completers-only data, the GEE and ML analyses produce very similar results. These results differ considerably from those obtained from the analyses of the full data set. There are also marked differences between the results obtained from the GEE and ML analysis of the full data set. The occurrence of such differences is consistent with the presence of a non-completely-random dropout process and it can be concluded in this example that both the analyses of the completers only and the GEE analysis of the full data set produce misleading conclusions about the relationships between the response and covariates.

  • Research Article
  • Cite Count Icon 27
  • 10.1002/(sici)1097-0258(19990615)18:11<1419::aid-sim127>3.0.co;2-q
Testing proportionality in the proportional odds model fitted with GEE.
  • Jun 15, 1999
  • Statistics in Medicine
  • Thomas R Stiger + 2 more

Generalized estimating equations (GEE) methodology as proposed by Liang and Zeger has received widespread use in the analysis of correlated binary data. Miller et al. and Lipsitz et al. extended GEE to correlated nominal and ordinal categorical data; in particular, they used GEE for fitting McCullagh's proportional odds model. In this paper, we consider robust (that is, empirically corrected) and model-based versions of both a score test and a Wald test for assessing the assumption of proportional odds in the proportional odds model fitted with GEE. The Wald test is based on fitting separate multiple logistic regression models for each dichotomization of the response variable, whereas the score test requires fitting just the proportional odds model. We evaluate the proposed tests in small to moderate samples by simulating data from a series of simple models. We illustrate the use of the tests on three data sets from medical studies.

  • Research Article
  • Cite Count Icon 3
  • 10.1200/jco.2009.27.15_suppl.9614
Quality of life in advanced non-small cell lung cancer patients receiving first-line gefitinib monotherapy
  • May 20, 2009
  • Journal of Clinical Oncology
  • Y Shao + 9 more

9614 Background: Gefitinib is a potential first-line treatment option for patients with advanced non-small cell lung cancer (NSCLC), especially for patients with activating mutations in the EGFR gene. However, little is known about patient-reported health-related quality of life (HRQOL) in this patient population. The aims of this study were to explore the prognostic values of baseline HRQOL for time-to-treatment failure (TTF), as well as the predictors of repeatedly measured posttreatment HRQOL, in advanced NSCLC patients receiving first-line gefitinib. Methods: A total of 106 chemonaive patients with advanced NSCLC were enrolled in a phase II trial. Gefitinib was given at a dose of 250 mg/d. HRQOL was assessed monthly with the EuroQoL instrument (EQ-5D) and the Lung Cancer Symptom Scale (LCSS) questionnaire. Baseline HRQOL and clinical/molecular predictors of TTF were jointly examined by multiple Cox's proportional hazards model. The associations between the clinical/molecular factors and repeatedly measured posttreatment HRQOL were analyzed by fitting marginal linear regression model using the generalized estimating equations (GEE) method. Results: In this prospective study, HRQOL data were obtained from 94 patients. Baseline EQ-5D index (estimated hazard ratio = 0.286, 95% C.I.: 0.135–0.603, p = 0.001) and the presence of L858R EGFR mutation in adenocarcinoma (estimated hazard ratio = 0.520, 95% C.I.: 0.307–0.880, p = 0.015) were retained as independent prognostic factors in the final multiple Cox's proportional hazards model for TTF. According to preliminary GEE analysis of repeatedly measured posttreatment HRQOL, the patients with wild-type EGFR consistently had worse HRQOL in EQ-5D index (p < 0.0001), EQ-5D VAS score (p = 0.0002), and LCSS global score (p < 0.0001), respectively. Conclusions: In advanced NSCLC patients receiving first-line gefitinib, better baseline EQ-5D index and L858R EGFR mutation in adenocarcinoma predict longer TTF. In addition, patients with wild-type EGFR had worse posttreatment HRQOL. No significant financial relationships to disclose.

More from: BMC Medical Research Methodology
  • New
  • Research Article
  • 10.1186/s12874-025-02690-3
Assessing the quality for integrated guidelines: systematic comparison between the AGREE Ⅱ and AGREE-HS tools.
  • Nov 6, 2025
  • BMC Medical Research Methodology
  • Gezhi Zhang + 7 more

  • New
  • Research Article
  • 10.1186/s12874-025-02706-y
A methodology for developing dermatological datasets: lessons from retrospective data collection for AI-based applications.
  • Nov 5, 2025
  • BMC Medical Research Methodology
  • Alma Pedro + 10 more

  • New
  • Research Article
  • 10.1186/s12874-025-02689-w
Assessing the accuracy of survival machine learning and traditional statistical models for Alzheimer's disease prediction over time: a study on the ADNI cohort
  • Nov 5, 2025
  • BMC Medical Research Methodology
  • Sardar Jahani + 2 more

  • New
  • Research Article
  • 10.1186/s12874-025-02670-7
Current practice on covariate adjustment and stratified analysis —based on survey results by ASA oncology estimand working group conditional and marginal effect task force
  • Nov 4, 2025
  • BMC Medical Research Methodology
  • Jiawei Wei + 8 more

  • New
  • Supplementary Content
  • 10.1186/s12874-025-02683-2
Enhancing confidence in complex health technology assessments by using real-world evidence: highlighting existing strategies for effective drug evaluation
  • Nov 3, 2025
  • BMC Medical Research Methodology
  • Alison Antoine + 4 more

  • New
  • Discussion
  • 10.1186/s12874-025-02700-4
The importance of considering variability in re-expression of effect estimates for use in meta-analyses
  • Oct 30, 2025
  • BMC Medical Research Methodology
  • Leonid Kopylev + 1 more

  • New
  • Discussion
  • 10.1186/s12874-025-02699-8
Response to “The importance of considering variability in re-expression of effect estimates for use in meta-analysis.” (Kopylev and Dzierlenga 2025)
  • Oct 30, 2025
  • BMC Medical Research Methodology
  • Matthew W Linakis + 1 more

  • New
  • Research Article
  • 10.1186/s12874-025-02685-0
Comparing in-person and remote consent of people with dementia into a primary care-based cluster randomised controlled trial: lessons from the Dementia PersonAlised Care Team (D-PACT) feasibility study.
  • Oct 30, 2025
  • BMC Medical Research Methodology
  • T M Oh + 19 more

  • Research Article
  • 10.1186/s12874-025-02696-x
Identifying delayed human response to external risks: an econometric analysis of mobility change during a pandemic
  • Oct 29, 2025
  • BMC Medical Research Methodology
  • Gaofei Zhang + 4 more

  • Research Article
  • 10.1186/s12874-025-02694-z
Comparison of machine learning methods versus traditional Cox regression for survival prediction in cancer using real-world data: a systematic literature review and meta-analysis
  • Oct 28, 2025
  • BMC Medical Research Methodology
  • Yinan Huang + 6 more
