Abstract

Analysts dealing with clustered data are usually not interested in the clustering variables themselves, and hence the group-specific parameters are treated as nuisance parameters. If a fixed-effects formulation is preferred and the total number of clusters is large relative to the individual group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools for accurate inference on a parameter of interest, such as modifications of the profile likelihood or integrated likelihoods, can be complicated by nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances of particular relevance in applications, namely missing-data models and survival models with an unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications.
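The abstract gives no formulas, so what follows is only a minimal sketch of the general idea under stated assumptions. It assumes the modified profile likelihood is taken in Severini's approximate form, l_M(ψ) = l_P(ψ) + ½ log|j_λλ(ψ, λ̂_ψ)| - log|I_λλ(ψ, λ̂_ψ; ψ̂, λ̂)|, where I_λλ(θ; θ̂) = E_θ̂[ℓ_λ(θ) ℓ_λ(θ̂)ᵀ] is the expected product of nuisance scores, and it replaces that expectation by an average over R datasets simulated from the fitted model. The toy model, y_ij ~ N(λ_i, ψ) with one nuisance mean per group (the Neyman-Scott setting), and the helper names `golden_max` and `mc_modified_profile_loglik` are illustrative choices, not taken from the paper. Because each λ_i is group-specific, all nuisance matrices are diagonal and the correction is a sum over groups.

```python
import numpy as np


def golden_max(f, a, b, tol=1e-8):
    """Maximise a unimodal scalar function on [a, b] by golden-section search."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def mc_modified_profile_loglik(psi, y, ystar, psi_hat, lam_hat):
    """Monte Carlo Severini-type modified profile log-likelihood (hypothetical helper)
    for the toy model y[i, j] ~ N(lam_i, psi) with group-specific nuisance lam_i."""
    q, m = y.shape
    lam_psi = y.mean(axis=1)  # constrained MLE of lam_i for fixed psi
    ss = ((y - lam_psi[:, None]) ** 2).sum()
    l_p = -0.5 * q * m * np.log(psi) - ss / (2.0 * psi)  # profile log-lik, constants dropped
    j = np.full(q, m / psi)  # observed information for each lam_i
    # Nuisance scores on the simulated datasets, at (psi, lam_psi) and at the full MLE
    s_psi = (ystar - lam_psi[None, :, None]).sum(axis=2) / psi
    s_hat = (ystar - lam_hat[None, :, None]).sum(axis=2) / psi_hat
    i_mc = (s_psi * s_hat).mean(axis=0)  # Monte Carlo estimate of each I_i
    return l_p + 0.5 * np.log(j).sum() - np.log(np.abs(i_mc)).sum()


rng = np.random.default_rng(1)
q, m, R = 200, 3, 1000  # many small groups; true psi = 1
y = rng.normal(0.0, 2.0, size=q)[:, None] + rng.normal(0.0, 1.0, size=(q, m))

lam_hat = y.mean(axis=1)
ss = ((y - lam_hat[:, None]) ** 2).sum()
psi_hat = ss / (q * m)  # profile MLE, inconsistent as q grows with m fixed
# Simulate R datasets from the fitted model once and reuse them (common random numbers)
ystar = lam_hat[None, :, None] + np.sqrt(psi_hat) * rng.standard_normal((R, q, m))

psi_mpl = golden_max(
    lambda p: mc_modified_profile_loglik(p, y, ystar, psi_hat, lam_hat),
    0.1 * psi_hat,
    5.0 * psi_hat,
)
print(psi_hat, psi_mpl, ss / (q * (m - 1)))
```

In this toy case the profile estimate SS/(qm) tends to ((m-1)/m)ψ as the number of groups grows, while the maximiser of the Monte Carlo modified profile likelihood reproduces the bias-corrected value SS/(q(m-1)). Drawing the simulated datasets once and reusing them for every ψ keeps the approximated l_M smooth, which a one-dimensional optimiser needs.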

Highlights

  • Clustered data, either cross-sectional or longitudinal observations which may be arranged in groups, are nowadays encountered in all applied areas

  • If the interest centers on comparing the response variable of units across groups, it is preferable to adopt marginal models, where the clustering structure is ignored for estimation of the regression coefficients and is only employed to ensure correct inference on the standard errors

  • While the first two categories (MCAR and MAR) are associated with an ignorable missingness mechanism, when data are missing not at random (MNAR) the probability of missing observations depends on unobserved values, and the assumed model must account for the missingness process to provide valid results [24, Section 15.1]
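The distinction between ignorable and nonignorable mechanisms can be illustrated with a small simulation. The setup below (a linear model for y with a fully observed covariate x, and logistic missingness probabilities) is a hypothetical example, not one from the paper. Complete-case least squares for y given x stays consistent when the probability that y is missing depends only on x (MAR here), but is biased when that probability depends on y itself (MNAR).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(2024)
n = 200_000
x = rng.standard_normal(n)  # covariate, always observed
y = 1.0 + 2.0 * x + rng.standard_normal(n)  # response, true slope 2

# Probability that y is missing under each (illustrative) mechanism
p_missing = {
    "MCAR": np.full(n, 0.4),  # unrelated to any data
    "MAR": sigmoid(x),  # depends only on the observed x
    "MNAR": sigmoid(y - 1.0),  # depends on y itself, which may be unobserved
}

slopes = {}
for name, p in p_missing.items():
    observed = rng.random(n) >= p  # y is kept with probability 1 - p
    slope, intercept = np.polyfit(x[observed], y[observed], 1)  # complete-case OLS
    slopes[name] = slope
    print(f"{name}: complete-case slope = {slope:.3f}")
```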


Summary

Introduction

Missing data are the rule rather than the exception in quantitative research. [34] developed the first basic classification of missing data, still in use today: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Generally speaking, in moderately complex models for incomplete datasets, maximizing the log-likelihood function that incorporates all the available information is often an arduous task. This function, called the observed log-likelihood, involves integrals or summations over the distribution of the missing data that can be hardly tractable. It is well known that the EM algorithm [14] is a potentially advantageous strategy for ML estimation whenever data are either partially unobserved or can be viewed as such. It is nevertheless important to point out that, regardless of the selected technique, nonignorable missing-data models need to be fitted with special care, because the available information may be insufficient to estimate all parameters [18].
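As an illustration of EM with partially unobserved data, the sketch below assumes a bivariate normal model in which y2 is missing at random given the always-observed y1 (an illustrative model, not one analysed in the paper; the function name `em_bivariate_normal` is hypothetical). The E-step fills in the conditional moments of each missing y2 given y1, and the M-step recomputes the complete-data ML estimates from those expected sufficient statistics. For this monotone pattern the ML estimate of the mean of y2 also has a closed form through the regression of y2 on y1, which provides a check.

```python
import numpy as np


def em_bivariate_normal(y1, y2, n_iter=500):
    """EM estimates of the mean and covariance of (y1, y2) when some y2 are NaN."""
    miss = np.isnan(y2)
    # Start from complete-case moments
    mu = np.array([y1.mean(), y2[~miss].mean()])
    s = np.cov(y1[~miss], y2[~miss], bias=True)
    for _ in range(n_iter):
        # E-step: conditional mean and second moment of missing y2 given y1
        beta = s[0, 1] / s[0, 0]
        resid_var = s[1, 1] - beta * s[0, 1]
        e2 = np.where(miss, mu[1] + beta * (y1 - mu[0]), y2)
        e22 = np.where(miss, e2 ** 2 + resid_var, y2 ** 2)
        # M-step: complete-data ML estimates from the expected sufficient statistics
        mu = np.array([y1.mean(), e2.mean()])
        s11 = np.mean(y1 ** 2) - mu[0] ** 2
        s12 = np.mean(y1 * e2) - mu[0] * mu[1]
        s22 = np.mean(e22) - mu[1] ** 2
        s = np.array([[s11, s12], [s12, s22]])
    return mu, s


rng = np.random.default_rng(7)
n = 5000
y1 = rng.standard_normal(n)
y2 = 0.8 * y1 + 0.6 * rng.standard_normal(n)  # true mean of y2 is 0
# MAR: y2 is more likely to be missing when y1 is large
y2[rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * y1))] = np.nan

mu_em, sigma_em = em_bivariate_normal(y1, y2)

# Closed-form ML estimate for the monotone pattern (factored likelihood)
obs = ~np.isnan(y2)
b, a = np.polyfit(y1[obs], y2[obs], 1)
mu2_ml = a + b * y1.mean()
cc_mean = y2[obs].mean()  # complete-case mean, biased under this MAR mechanism
print(cc_mean, mu_em[1], mu2_ml)
```

Because the mechanism here is MAR, the missingness model never enters the computation; under MNAR the E-step would also require a model for the missingness process, with the identifiability caveats mentioned above.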

Profile and modified profile likelihood
Monte Carlo modified profile likelihood
Setup and Monte Carlo modified profile likelihood
Logistic regression: simulation studies
Application to a toenail infection study
Setup and background
Weibull model
Simulation studies
Application to an HIV clinical trial
Discussion
