Abstract
Analysts who work with clustered data are usually not mainly interested in the clustering variables, so the group-specific parameters are treated as nuisance parameters. If a fixed-effects formulation is preferred and the total number of clusters is large relative to the size of each group, classical frequentist techniques based on the profile likelihood are often misleading. Alternative tools for accurate inference on a parameter of interest, such as modifications of the profile likelihood or integrated likelihoods, can be difficult to apply under nonstandard modelling and/or sampling assumptions. We show how Monte Carlo simulation can be used to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two settings of particular practical relevance: missing-data models and survival models with an unspecified censoring distribution. The effectiveness of the proposed solution is validated through simulation studies and two clinical trial applications.
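The first two claims above can be illustrated on the classic Neyman-Scott problem: q clusters of m normal observations with cluster-specific means (nuisance) and a common variance θ = σ² (interest). The Python sketch below is not the authors' implementation. It assumes Severini's approximation to the modified profile log-likelihood, ℓ_M(θ) = ℓ_P(θ) + Σ_i [½ log j_i(θ) - log I_i(θ)], where the expected product of scores I_i(θ) is estimated by simulating datasets from the model fitted at the full MLE. The functions profile_loglik and mc_modified_profile_loglik are hypothetical names, and the grid search is only for illustration.

```python
import numpy as np


def profile_loglik(theta, y):
    """Profile log-likelihood for theta = sigma^2 in the Neyman-Scott model.

    y has shape (q, m): q clusters with m observations each. For every theta the
    constrained MLE of each cluster mean is the cluster average, so the profile
    depends on the data only through the within-cluster sum of squares.
    Works for a scalar theta or a NumPy array of theta values.
    """
    q, m = y.shape
    ss = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2)
    return -0.5 * q * m * np.log(theta) - ss / (2.0 * theta)


def mc_modified_profile_loglik(thetas, y, n_sim=200, seed=0):
    """Monte Carlo version of Severini's approximate modified profile log-likelihood.

    l_M(theta) = l_P(theta) + sum_i [0.5 * log j_i(theta) - log I_i(theta)], where
    j_i is the observed information for the cluster mean mu_i and
    I_i(theta) = E[score_i(theta, mu_hat_i) * score_i(theta_hat, mu_hat_i)]
    is estimated by averaging over datasets simulated at (theta_hat, mu_hat).
    The same simulated datasets are reused for every theta (common random numbers).
    """
    rng = np.random.default_rng(seed)
    q, m = y.shape
    mu_hat = y.mean(axis=1)                                   # cluster means (also constrained MLEs)
    theta_hat = np.sum((y - mu_hat[:, None]) ** 2) / (q * m)  # full MLE of sigma^2
    y_sim = mu_hat[None, :, None] + rng.normal(
        0.0, np.sqrt(theta_hat), size=(n_sim, q, m))          # shape (n_sim, q, m)
    # The score for mu_i is sum_j (y_ij - mu_i) / theta; keep the numerator.
    dev = (y_sim - mu_hat[None, :, None]).sum(axis=2)         # shape (n_sim, q)
    values = []
    for theta in np.atleast_1d(thetas):
        i_mc = np.mean((dev / theta) * (dev / theta_hat), axis=0)  # shape (q,)
        j_obs = m / theta                                      # observed information for mu_i
        values.append(profile_loglik(theta, y)
                      + np.sum(0.5 * np.log(j_obs) - np.log(i_mc)))
    return np.array(values)


# Many small clusters: q = 200 pairs, true sigma^2 = 1.
rng = np.random.default_rng(1)
q, m, sigma2 = 200, 2, 1.0
y = rng.normal(0.0, 3.0, size=(q, 1)) + rng.normal(0.0, np.sqrt(sigma2), size=(q, m))

grid = np.linspace(0.1, 2.0, 381)
theta_profile = grid[np.argmax(profile_loglik(grid, y))]
theta_modified = grid[np.argmax(mc_modified_profile_loglik(grid, y))]
print(theta_profile, theta_modified)
```

With many clusters of size two, the profile maximizer settles near σ²/2 (the Neyman-Scott inconsistency), while the Monte Carlo modified version returns the within-cluster estimator with divisor q(m - 1), close to σ². In this toy model the simulated expectation is proportional to 1/θ, so Monte Carlo error only shifts ℓ_M by a constant; in more complex models that expectation may have no closed form, which is where simulation is genuinely needed.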
Highlights
Clustered data, either cross-sectional or longitudinal observations which may be arranged in groups, are nowadays encountered in all applied areas
If the interest centers on comparing the response variable of units across groups, it is preferable to adopt marginal models, in which the clustering structure is ignored when estimating the regression coefficients and is used only to obtain valid standard errors
While the first two categories of missing data (MCAR and MAR) correspond to an ignorable missingness mechanism, when data are missing not at random (MNAR) the probability that an observation is missing depends on unobserved values, and the assumed model must account for the missingness process to yield valid results [24, Section 15.1]
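The practical difference between ignorable and nonignorable mechanisms can be seen in a small simulation. The Python sketch below assumes a bivariate normal (x, y) with x always observed, logistic missingness probabilities, and a hypothetical helper regression_mean that estimates E[y] by likelihood while ignoring the missingness mechanism. It is an illustration only, not a procedure from the paper: the estimate is close to the truth under MAR (missingness driven by x) but biased under MNAR (missingness driven by y itself).

```python
import numpy as np


def regression_mean(x, y, observed):
    """Likelihood-based estimate of E[y] that ignores the missingness mechanism.

    Under a bivariate normal model for (x, y) with x always observed, the MLE of
    E[y] regresses y on x among units with y observed and averages the fitted
    line over all x. This is valid when missingness is ignorable (MCAR or MAR).
    """
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    return intercept + slope * x.mean()


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # true E[y] = 0

# MAR: the chance that y is missing depends only on the observed x.
observed_mar = rng.random(n) > logistic(2.0 * x)
# MNAR: the chance that y is missing depends on the unobserved y itself.
observed_mnar = rng.random(n) > logistic(2.0 * y)

est_mar = regression_mean(x, y, observed_mar)
est_mnar = regression_mean(x, y, observed_mnar)
print(f"MAR: {est_mar:+.3f}   MNAR: {est_mnar:+.3f}")
```

Under MNAR, units with large y are preferentially lost, and no model for y given x alone can recover that information without also modelling the missingness process.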
Summary
Missing data are the rule rather than the exception in quantitative research. The first basic classification of missing data, still in use today, was developed in [34]: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Generally speaking, in moderately complex models for incomplete datasets, maximizing the log-likelihood function that incorporates all the available information is often an arduous task. This function, called the observed log-likelihood, involves integrals or sums over the distribution of the missing data that can be hard to evaluate. The EM algorithm [14] is a well-known and often convenient strategy for ML estimation whenever data are partially unobserved or can be viewed as such. It is nevertheless important to point out that, regardless of the selected technique, nonignorable missing-data models need to be fitted with special care, because the available information may be insufficient to estimate all parameters [18].
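As an illustration of EM in the simplest nontrivial case, the Python sketch below assumes a bivariate normal (x, y) with x always observed and y missing for some units under an ignorable (MAR) mechanism. It is a generic textbook EM, not the method of the paper, and em_bivariate_normal is a hypothetical name. The E-step replaces the missing sufficient statistics by their conditional expectations given x; the M-step recomputes the ML estimates. At convergence the estimate of E[y] should coincide with the closed-form factored-likelihood MLE, which the last lines compute for comparison.

```python
import numpy as np


def em_bivariate_normal(x, y, observed, max_iter=1000, tol=1e-12):
    """EM estimates of E[y], Cov(x, y) and Var(y) for a bivariate normal (x, y).

    x is fully observed; y[~observed] is missing (entries may hold NaN) and the
    missingness is assumed ignorable. Since x is complete, its mean and variance
    are the sample moments and are never updated.
    """
    y_obs = np.where(observed, y, 0.0)   # placeholder at missing positions, never used
    mu_x, s_xx = x.mean(), x.var()
    mu_y, s_yy, s_xy = y[observed].mean(), y[observed].var(), 0.0  # starting values
    for _ in range(max_iter):
        # E-step: conditional moments of each missing y given x.
        beta = s_xy / s_xx
        cond_mean = mu_y + beta * (x - mu_x)
        cond_var = s_yy - beta * s_xy
        e_y = np.where(observed, y_obs, cond_mean)
        e_y2 = np.where(observed, y_obs ** 2, cond_mean ** 2 + cond_var)
        # M-step: ML estimates from the expected complete-data sufficient statistics.
        new_mu_y = e_y.mean()
        new_s_xy = (x * e_y).mean() - mu_x * new_mu_y
        new_s_yy = e_y2.mean() - new_mu_y ** 2
        change = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy), abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break
    return mu_y, s_xy, s_yy


rng = np.random.default_rng(42)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + 0.6 * rng.normal(size=n)
observed = rng.random(n) > 1.0 / (1.0 + np.exp(-2.0 * x))  # MAR: depends on x only
y = np.where(observed, y, np.nan)

mu_y, s_xy, s_yy = em_bivariate_normal(x, y, observed)

# Closed-form MLE of E[y] (factored likelihood): regress y on x among complete
# cases, then evaluate the fitted line at the mean of all x.
slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
closed_form = intercept + slope * x.mean()
print(mu_y, closed_form)
```

Here the closed form makes EM unnecessary; the point of the sketch is the E-step/M-step structure, which carries over to models where the observed log-likelihood has no tractable expression.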