Abstract
Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution.
Highlights
Statistical analysis based on two-part models arises in a variety of contexts
The two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response
In this article we focus on this specific type of two-part model, as well as models with a comparable two-part structure for a random effects distribution in longitudinal settings
Summary
Statistical analysis based on two-part models arises in a variety of contexts. A simple, but common and useful, version of such models involves a model for a binary indicator variable and a model for another response variable given that the indicator takes one of its two values. The structure was introduced by Cohen (1963) and given by Johnson & Kotz (1969), but was popularized by Lambert (1992), who provided an excellent introduction with regression formulations. These models, the so-called zero-inflated Poisson (ZIP) models and their variants, combine a Poisson variable (or a variable following another count distribution) with a binary indicator variable that sets the outcome to zero, accommodating the excess zeros that the Poisson distribution cannot capture. Other issues with two-part models, such as the interpretation of regression coefficients, may be even more problematic in longitudinal settings. We address these issues and some approaches to dealing with them, predominantly in the context of the specific two-part models described in this article, and for other similar models. Some important issues in the use of two-part models in longitudinal settings are highlighted and discussed (Section 8), and two primary examples from studies on psoriatic arthritis (PsA) (Sections 9, 10) and risky sexual behavior among HIV-positive individuals (Section 10) are presented to illustrate the use of two-part models, with particular emphasis on the issues raised.
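The ZIP structure described above can be illustrated with a minimal sketch. It assumes the standard textbook parameterization, in which an observation is a structural zero with probability `pi` and otherwise follows a Poisson distribution with mean `lam`. The function name `zip_pmf` and both parameter names are illustrative choices, not notation taken from the article, and the sketch omits the regression and random effects components that the article discusses.

```python
import math


def zip_pmf(y, lam, pi):
    """Probability mass of a zero-inflated Poisson at count y.

    Assumed parameterization (not from the article):
      pi  = probability of a structural (excess) zero
      lam = mean of the Poisson component
    """
    if y < 0:
        return 0.0
    # Ordinary Poisson probability of observing y
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either structurally or from the Poisson part
        return pi + (1.0 - pi) * poisson
    # Positive counts can only come from the Poisson part
    return (1.0 - pi) * poisson


# Compare the probability of a zero with and without inflation
lam, pi = 2.0, 0.3
p_zero_poisson = math.exp(-lam)
p_zero_zip = zip_pmf(0, lam, pi)
print(p_zero_poisson, p_zero_zip)
```

With any `pi` greater than zero, the ZIP probability of a zero exceeds the plain Poisson value, which is the sense in which the binary indicator accommodates excess zeros.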