A primer on marginal effects--Part I: Theory and formulae.
Marginal analysis evaluates changes in an objective function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This paper, the first in a two-part series, introduces and illustrates the calculation of the ME for a variety of regression models often used in health services research. Part One includes a review of prior studies discussing MEs, followed by derivation of ME formulas for various regression models including linear, logistic, multinomial logit model (MLM), generalized linear model (GLM) for continuous data, GLM for count data, two-part model, sample selection (two-stage) model, and parametric survival model. Prior theoretical papers in health services research reported the derivation and interpretation of ME primarily for the linear and logistic models, with less emphasis on count models, survival models, MLM, two-part models, and sample selection models. These additional models are relevant for health services research studies examining costs and utilization. Part Two of the series will focus on the methods for estimating and interpreting the ME in applied research. The illustration, discussion, and application of ME in this two-part series support the conduct of future studies applying the marginal concept.
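As a brief illustration of the two simplest cases named in the abstract, the ME for the linear and logistic models can be written as follows. This is a sketch in standard textbook notation, not the paper's own: the symbols $x$, $\beta$, $x_k$, and $\Lambda$ are assumptions, and $x_k$ is taken to be a continuous regressor that enters the index linearly.

```latex
% Linear model: the ME of x_k equals its coefficient.
\[
E[y \mid x] = x'\beta
\quad\Longrightarrow\quad
\mathrm{ME}_k = \frac{\partial E[y \mid x]}{\partial x_k} = \beta_k
\]

% Logistic model: the ME depends on where x is evaluated.
\[
\Pr(y = 1 \mid x) = \Lambda(x'\beta) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}
\quad\Longrightarrow\quad
\mathrm{ME}_k = \frac{\partial \Pr(y = 1 \mid x)}{\partial x_k}
             = \beta_k \, \Lambda(x'\beta)\,\bigl[1 - \Lambda(x'\beta)\bigr]
\]
```

In the linear case the ME is constant across patient profiles, whereas in the logistic case it varies with the covariate values at which it is evaluated.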
- Research Article
508
- 10.1016/0304-4076(94)01720-4
- May 1, 1996
- Journal of Econometrics
On the choice between sample selection and two-part models
- Research Article
6
- 10.2139/ssrn.1493156
- Oct 24, 2009
- SSRN Electronic Journal
On Marginal and Interaction Effects: The Case of Heckit and Two-Part Models
- Research Article
1
- 10.5282/ubm/epub.1669
- Jan 1, 2002
- Open access LMU (Ludwig-Maximilians-Universität München)
This paper develops a Bayesian method for estimating and testing the parameters of the endogenous switching regression model and sample selection models. Random coefficients are incorporated in both the decision and regime regression models to reflect heterogeneity across individual units or clusters and correlation of observations within clusters. The case of tobit-type regime regression equations is also considered. A combination of Markov chain Monte Carlo methods, data augmentation and Gibbs sampling is used to facilitate computation of Bayes posterior statistics. A simulation study is conducted to compare estimates from full and reduced blocking schemes and to investigate sensitivity to prior information. The Bayesian methodology is applied to data sets on currency hedging and goods trade, cross-country privatisation, and adoption of soil conservation technology. Estimation and inference results on marginal effects, average decision or selection effect as well as model comparison are presented. The expected decision effect is broken down into average effect of individual's decision on the response variable, decision effect due to random components, and differential effect due to latent correlated random components. Application of the proposed Bayesian MCMC algorithm to real data sets reveals that the normality assumption still holds for most commonly encountered economic data.
- Research Article
75
- 10.1007/s40273-014-0224-0
- Oct 31, 2014
- PharmacoEconomics
Marginal analysis evaluates changes in a regression function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles or individuals while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This article, the second in a two-part series, discusses practical issues that arise in the estimation and interpretation of the ME for a variety of regression models often used in health services research. Part one provided an overview of prior studies discussing ME followed by derivation of ME formulas for various regression models relevant for health services research studies examining costs and utilization. The current article illustrates the calculation and interpretation of ME in practice and discusses practical issues that arise during the implementation, including: understanding differences between software packages in terms of functionality available for calculating the ME and its confidence interval, interpretation of average marginal effect versus marginal effect at the mean, and the difference between ME and relative effects (e.g., odds ratio). Programming code to calculate ME using SAS, STATA, LIMDEP, and MATLAB is also provided. The illustration, discussion, and application of ME in this two-part series support the conduct of future studies applying the concept of marginal analysis.
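The distinction between the average marginal effect (AME) and the marginal effect at the mean (MEM) mentioned above can be sketched numerically. The example below is a minimal illustration for a logit model, not the article's own code (which is provided in SAS, STATA, LIMDEP, and MATLAB): the coefficient vector `beta`, the simulated design matrix `X`, and the helper `logit_marginal_effects` are all hypothetical.

```python
import numpy as np


def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-z))


def logit_marginal_effects(X, beta, k):
    """Return (AME, MEM) for continuous regressor k in a logit model.

    AME averages the observation-level effects beta_k * p_i * (1 - p_i).
    MEM evaluates the same expression once, at the sample mean of X.
    """
    p = logistic(X @ beta)
    ame = np.mean(beta[k] * p * (1.0 - p))
    p_bar = logistic(X.mean(axis=0) @ beta)
    mem = beta[k] * p_bar * (1.0 - p_bar)
    return ame, mem


# Simulated covariates (intercept plus two continuous regressors).
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Hypothetical coefficients, chosen only for illustration.
beta = np.array([-1.0, 0.8, -0.5])

ame, mem = logit_marginal_effects(X, beta, k=1)
print(f"AME = {ame:.4f}, MEM = {mem:.4f}")
```

Because the logistic function is nonlinear, averaging the individual effects generally gives a different number than evaluating the effect at the average covariate profile, which is why the two statistics need to be reported and interpreted separately.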
- Research Article
12
- 10.1111/rssa.12239
- Sep 26, 2016
- Journal of the Royal Statistical Society Series A: Statistics in Society
In sample selection models, a treatment can influence the observed outcome in two ways: by affecting the binary selection or participation decision and by affecting the latent outcome. The former is called the ‘extensive margin effect’, and the latter is called the ‘intensive margin effect’. Despite the popularity of these effects, however, the intensive margin effect does not have the traditional causal parameter interpretation because it is conditioned on the selecting or participating decision, which is a post-treatment variable possibly affected by the treatment. The paper presents a causal framework for sample selection models and introduces various subpopulation effects. It is difficult to separate such effects in general; however, in certain popular models (nearly parametric sample selection models, semiparametric ‘independence models’, semiparametric zero-censored models and ‘polynomial approximation’ models) with linear latent equations, they are separately identified and easily estimable with probit and least squares estimators. An empirical analysis is provided to illustrate these causal effects in sample selection models.
- Research Article
7
- 10.1007/s10742-020-00211-x
- Jul 29, 2020
- Health Services and Outcomes Research Methodology
In health services research, endogenous healthcare utilization refers to the notion that the choice of utilizing health services is endogenous due to its correlation with the intensity of utilization outcomes, such as the number of emergency room visits. Greene (Empir Econ 36:133–173, 2009) extended four conventional two-part models for zero-abundant count utilization to two-part models that account for endogenous utilization. However, statistical inference on the (average) marginal and incremental effects in these models has not been carefully studied. The present article provides the estimation formulations for (average) marginal and incremental effects of the four two-part models: zero-inflated Poisson and negative binomial models, and hurdle Poisson and negative binomial models with correlated errors that characterize endogenous healthcare utilization. The variance estimation derived from the delta method is provided to facilitate the statistical inference of these effects. We then perform simulation studies to numerically justify our methodology. An empirical study is presented to investigate the average effects of household income and health insurance status on healthcare utilization with the German Socio-Economic Panel data. The four models give consistent results regarding interpretations for moral hazards and adverse selection in the study.
- Research Article
376
- 10.1016/0304-4076(87)90081-9
- May 1, 1987
- Journal of Econometrics
Monte Carlo evidence on the choice between sample selection and two-part models
- Research Article
12
- 10.1007/s11116-022-10312-w
- Nov 13, 2022
- Transportation
Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarized six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
- Research Article
10
- 10.1080/13504851.2011.628290
- Sep 1, 2012
- Applied Economics Letters
We investigate the socio-economic determinants of alcohol consumption in the United States with a Sample Selection Model (SSM). The dependent variable is log-transformed, which facilitates the estimation of the model. In addition, marginal effects of explanatory variables are calculated in both the SSM and the Two-Part Model (TPM). Our results suggest that the use of proper marginal effect formulae is important, and that the socio-economic variables play important roles in alcohol consumption. The probability of drinking decreases with age, income and education. Men are more likely to drink, and drink more, than women. Marriage decreases drinking, and drinking is more likely to occur on weekends.
- Research Article
- 10.1016/s0165-1765(99)00090-7
- Aug 1, 1999
- Economics Letters
Bias in maximum likelihood estimator of disequilibrium and sample selection model with error-ridden observations
- Research Article
33
- 10.1177/0022343314528200
- Apr 25, 2014
- Journal of Peace Research
Sample selection models, variants of which are the Heckman and Heckit models, are increasingly used by political scientists to accommodate data in which censoring of the dependent variable raises concerns of sample selectivity bias. Beyond demonstrating several pitfalls in the calculation of marginal effects and associated levels of statistical significance derived from these models, we argue that many of the empirical questions addressed by political scientists would, for both substantive and statistical reasons, be more appropriately addressed using an alternative but closely related procedure referred to as the two-part model (2PM). Aside from being simple to estimate, one key advantage of the 2PM is its less onerous identification requirements. Specifically, the model does not require the specification of so-called exclusion restrictions, variables that are included in the selection equation of the Heckit model but omitted from the outcome equation. Moreover, we argue that the interpretation of the marginal effects from the 2PM, which is in terms of actual outcomes, is more appropriate for the questions typically addressed by political scientists than the potential outcomes ascribed to the Heckit results. Drawing on data from the Correlates of War database, we present an empirical analysis of conflict intensity illustrating that the choice between the sample selection model and 2PM can bear fundamentally on the conclusions drawn.
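To make the "actual outcomes" interpretation concrete, the unconditional marginal effect in a generic two-part model follows from the product rule. The notation below is an assumption for illustration rather than the article's own: $y \ge 0$ has a point mass at zero, $x_k$ is a continuous regressor, and the functional forms of the two parts are left unspecified.

```latex
% Unconditional mean of the actual outcome in a two-part model
\[
E[y \mid x] = \Pr(y > 0 \mid x)\; E[y \mid y > 0, x]
\]

% Marginal effect of x_k on the actual outcome (product rule)
\[
\frac{\partial E[y \mid x]}{\partial x_k}
  = \frac{\partial \Pr(y > 0 \mid x)}{\partial x_k}\; E[y \mid y > 0, x]
  + \Pr(y > 0 \mid x)\; \frac{\partial E[y \mid y > 0, x]}{\partial x_k}
\]
```

The first term captures the change in the probability of a positive outcome and the second the change in its expected size given that it is positive; both are needed even when $x_k$ appears in only one of the two equations.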
- Research Article
4
- 10.1515/jbnst-2013-0104
- Feb 1, 2013
- Jahrbücher für Nationalökonomie und Statistik
Interaction effects capture the impact of one explanatory variable on the marginal effect of another explanatory variable. To explore interaction effects, so-called interaction terms are typically included in estimation specifications. While in linear models the effect of a marginal change in the interaction term is equal to the interaction effect, this equality generally does not hold in non-linear specifications (Ai/Norton 2003). This paper provides a general derivation of interaction effects in both linear and non-linear models and calculates the formulae of the interaction effects resulting from Heckman’s sample selection model as well as the Two-Part Model, two regression models commonly applied to data with a large fraction of either missing or zero values in the dependent variable. Drawing on a survey of automobile use from Germany, we argue that while it is important to test for the significance of interaction effects, their size conveys limited substantive content. More meaningful, and also easier to grasp, are the conditional marginal effects pertaining to two variables that are assumed to interact.
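The point that the interaction effect differs from the effect of the interaction term in non-linear models can be sketched for a simple logit with two continuous regressors. This is an illustrative example only, not the paper's derivation for the Heckit or Two-Part Model: the coefficients in `b` and the evaluation point are hypothetical, and the interaction effect is taken to be the cross-partial derivative of the predicted probability.

```python
import numpy as np


def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-z))


def prob(x1, x2, b):
    """Predicted probability with index b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    b0, b1, b2, b12 = b
    return logistic(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)


def interaction_effect(x1, x2, b):
    """Cross-partial derivative d^2 P / (dx1 dx2) of the logit probability."""
    b0, b1, b2, b12 = b
    p = prob(x1, x2, b)
    d1 = p * (1.0 - p)           # first derivative of the logistic CDF
    d2 = d1 * (1.0 - 2.0 * p)    # second derivative of the logistic CDF
    return b12 * d1 + (b1 + b12 * x2) * (b2 + b12 * x1) * d2


# Hypothetical coefficients with a zero coefficient on the interaction term.
b = (-0.5, 1.0, 0.7, 0.0)
ie = interaction_effect(0.3, -0.2, b)
print(f"interaction effect with b12 = 0: {ie:.4f}")
```

Even with a zero coefficient on the interaction term, the cross-partial derivative is non-zero here, because the curvature of the logistic function makes the marginal effect of one variable depend on the level of the other.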
- Research Article
5
- 10.2139/ssrn.1990166
- Jan 25, 2012
- SSRN Electronic Journal
On Interaction Effects: The Case of Heckit and Two-Part Models
- Research Article
1
- 10.2139/ssrn.2423259
- Apr 11, 2014
- SSRN Electronic Journal
Is Peace a Missing Value or a Zero?
- Research Article
189
- 10.1016/j.jhealeco.2007.07.001
- Nov 29, 2007
- Journal of Health Economics
Sample selection versus two-part models revisited: The case of female smoking and drinking