Limited Dependent and Sample Selection Models
Issues of nonrandom sampling due to truncation, censoring, or sample selection rules are discussed in the presence of individual-specific effects. Symmetric trimming of sample observations to get rid of incidental parameters is introduced.
- Research Article
507
- 10.1016/0304-4076(94)01720-4
- May 1, 1996
- Journal of Econometrics
On the choice between sample selection and two-part models
- Research Article
12
- 10.1007/s11116-022-10312-w
- Nov 13, 2022
- Transportation
Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarized six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
- Research Article
- 10.1016/s0165-1765(99)00090-7
- Aug 1, 1999
- Economics Letters
Bias in maximum likelihood estimator of disequilibrium and sample selection model with error-ridden observations
- Research Article
5
- 10.2139/ssrn.1275517
- Oct 1, 2008
- SSRN Electronic Journal
We consider the estimation of sample selection (type II Tobit) models that exhibit spatial error dependence or spatial autoregressive errors (SAE). The method considered is motivated by a two-step strategy analogous to the popular Heckit model. The first step of estimation is based on a spatial probit model following a methodology proposed by Pinkse and Slade (1998) that yields consistent estimates. The consistent estimates of the selection equation are used to estimate the inverse Mills ratio (IMR) to be included as a regressor in the estimation of the outcome equation (second step). Since the appropriate IMR turns out to depend on a parameter from the second step under SAE, we propose to estimate the two steps jointly within a generalized method of moments (GMM) framework. We explore the finite sample properties of the proposed estimator using a Monte Carlo experiment, discuss the importance of the spatial sample selection model in applied work, and illustrate the application of our method by estimating the spatial production within a fishery with data that is censored for reasons of confidentiality.
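As a rough illustration of the quantity the second step relies on, the sketch below computes the standard (non-spatial) inverse Mills ratio, phi(z) / Phi(z), at a first-step probit index z. This is the textbook Heckit form under normal errors, not the spatially adjusted IMR derived in the paper; the function names are hypothetical.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z: float) -> float:
    """Textbook IMR phi(z) / Phi(z) for a selected observation with probit index z.

    Assumption: plain Heckit setting with standard normal selection errors.
    For very negative z, Phi(z) underflows; this sketch does not guard against that.
    """
    return norm_pdf(z) / norm_cdf(z)


# Example: IMR at a few hypothetical first-step index values.
for index in (-1.0, 0.0, 1.0):
    print(index, inverse_mills_ratio(index))
```

In a two-step estimator this value would be appended as an extra regressor in the outcome equation; the smaller the selection probability, the larger the correction term.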
- Research Article
1
- 10.5282/ubm/epub.1669
- Jan 1, 2002
- Open Access LMU (Ludwig-Maximilians-Universität München)
This paper develops a Bayesian method for estimating and testing the parameters of the endogenous switching regression model and sample selection models. Random coefficients are incorporated in both the decision and regime regression models to reflect heterogeneity across individual units or clusters and correlation of observations within clusters. The case of tobit type regime regression equations is also considered. A combination of Markov chain Monte Carlo methods, data augmentation and Gibbs sampling is used to facilitate computation of Bayes posterior statistics. A simulation study is conducted to compare estimates from full and reduced blocking schemes and to investigate sensitivity to prior information. The Bayesian methodology is applied to data sets on currency hedging and goods trade, cross-country privatisation, and adoption of soil conservation technology. Estimation and inference results on marginal effects, average decision or selection effect as well as model comparison are presented. The expected decision effect is broken down into average effect of individual's decision on the response variable, decision effect due to random components, and differential effect due to latent correlated random components. Application of the proposed Bayesian MCMC algorithm to real data sets reveals that the normality assumption still holds for most commonly encountered economic data.
- Research Article
63
- 10.1007/s40273-014-0210-6
- Sep 5, 2014
- PharmacoEconomics
Marginal analysis evaluates changes in an objective function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This paper, the first in a two-part series, introduces and illustrates the calculation of the ME for a variety of regression models often used in health services research. Part One includes a review of prior studies discussing MEs, followed by derivation of ME formulas for various regression models including linear, logistic, multinomial logit model (MLM), generalized linear model (GLM) for continuous data, GLM for count data, two-part model, sample selection (two-stage) model, and parametric survival model. Prior theoretical papers in health services research reported the derivation and interpretation of ME primarily for the linear and logistic models, with less emphasis on count models, survival models, MLM, two-part models, and sample selection models. These additional models are relevant for health services research studies examining costs and utilization. Part Two of the series will focus on the methods for estimating and interpreting the ME in applied research. The illustration, discussion, and application of ME in this two-part series support the conduct of future studies applying the marginal concept.
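To make the idea concrete for one of the listed models, here is a minimal sketch of the marginal effect of a continuous covariate in a logistic regression, evaluated at a given covariate profile: ME_k = beta_k * p * (1 - p), where p is the predicted probability. The coefficient values and the function name are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def logit_marginal_effect(beta: Sequence[float], x: Sequence[float], k: int) -> float:
    """Marginal effect of covariate k on Pr(y = 1 | x) in a logistic model.

    beta and x are aligned (include a 1.0 in x for the intercept).
    Valid for a continuous covariate; for a binary covariate a discrete
    difference in predicted probabilities is the usual alternative.
    """
    index = sum(b * v for b, v in zip(beta, x))
    p = 1.0 / (1.0 + math.exp(-index))  # predicted probability at this profile
    return beta[k] * p * (1.0 - p)


# Hypothetical coefficients: intercept 0.0, slope 2.0; profile with x1 = 0.
print(logit_marginal_effect([0.0, 2.0], [1.0, 0.0], k=1))
```

Because p depends on the whole profile, the same coefficient gives different marginal effects for different patient profiles, which is why the effect is reported in probability units rather than read off the coefficient.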
- Research Article
- 10.46306/lb.v5i1.558
- Apr 30, 2024
- Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika
The linear regression model is a statistical tool used to model the causal relationship between a dependent variable and one or several independent (explanatory) variables. When the dependent variable is censored and sample selection may be present, the sample selection model is an alternative for analyzing this relationship. In the Heckman sample selection model, independent variables may be endogenous, in which case they should be treated as endogenous rather than exogenous variables in both the outcome equation and the selection equation. As a result, including endogenous covariates in the Heckman sample selection model yields more than one equation, making it a simultaneous-equation system. To estimate simultaneous equations, simple estimation methods such as the maximum likelihood estimator method are no longer appropriate. In this study, we discuss the estimation of sample selection models with endogenous covariates using the full information maximum likelihood (FIML) approach. The sample selection model with endogenous covariates was then applied to the women's labor supply data from Thomas Mroz's research and compared with several models. Based on the MSE and SSE values obtained from the linear regression model, Tobit regression model, Heckman sample selection model, and sample selection model with endogenous covariates, it was concluded that the Heckman sample selection model fits the dataset best, since it yields the smallest MSE and SSE values.
- Research Article
24
- 10.2139/ssrn.200620
- Jun 6, 2000
- SSRN Electronic Journal
Respondents of contingent valuation surveys may place a null value on the public good, for reasons that differ from a genuine indifference to the good, but that can be interpreted as a protest: either against the interview, or the public management, or both. A good survey design can effectively reduce protest votes, but they can hardly be completely removed from the dataset, and, if there is sample selection bias, they lead to biased estimates of the willingness-to-pay (WTP) measure. We propose a survey design, and a sample selection model, that allows taking into account, and correcting, the possible bias due to protest votes. Since the asymptotic standard errors estimated by means of the inverse of the information matrix containing the sample selection parameter are not reliable, we use an alternative procedure based on the likelihood profile. It will be seen that sample selection models may present estimation problems because of the flatness of the likelihood function: in some cases confidence intervals around the sample selection coefficient are too wide to give evidence of presence or absence of sample selection bias. We maintain that even in these circumstances the sample selection model with the protest votes should be preferred to the model without protest votes, since it takes into account the uncertainty about the estimates of the willingness to pay for the public good.
- Research Article
12
- 10.1111/rssa.12239
- Sep 26, 2016
- Journal of the Royal Statistical Society Series A: Statistics in Society
In sample selection models, a treatment can influence the observed outcome in two ways: by affecting the binary selection or participation decision and by affecting the latent outcome. The former is called the ‘extensive margin effect’, and the latter is called the ‘intensive margin effect’. Despite the popularity of these effects, however, the intensive margin effect does not have the traditional causal parameter interpretation because it is conditioned on the selecting or participating decision, which is a post-treatment variable possibly affected by the treatment. The paper presents a causal framework for sample selection models and introduces various subpopulation effects. It is difficult to separate such effects in general; however, in certain popular models (nearly parametric sample selection models, semiparametric ‘independence models’, semiparametric zero-censored models and ‘polynomial approximation’ models) with linear latent equations, they are separately identified and easily estimable with probit and least squares estimators. An empirical analysis is provided to illustrate these causal effects in sample selection models.
- Research Article
29
- 10.3982/ecta16481
- Jan 1, 2020
- Econometrica
It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. The additional structure can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one‐dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in practice. We show that the identified set is sharp when the model is correct and empty when there exist no parameter values that make the sample selection model consistent with the data. We also provide non‐sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.
- Research Article
32
- 10.1177/0022343314528200
- Apr 25, 2014
- Journal of Peace Research
Sample selection models, variants of which are the Heckman and Heckit models, are increasingly used by political scientists to accommodate data in which censoring of the dependent variable raises concerns of sample selectivity bias. Beyond demonstrating several pitfalls in the calculation of marginal effects and associated levels of statistical significance derived from these models, we argue that many of the empirical questions addressed by political scientists would – for both substantive and statistical reasons – be more appropriately addressed using an alternative but closely related procedure referred to as the two-part model (2PM). Aside from being simple to estimate, one key advantage of the 2PM is its less onerous identification requirements. Specifically, the model does not require the specification of so-called exclusion restrictions, variables that are included in the selection equation of the Heckit model but omitted from the outcome equation. Moreover, we argue that the interpretation of the marginal effects from the 2PM, which are in terms of actual outcomes, is more appropriate for the questions typically addressed by political scientists than the potential outcomes ascribed to the Heckit results. Drawing on data from the Correlates of War database, we present an empirical analysis of conflict intensity illustrating that the choice between the sample selection model and 2PM can bear fundamentally on the conclusions drawn.
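The contrast between actual and potential outcomes can be written out in the usual textbook notation. The symbols below (x, z, beta, gamma, rho, sigma, lambda) are assumptions introduced for illustration and follow the standard normal-errors Heckit setup rather than anything specific to the article.

```latex
% Two-part model: expectation of the actual (observed) outcome,
% with the participation part and the positive-outcome part estimated separately
\mathbb{E}[y \mid x] = \Pr(y > 0 \mid x)\,\mathbb{E}[y \mid y > 0, x]

% Heckit: latent (potential) outcome y^* = x\beta + \varepsilon,
% selection indicator s = 1[z\gamma + u > 0], (\varepsilon, u) bivariate normal, Var(u) = 1
\mathbb{E}[y^* \mid x] = x\beta, \qquad
\mathbb{E}[y \mid s = 1, x, z] = x\beta + \rho\,\sigma_{\varepsilon}\,\lambda(z\gamma), \quad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

In this form the exclusion restriction corresponds to z containing at least one variable that does not appear in x; the two-part expression has no such requirement because its two components are modeled independently.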
- Single Report
1
- 10.21033/wp-2018-10
- Jan 1, 2018
It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. We find that the additional structure in the classical sample selection model can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one-dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in principle as well as in practice. We show that the identified set is sharp when the model is correct and empty when the model is not correct. We also provide non-sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.
- Research Article
5
- 10.3844/ajassp.2009.1845.1853
- Oct 1, 2009
- American Journal of Applied Sciences
Problem statement: It is well known that the standard approach to estimating sample selection models yields inconsistent estimates if the distributional assumptions are incorrect. Approach: An important development of the last decade for overcoming this deficiency is the use of semi-parametric methods. However, the semi-parametric approach still does not fully address the deficiency of the model. Results: We introduced a fuzzy membership function to handle uncertain data in a sample selection model and employed the two-step estimator for sample selection models to estimate a model of the so-called self-selection decision. The Fuzzy Parametric of Sample Selection Model (FPSSM) is built as a hybrid of the conventional parametric sample selection model. Conclusion/Recommendations: The results showed that, as a whole, the FPSSM gives better and more consistent estimates than the Parametric of Sample Selection Model (PSSM). This application demonstrates that the proposed fuzzy modeling approach is reasonable and provides an important and significant finding compared with the conventional method, especially in terms of estimation and consistency.
- Research Article
2
- 10.5555/1466858.1466863
- Mar 1, 2008
- WSEAS Transactions on Mathematics archive
EBT2 films from the lot investigated in this study show response inhomogeneities, which lead to uncertainties in dose determination exceeding the commonly accepted tolerance levels. It is important to test further EBT2 lots regarding homogeneity before using the film in clinical routine.
- Research Article
- 10.1080/03610910802272399
- Sep 23, 2008
- Communications in Statistics - Simulation and Computation
The two-part model and Heckman's sample selection model are often used in economic studies which involve analyzing the demand for limited variables. This study proposed a simultaneous equation model (SEM) and used the expectation-maximization algorithm to obtain the maximum likelihood estimate. We then constructed a simulation to compare the performance of estimates of price elasticity using SEM with those estimates from the two-part model and the sample selection model. The simulation shows that the estimates of price elasticity by SEM are more precise than those by the sample selection model and the two-part model when the model includes limited independent variables. Finally, we analyzed a real example of cigarette consumption as an application. We found an increase in cigarette price associated with a decrease in both the propensity to consume cigarettes and the amount actually consumed.