Selection Without Exclusion
It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. The additional structure can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one‐dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in practice. We show that the identified set is sharp when the model is correct and empty when there exist no parameter values that make the sample selection model consistent with the data. We also provide non‐sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.
- Single Report
1
- 10.21033/wp-2018-10
- Jan 1, 2018
It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. We find that the additional structure in the classical sample selection model can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one-dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in principle as well as in practice. We show that the identified set is sharp when the model is correct and empty when the model is not correct. We also provide non-sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.
- Research Article
12
- 10.1007/s11116-022-10312-w
- Nov 13, 2022
- Transportation
Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarized six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
- Research Article
2
- 10.5555/1466858.1466863
- Mar 1, 2008
- WSEAS Transactions on Mathematics archive
EBT2 films from the lot investigated in this study show response inhomogeneities, which lead to uncertainties in dose determination exceeding the commonly accepted tolerance levels. It is important to test further EBT2 lots regarding homogeneity before using the film in clinical routine.
- Research Article
12
- 10.1111/rssa.12239
- Sep 26, 2016
- Journal of the Royal Statistical Society Series A: Statistics in Society
Summary: In sample selection models, a treatment can influence the observed outcome in two ways: by affecting the binary selection or participation decision and by affecting the latent outcome. The former is called the ‘extensive margin effect’, and the latter is called the ‘intensive margin effect’. Despite the popularity of these effects, however, the intensive margin effect does not have the traditional causal parameter interpretation because it is conditioned on the selecting or participating decision, which is a post-treatment variable possibly affected by the treatment. The paper presents a causal framework for sample selection models and introduces various subpopulation effects. It is difficult to separate such effects in general; however, in certain popular models (nearly parametric sample selection models, semiparametric ‘independence models’, semiparametric zero-censored models and ‘polynomial approximation’ models) with linear latent equations, they are separately identified and easily estimable with probit and least squares estimators. An empirical analysis is provided to illustrate these causal effects in sample selection models.
- Research Article
9
- 10.1016/j.jeconom.2021.07.017
- Nov 26, 2022
- Journal of Econometrics
Sample selection models without exclusion restrictions: Parameter heterogeneity and partial identification
- Research Article
- 10.22237/jmasm/1257034680
- Nov 1, 2009
- Journal of Modern Applied Statistical Methods
The sample selection model has been studied in the context of semi-parametric methods. With the deficiencies of the parametric model, such as inconsistent estimators, semi-parametric estimation methods provide better alternatives. This article focuses on the context of fuzzy concepts as a hybrid to the semiparametric sample selection model. The better approach when confronted with uncertainty and ambiguity is to use the tools provided by the theory of fuzzy sets, which are appropriate for modeling vague concepts. A fuzzy membership function for solving uncertainty data of a semi-parametric sample selection model is introduced as a solution to the problem.
- Research Article
36
- 10.1111/rssb.12136
- Nov 20, 2015
- Journal of the Royal Statistical Society Series B: Statistical Methodology
Summary: The problem of non-random sample selectivity often occurs in practice in many fields. The classical estimators introduced by Heckman are the backbone of the standard statistical analysis of these models. However, these estimators are very sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. We develop a general framework to study the robustness properties of estimators and tests in sample selection models. We derive the influence function and the change-of-variance function of Heckman's two-stage estimator, and we demonstrate the non-robustness of this estimator and its estimated variance to small deviations from the model assumed. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. Both cases with and without an exclusion restriction are covered. This allows us to construct a simple robust alternative to the sample selection bias test. We illustrate the use of our new methodology in an analysis of ambulatory expenditures and we compare the performance of the classical and robust methods in a Monte Carlo simulation study.
- Research Article
1
- 10.2139/ssrn.984940
- May 9, 2007
- SSRN Electronic Journal
Identification of Multi-Index Sample Selection Models
- Research Article
46
- 10.1109/tip.2009.2036714
- Nov 20, 2009
- IEEE Transactions on Image Processing
The distance between a straight line and a straight line segment in the image space is proposed in this paper. Based on this distance, the neighborhood of a straight line segment is defined and mapped into the parameter space to obtain the parameter space neighborhood of the straight line segment. The neighborhood mapping between the image space and parameter space is a one to one reversible map. The mapped region in the parameter space is analytically derived and it is proved that it can be efficiently approximated by a quadrangle. The proposed straight line segment neighborhood technique for the HT outperforms conventional straight line neighborhood methods currently used with existing HT variations. In contrast to the straight line neighborhoods used in existing HT variations, the proposed straight line segment neighborhood has several advantages including: 1) the detection error of the proposed neighborhood is not affected by the length of the straight line segments; 2) a precision requirement in the image space described using the proposed distance can be explicitly resolved using the proposed formulation; 3) the proposed neighborhood has the ability to distinguish between segments belonging to the same straight line. A variety of experiments are executed to demonstrate that the proposed neighborhood has a variety of interesting properties of high practical value.
- Research Article
48
- 10.1007/s00181-013-0742-1
- Sep 14, 2013
- Empirical Economics
Standard sample selection models with non-randomly censored outcomes assume (i) an exclusion restriction (i.e., a variable affecting selection, but not the outcome) and (ii) additive separability of the errors in the selection process. This paper proposes tests for the joint satisfaction of these assumptions by applying the approach of Huber and Mellace (Testing instrument validity for LATE identification based on inequality moment constraints, 2011) (for testing instrument validity under treatment endogeneity) to the sample selection framework. We show that the exclusion restriction and additive separability imply two testable inequality constraints that come from both point identifying and bounding the outcome distribution of the subpopulation that is always selected/observed. We apply the tests to two variables for which the exclusion restriction is frequently invoked in female wage regressions: non-wife/husband’s income and the number of (young) children. Considering eight empirical applications, our results suggest that the identifying assumptions are likely violated for the former variable, but cannot be refuted for the latter on statistical grounds.
- Research Article
89
- 10.1016/s0165-1765(97)00022-0
- Feb 1, 1997
- Economics Letters
Conditional independence in sample selection models
- Research Article
41
- 10.1016/j.csda.2012.12.010
- Dec 22, 2012
- Computational Statistics & Data Analysis
It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and the outcome equation itself. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart which has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.
- Research Article
29
- 10.18637/jss.v071.i06
- Jan 1, 2016
- Journal of Statistical Software
Sample selection models deal with the situation in which an outcome of interest is observed for a restricted non-randomly selected sample of the population. The estimation of these models is based on a binary equation, which describes the selection process, and an outcome equation, which is used to examine the substantive question of interest. Classic sample selection models assume a priori that continuous covariates have a linear or pre-specified non-linear relationship to the outcome, and that the distribution linking the two equations is bivariate normal. We introduce the R package SemiParSampleSel which implements copula regression spline sample selection models. The proposed implementation can deal with non-random sample selection, non-linear covariate-response relationships, and non-normal bivariate distributions between the model equations. We provide details of the model and algorithm and describe the implementation in SemiParSampleSel. The package is illustrated using simulated and real data examples.
- Research Article
9
- 10.1080/02664763.2020.1780570
- Jun 14, 2020
- Journal of Applied Statistics
The sample selection bias problem occurs when the outcome of interest is only observed according to some selection rule, where there is a dependence structure between the outcome and the selection rule. In a pioneering work, J. Heckman proposed a sample selection model based on a bivariate normal distribution for dealing with this problem. Due to the non-robustness of the normal distribution, many alternatives have been introduced in the literature by assuming extensions of the normal distribution like the Student-t and skew-normal models. One common limitation of the existing sample selection models is that they require a transformation of the outcome of interest, which is commonly positive-valued, such as income and wage. With this, data are analyzed on a non-original scale, which complicates the interpretation of the parameters. In this paper, we propose a sample selection model based on the bivariate Birnbaum–Saunders distribution, which has the same number of parameters as the classical Heckman model. Further, our associated outcome equation is positive-valued. We discuss estimation by maximum likelihood and present some Monte Carlo simulation studies. An empirical application to the ambulatory expenditures data from the 2001 Medical Expenditure Panel Survey is presented.
- Research Article
211
- 10.1016/j.jpubeco.2004.03.004
- May 21, 2004
- Journal of Public Economics
School vouchers in practice: competition will not hurt you