Selection Without Exclusion

Abstract

It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. We find that the additional structure in the classical sample selection model can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one-dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in principle as well as in practice. We show that the identified set is sharp when the model is correct and empty when the model is not correct. We also provide non-sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.
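
To make the selection problem concrete, the following is a minimal simulation sketch, not the bounds procedure of the paper. It assumes a linear outcome equation and a linear selection index that share the same single regressor (so there is no exclusion restriction), and it draws the errors from a bivariate normal purely for illustration; the paper itself leaves the error distribution unknown. The function names, parameter values, and the correlation `rho` are all hypothetical choices.

```python
import random


def simulate(n=20000, beta=(1.0, 0.5), gamma=(0.0, 1.0), rho=0.8, seed=0):
    """Draw (x, latent outcome, selected) triples from a toy selection model.

    Assumed (illustrative) model:
        y* = beta0 + beta1 * x + eps          latent outcome
        d  = 1{gamma0 + gamma1 * x + nu > 0}  selection indicator
    with corr(eps, nu) = rho and errors independent of x.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        nu = rng.gauss(0, 1)
        # Build eps so that it has unit variance and correlation rho with nu.
        eps = rho * nu + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        y_star = beta[0] + beta[1] * x + eps
        selected = gamma[0] + gamma[1] * x + nu > 0
        rows.append((x, y_star, selected))
    return rows


def ols_slope(pairs):
    """Slope from a simple least-squares regression of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


rows = simulate()
# Infeasible benchmark: uses the latent outcome for every unit.
slope_full = ols_slope([(x, y) for x, y, _ in rows])
# Feasible regression: outcome is observed only for selected units.
slope_selected = ols_slope([(x, y) for x, y, s in rows if s])

print(f"slope using all latent outcomes: {slope_full:.3f}")
print(f"slope using selected sample only: {slope_selected:.3f}")
```

In this setup the regression on the selected sample understates the true slope, because the expected error among selected units falls as the regressor rises. With no variable excluded from the outcome equation, that distortion cannot be separated from the slope without further structure, which is the situation the bounds in the abstract address.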

Similar Papers
  • Research Article
  • Citations: 29
  • 10.3982/ecta16481
Selection Without Exclusion
  • Jan 1, 2020
  • Econometrica
  • Bo E Honoré + 1 more

It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. The additional structure can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one‐dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in practice. We show that the identified set is sharp when the model is correct and empty when there exist no parameter values that make the sample selection model consistent with the data. We also provide non‐sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.

  • Research Article
  • Citations: 12
  • 10.1007/s11116-022-10312-w
Response willingness in consecutive travel surveys: an investigation based on the National Household Travel Survey using a sample selection model
  • Nov 13, 2022
  • Transportation
  • Xinyi Wang + 3 more

Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarized six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.

  • Research Article
  • Citations: 2
  • 10.5555/1466858.1466863
Fuzzy approach to semi-parametric of a sample selection model
  • Mar 1, 2008
  • WSEAS Transactions on Mathematics archive
  • L Muhamad Safiih + 2 more

  • Research Article
  • Citations: 12
  • 10.1111/rssa.12239
Extensive and Intensive Margin Effects in Sample Selection Models: Racial Effects on Wages
  • Sep 26, 2016
  • Journal of the Royal Statistical Society Series A: Statistics in Society
  • Myoung-Jae Lee

In sample selection models, a treatment can influence the observed outcome in two ways: by affecting the binary selection or participation decision and by affecting the latent outcome. The former is called the ‘extensive margin effect’, and the latter is called the ‘intensive margin effect’. Despite the popularity of these effects, however, the intensive margin effect does not have the traditional causal parameter interpretation because it is conditioned on the selecting or participating decision, which is a post-treatment variable possibly affected by the treatment. The paper presents a causal framework for sample selection models and introduces various subpopulation effects. It is difficult to separate such effects in general; however, in certain popular models (nearly parametric sample selection models, semiparametric ‘independence models’, semiparametric zero-censored models and ‘polynomial approximation’ models) with linear latent equations, they are separately identified and easily estimable with probit and least squares estimators. An empirical analysis is provided to illustrate these causal effects in sample selection models.

  • Research Article
  • Citations: 9
  • 10.1016/j.jeconom.2021.07.017
Sample selection models without exclusion restrictions: Parameter heterogeneity and partial identification
  • Nov 26, 2022
  • Journal of Econometrics
  • Bo E Honoré + 1 more

  • Research Article
  • 10.22237/jmasm/1257034680
Semi-Parametric of Sample Selection Model Using Fuzzy Concepts
  • Nov 1, 2009
  • Journal of Modern Applied Statistical Methods
  • L Muhamad Safiih + 2 more

The sample selection model has been studied in the context of semi-parametric methods. With the deficiencies of the parametric model, such as inconsistent estimators, semi-parametric estimation methods provide better alternatives. This article focuses on the context of fuzzy concepts as a hybrid to the semiparametric sample selection model. The better approach when confronted with uncertainty and ambiguity is to use the tools provided by the theory of fuzzy sets, which are appropriate for modeling vague concepts. A fuzzy membership function for solving uncertainty data of a semi-parametric sample selection model is introduced as a solution to the problem.

  • Research Article
  • Citations: 36
  • 10.1111/rssb.12136
Robust Inference in Sample Selection Models
  • Nov 20, 2015
  • Journal of the Royal Statistical Society Series B: Statistical Methodology
  • Mikhail Zhelonkin + 2 more

The problem of non-random sample selectivity often occurs in practice in many fields. The classical estimators introduced by Heckman are the backbone of the standard statistical analysis of these models. However, these estimators are very sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. We develop a general framework to study the robustness properties of estimators and tests in sample selection models. We derive the influence function and the change-of-variance function of Heckman's two-stage estimator, and we demonstrate the non-robustness of this estimator and its estimated variance to small deviations from the model assumed. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. Both cases with and without an exclusion restriction are covered. This allows us to construct a simple robust alternative to the sample selection bias test. We illustrate the use of our new methodology in an analysis of ambulatory expenditures and we compare the performance of the classical and robust methods in a Monte Carlo simulation study.

  • Research Article
  • Citations: 1
  • 10.2139/ssrn.984940
Identification of Multi-Index Sample Selection Models
  • May 9, 2007
  • SSRN Electronic Journal
  • Morten Sørensen

  • Research Article
  • Citations: 48
  • 10.1007/s00181-013-0742-1
Testing exclusion restrictions and additive separability in sample selection models
  • Sep 14, 2013
  • Empirical Economics
  • Martin Huber + 1 more

Standard sample selection models with non-randomly censored outcomes assume (i) an exclusion restriction (i.e., a variable affecting selection, but not the outcome) and (ii) additive separability of the errors in the selection process. This paper proposes tests for the joint satisfaction of these assumptions by applying the approach of Huber and Mellace (Testing instrument validity for LATE identification based on inequality moment constraints, 2011) (for testing instrument validity under treatment endogeneity) to the sample selection framework. We show that the exclusion restriction and additive separability imply two testable inequality constraints that come from both point identifying and bounding the outcome distribution of the subpopulation that is always selected/observed. We apply the tests to two variables for which the exclusion restriction is frequently invoked in female wage regressions: non-wife/husband’s income and the number of (young) children. Considering eight empirical applications, our results suggest that the identifying assumptions are likely violated for the former variable, but cannot be refuted for the latter on statistical grounds.

  • Research Article
  • Citations: 89
  • 10.1016/s0165-1765(97)00022-0
Conditional independence in sample selection models
  • Feb 1, 1997
  • Economics Letters
  • Joshua D Angrist

  • Research Article
  • Citations: 41
  • 10.1016/j.csda.2012.12.010
Estimation of a regression spline sample selection model
  • Dec 22, 2012
  • Computational Statistics & Data Analysis
  • Giampiero Marra + 1 more

It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and the outcome equation itself. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart which has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.

  • Research Article
  • Citations: 29
  • 10.18637/jss.v071.i06
Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel
  • Jan 1, 2016
  • Journal of Statistical Software
  • Małgorzata Wojtyś + 2 more

Sample selection models deal with the situation in which an outcome of interest is observed for a restricted non-randomly selected sample of the population. The estimation of these models is based on a binary equation, which describes the selection process, and an outcome equation, which is used to examine the substantive question of interest. Classic sample selection models assume a priori that continuous covariates have a linear or pre-specified non-linear relationship to the outcome, and that the distribution linking the two equations is bivariate normal. We introduce the R package SemiParSampleSel which implements copula regression spline sample selection models. The proposed implementation can deal with non-random sample selection, non-linear covariate-response relationships, and non-normal bivariate distributions between the model equations. We provide details of the model and algorithm and describe the implementation in SemiParSampleSel. The package is illustrated using simulated and real data examples.

  • Research Article
  • Citations: 9
  • 10.1080/02664763.2020.1780570
Birnbaum–Saunders sample selection model
  • Jun 14, 2020
  • Journal of Applied Statistics
  • Fernando De Souza Bastos + 1 more

The sample selection bias problem occurs when the outcome of interest is only observed according to some selection rule, where there is a dependence structure between the outcome and the selection rule. In a pioneering work, J. Heckman proposed a sample selection model based on a bivariate normal distribution for dealing with this problem. Due to the non-robustness of the normal distribution, many alternatives have been introduced in the literature by assuming extensions of the normal distribution like the Student-t and skew-normal models. One common limitation of the existing sample selection models is that they require a transformation of the outcome of interest, which is commonly positive-valued, such as income and wage. With this, data are analyzed on a non-original scale, which complicates the interpretation of the parameters. In this paper, we propose a sample selection model based on the bivariate Birnbaum–Saunders distribution, which has the same number of parameters as the classical Heckman model. Further, our associated outcome equation is positive-valued. We discuss estimation by maximum likelihood and present some Monte Carlo simulation studies. An empirical application to the ambulatory expenditures data from the 2001 Medical Expenditure Panel Survey is presented.

  • Research Article
  • Citations: 211
  • 10.1016/j.jpubeco.2004.03.004
School vouchers in practice: competition will not hurt you
  • May 21, 2004
  • Journal of Public Economics
  • F.Mikael Sandström + 1 more

  • Research Article
  • Citations: 19
  • 10.1108/ijm-05-2013-0112
Does intermarriage promote economic assimilation among immigrants in the United States?
  • Oct 5, 2015
  • International Journal of Manpower
  • Miao Chi

Purpose – The purpose of this paper is to investigate whether immigrants in the USA receive an earnings premium associated with marrying a native. Design/methodology/approach – The raw premium revealed by the 2000 US Census data is suspect due to possible endogeneity and selection bias. Instrumental variables estimation, a sample selection model, and a counterfactual construction method are used to address these issues. Findings – Results suggest a positive and modest intermarriage premium, although the magnitude varies with the estimation technique. The evidence is particularly strong for immigrants with high English proficiency, college graduates, and immigrants older than 12 upon arrival in the USA. Originality/value – It is shown that the size of intermarriage premiums varies significantly across different immigrant groups. The empirical results provide insights into the economic assimilation process and mechanisms through which intermarriage influences the labor market outcomes of immigrants.
