Sample Selection Model for Protest Votes in Contingent Valuation Analyses
- Research Article
40
- 10.6092/issn.1973-2201/1188
- Jan 1, 2001
- Statistica
Observations with protest votes in contingent valuation surveys could result in biased estimates of the willingness to pay measure if they are treated as expressions of genuine indifference to the public good or simply removed from the analysis. We propose an elicitation design and a sample selection model that allow correction of the possible bias due to protest votes. An application shows that selection bias can appreciably affect the estimate of the value of the public good. Inference in sample selection models may sometimes be difficult due to the flatness of the likelihood function, which causes uncertainty about the presence of selection bias. Even so, we recommend use of the sample selection model, since it takes this uncertainty into account.
- Research Article
508
- 10.1016/0304-4076(94)01720-4
- May 1, 1996
- Journal of Econometrics
On the choice between sample selection and two-part models
- Research Article
- 10.1016/s0165-1765(99)00090-7
- Aug 1, 1999
- Economics Letters
Bias in maximum likelihood estimator of disequilibrium and sample selection model with error-ridden observations
- Research Article
48
- 10.1023/a:1012625929384
- Oct 1, 2001
- Environmental and Resource Economics
Modeling households' behavior with the data from a contingent valuation (CV) survey is often complicated by sample non-response, which can cause non-response bias and sample selection bias, leading to inconsistent parameter estimates and a distorted mean willingness-to-pay estimate. This paper reports the results of empirical tests for both biases using household survey data in which the double-bounded dichotomous choice CV question involved the benefit of a tap water quality improvement policy in Korea. No non-response bias, but sample selection bias, is detected in the sample. To correct for sample selection bias, a sample selection model is employed. The authors also discuss how failure to correct for bias may distort aggregate benefit estimates.
- Research Article
6
- 10.5705/ss.202021.0068
- Jan 1, 2023
- Statistica Sinica
Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulated results are presented to check the finite-sample behavior of the estimators and to verify the consequences of not considering varying sample selection bias and dispersion parameters. We show that the normal assumption for analyzing medical expenditure data is suitable and that the conclusions drawn using our approach are coherent with findings from prior literature. Moreover, we identify which covariates are relevant to explain the presence of sample selection bias in this important dataset.
- Research Article
12
- 10.1007/s11116-022-10312-w
- Nov 13, 2022
- Transportation
Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results verify the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarized six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
- Research Article
119
- 10.1080/01621459.2012.656011
- Mar 1, 2012
- Journal of the American Statistical Association
Sample selection arises often in practice as a result of the partial observability of the outcome of interest in a study. In the presence of sample selection, the observed data do not represent a random sample from the population, even after controlling for explanatory variables. That is, data are missing not at random. Thus, standard analysis using only complete cases will lead to biased results. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. The method was criticized in the literature because of its sensitivity to the normality assumption. In practice, data, such as income or expenditure data, often violate the normality assumption because of heavier tails. We first establish a new link between sample selection models and recently studied families of extended skew-elliptical distributions. Then, this allows us to introduce a selection-t (SLt) model, which models the error distribution using a Student's t distribution. We study its properties and investigate the finite-sample performance of the maximum likelihood estimators for this model. We compare the performance of the SLt model to the conventional Heckman selection-normal (SLN) model and apply it to analyze ambulatory expenditures. Unlike the SLN model, our analysis using the SLt model provides statistical evidence for the existence of sample selection bias in these data. We also investigate the performance of the test for sample selection bias based on the SLt model and compare it with the performances of several tests used with the SLN model. Our findings indicate that the latter tests can be misleading in the presence of heavy-tailed data.
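For orientation, the conventional selection-normal model referred to in this abstract can be written out as below. This is a sketch in common textbook notation, not notation taken from the article: the symbols $z_i$, $x_i$, $\gamma$, $\beta$, $\rho$, $\sigma$, and $\lambda$ are assumptions introduced here for illustration.

```latex
% Selection equation: unit i is observed only when the latent index is positive.
\[
s_i^{*} = z_i^{\top}\gamma + u_i, \qquad s_i = \mathbf{1}\{s_i^{*} > 0\}
\]
% Outcome equation: y_i is recorded only for selected units.
\[
y_i = x_i^{\top}\beta + \varepsilon_i \quad \text{observed only when } s_i = 1
\]
% Joint normality of the two error terms, with correlation rho.
\[
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix}
\right)
\]
% Conditional mean in the selected sample: the extra term is the selection bias.
\[
E[\,y_i \mid s_i = 1\,] = x_i^{\top}\beta + \rho\sigma\,\lambda(z_i^{\top}\gamma),
\qquad \lambda(a) = \frac{\phi(a)}{\Phi(a)}
\]
```

Under these assumptions the selection term vanishes when $\rho = 0$, which is why tests for sample selection bias in the normal model are usually framed as tests on $\rho$. A selection-t variant would replace the bivariate normal error law with a bivariate Student's t.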
- Research Article
62
- 10.1016/j.enpol.2019.04.010
- Apr 24, 2019
- Energy Policy
Public perception of new energy vehicles: Evidence from willingness to pay for new energy bus fares in China
- Research Article
28
- 10.1007/s10260-005-0122-x
- Dec 1, 2005
- Statistical Methods and Applications
Several studies have shown that at the individual level there exists a negative relationship between age at first birth and completed fertility. Using twin data in order to control for unobserved heterogeneity as a possible source of bias, Kohler et al. (2001) showed the significant presence of such a "postponement effect" at the micro level. In this paper, we apply sample selection models, where selection is based on having or not having had a first birth at all, to estimate the impact of postponing first births on subsequent fertility for four European nations, three of which now have lowest-low fertility levels. We use data from a set of comparative surveys (Fertility and Family Surveys), and we apply sample selection models to the logarithm of total fertility and to the progression to the second birth. Our results show that postponement effects are only very slightly affected by sample selection biases, so that sample selection models do not significantly improve on the results of standard regression techniques applied to selected samples. Our results confirm that the postponement effect is higher in countries with lowest-low fertility levels.
- Research Article
6
- 10.1016/j.csda.2021.107382
- Oct 29, 2021
- Computational Statistics & Data Analysis
Correcting for sample selection bias in Bayesian distributional regression models
- Research Article
9
- 10.1016/j.jmva.2022.105097
- Aug 28, 2022
- Journal of Multivariate Analysis
Bivariate symmetric Heckman models and their characterization
- Research Article
5
- 10.2139/ssrn.1275517
- Oct 1, 2008
- SSRN Electronic Journal
Estimation of Sample Selection Models with Spatial Dependence
- Research Article
1
- 10.5282/ubm/epub.1669
- Jan 1, 2002
- Open Access LMU (Ludwig-Maximilians-Universität München)
This paper develops a Bayesian method for estimating and testing the parameters of the endogenous switching regression model and sample selection models. Random coefficients are incorporated in both the decision and regime regression models to reflect heterogeneity across individual units or clusters and correlation of observations within clusters. The case of tobit-type regime regression equations is also considered. A combination of Markov chain Monte Carlo methods, data augmentation and Gibbs sampling is used to facilitate computation of Bayes posterior statistics. A simulation study is conducted to compare estimates from full and reduced blocking schemes and to investigate sensitivity to prior information. The Bayesian methodology is applied to data sets on currency hedging and goods trade, cross-country privatisation, and adoption of soil conservation technology. Estimation and inference results on marginal effects and the average decision or selection effect, as well as model comparison, are presented. The expected decision effect is broken down into the average effect of an individual's decision on the response variable, the decision effect due to random components, and the differential effect due to latent correlated random components. Application of the proposed Bayesian MCMC algorithm to real data sets reveals that the normality assumption still holds for most commonly encountered economic data.
- Research Article
64
- 10.1007/s40273-014-0210-6
- Sep 5, 2014
- PharmacoEconomics
Marginal analysis evaluates changes in an objective function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This paper, the first in a two-part series, introduces and illustrates the calculation of the ME for a variety of regression models often used in health services research. Part One includes a review of prior studies discussing MEs, followed by derivation of ME formulas for various regression models including linear, logistic, multinomial logit model (MLM), generalized linear model (GLM) for continuous data, GLM for count data, two-part model, sample selection (two-stage) model, and parametric survival model. Prior theoretical papers in health services research reported the derivation and interpretation of ME primarily for the linear and logistic models, with less emphasis on count models, survival models, MLM, two-part models, and sample selection models. These additional models are relevant for health services research studies examining costs and utilization. Part Two of the series will focus on the methods for estimating and interpreting the ME in applied research. The illustration, discussion, and application of ME in this two-part series support the conduct of future studies applying the marginal concept.
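As a small illustration of the marginal effect described in this abstract, the sketch below computes the ME of a single continuous covariate in a logistic model, where it equals the slope times p(1 - p). The function names and the coefficient values are hypothetical, chosen only for illustration; they are not taken from the article.

```python
import math


def logistic_probability(intercept, slope, x):
    """Return P(y = 1 | x) for a one-covariate logistic regression."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def logistic_marginal_effect(intercept, slope, x):
    """Return dP/dx at x, which for the logistic model is slope * p * (1 - p)."""
    p = logistic_probability(intercept, slope, x)
    return slope * p * (1.0 - p)


# Hypothetical coefficients and evaluation point (assumptions for illustration).
intercept, slope = -1.0, 0.5
x0 = 2.0

# Analytic marginal effect, in probability units per unit change in x.
me = logistic_marginal_effect(intercept, slope, x0)

# Central finite difference as an independent numerical check.
h = 1e-6
fd = (
    logistic_probability(intercept, slope, x0 + h)
    - logistic_probability(intercept, slope, x0 - h)
) / (2.0 * h)

print(f"analytic ME = {me:.6f}, finite-difference ME = {fd:.6f}")
```

Unlike the raw coefficient, which is on the log-odds scale, the ME is expressed in the original probability units and varies with the profile at which it is evaluated.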
- Research Article
- 10.46306/lb.v5i1.558
- Apr 30, 2024
- Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika
The linear regression model is a statistical tool for modeling the causal relationship between a dependent variable and one or more independent (explanatory) variables. When the dependent variable is censored and sample selection may be present, a sample selection model offers an alternative way to analyze this relationship. In the Heckman sample selection model, some independent variables may be endogenous, in which case they should be treated as endogenous rather than exogenous in both the outcome equation and the selection equation. Including endogenous covariates adds equations to the Heckman sample selection model and turns it into a system of simultaneous equations. For simultaneous equations, simple estimation methods such as the maximum likelihood estimator are no longer appropriate. In this study, we discuss the estimation of sample selection models with endogenous covariates using the full information maximum likelihood (FIML) approach. The sample selection model with endogenous covariates is then applied to the women's labor supply data from Thomas Mroz's study and compared with several other models. Based on the MSE and SSE values obtained from the linear regression model, the Tobit regression model, the Heckman sample selection model, and the sample selection model with endogenous covariates, we conclude that the Heckman sample selection model fits the dataset best, since it yields the smallest MSE and SSE values.