Response willingness in consecutive travel surveys: an investigation based on the National Household Travel Survey using a sample selection model
Declining survey response rates have increased the costs of travel survey recruitment. Recruiting respondents based on their expressed willingness to participate in future surveys, obtained from a preceding survey, is a potential solution but may exacerbate sample biases. In this study, we analyze the self-selection biases of survey respondents recruited from the 2017 U.S. National Household Travel Survey (NHTS), who had agreed to be contacted again for follow-up surveys. We apply a probit with sample selection (PSS) model to analyze (1) respondents’ willingness to participate in a follow-up survey (the selection model) and (2) their actual response behavior once contacted (the outcome model). Results confirm the existence of self-selection biases, which are related to survey burden, sociodemographic characteristics, travel behavior, and item non-response to sensitive variables. We find that age, homeownership, and medical conditions have opposing effects on respondents’ willingness to participate and their actual survey participation. The PSS model is then validated using a hold-out sample and applied to the NHTS samples from various geographic regions to predict follow-up survey participation. Effect size indicators for differences between predicted and actual (population) distributions of select sociodemographic and travel-related variables suggest that the resulting samples may be most biased along age and education dimensions. Further, we summarize six model performance measures based on the PSS model structure. Overall, this study provides insight into self-selection biases in respondents recruited from preceding travel surveys. Model results can help researchers better understand and address such biases, while the nuanced application of various model measures lays a foundation for appropriate comparison across sample selection models.
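For readers unfamiliar with the PSS structure, the two equations and their joint log-likelihood can be sketched as below. This is the textbook probit-with-selection form, not a specification taken from the paper: the symbols $z_i$, $x_i$, $\gamma$, $\beta$, and $\rho$ are notation assumed here for illustration, and the study's covariates and parameterization may differ.

```latex
\begin{aligned}
s_i &= \mathbb{1}\left[ z_i^\top \gamma + u_i > 0 \right]
  \quad \text{(selection: willing to be recontacted)} \\
y_i &= \mathbb{1}\left[ x_i^\top \beta + \varepsilon_i > 0 \right]
  \quad \text{(outcome: responds; observed only if } s_i = 1\text{)} \\
(u_i, \varepsilon_i) &\sim \mathcal{N}_2\!\left( \mathbf{0},
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \\
\ln L &= \sum_{s_i = 0} \ln \Phi\!\left( -z_i^\top \gamma \right)
  + \sum_{s_i = 1,\, y_i = 1} \ln \Phi_2\!\left( x_i^\top \beta,\; z_i^\top \gamma;\; \rho \right)
  + \sum_{s_i = 1,\, y_i = 0} \ln \Phi_2\!\left( -x_i^\top \beta,\; z_i^\top \gamma;\; -\rho \right)
\end{aligned}
```

Here $\Phi$ is the standard normal CDF and $\Phi_2(\cdot, \cdot; \rho)$ is the bivariate standard normal CDF with correlation $\rho$. Under this form, $\rho \neq 0$ means that unobserved factors driving willingness and actual response are correlated, so an outcome probit fitted only to willing respondents would generally be biased.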
- Research Article
508
- 10.1016/0304-4076(94)01720-4
- May 1, 1996
- Journal of Econometrics
On the choice between sample selection and two-part models
- Research Article
- 10.1016/s0165-1765(99)00090-7
- Aug 1, 1999
- Economics Letters
Bias in maximum likelihood estimator of disequilibrium and sample selection model with error-ridden observations
- Research Article
5
- 10.2139/ssrn.1275517
- Oct 1, 2008
- SSRN Electronic Journal
Estimation of Sample Selection Models with Spatial Dependence
- Research Article
1
- 10.5282/ubm/epub.1669
- Jan 1, 2002
- Open Access LMU (Ludwig-Maximilians-Universität München)
This paper develops a Bayesian method for estimating and testing the parameters of the endogenous switching regression model and sample selection models. Random coefficients are incorporated in both the decision and regime regression models to reflect heterogeneity across individual units or clusters and correlation of observations within clusters. The case of tobit-type regime regression equations is also considered. A combination of Markov chain Monte Carlo methods, data augmentation and Gibbs sampling is used to facilitate computation of Bayes posterior statistics. A simulation study is conducted to compare estimates from full and reduced blocking schemes and to investigate sensitivity to prior information. The Bayesian methodology is applied to data sets on currency hedging and goods trade, cross-country privatisation, and adoption of soil conservation technology. Estimation and inference results on marginal effects, average decision or selection effect as well as model comparison are presented. The expected decision effect is broken down into average effect of individual's decision on the response variable, decision effect due to random components, and differential effect due to latent correlated random components. Application of the proposed Bayesian MCMC algorithm to real data sets reveals that the normality assumption still holds for most commonly encountered economic data.
- Research Article
64
- 10.1007/s40273-014-0210-6
- Sep 5, 2014
- PharmacoEconomics
Marginal analysis evaluates changes in an objective function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This paper, the first in a two-part series, introduces and illustrates the calculation of the ME for a variety of regression models often used in health services research. Part One includes a review of prior studies discussing MEs, followed by derivation of ME formulas for various regression models including linear, logistic, multinomial logit model (MLM), generalized linear model (GLM) for continuous data, GLM for count data, two-part model, sample selection (two-stage) model, and parametric survival model. Prior theoretical papers in health services research reported the derivation and interpretation of ME primarily for the linear and logistic models, with less emphasis on count models, survival models, MLM, two-part models, and sample selection models. These additional models are relevant for health services research studies examining costs and utilization. Part Two of the series will focus on the methods for estimating and interpreting the ME in applied research. The illustration, discussion, and application of ME in this two-part series support the conduct of future studies applying the marginal concept.
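As a small illustration of the marginal effect for one of the models listed above, the sketch below computes the analytic ME of a continuous covariate in a logistic model, which is the coefficient times p(1 - p) evaluated at a chosen profile. The coefficient values and the covariate profile are hypothetical and chosen only for demonstration; they do not come from the paper.

```python
import math


def logistic_prob(beta, x):
    """P(y = 1 | x) under a logistic model with coefficient vector beta."""
    linear = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-linear))


def logistic_marginal_effect(beta, x, k):
    """Analytic ME of continuous covariate k at profile x: beta_k * p * (1 - p)."""
    p = logistic_prob(beta, x)
    return beta[k] * p * (1.0 - p)


# Hypothetical profile: intercept, age in decades, chronic-condition flag.
beta = [-1.0, 0.3, 0.8]
profile = [1.0, 5.0, 1.0]

# Change in predicted probability per one-unit increase in covariate 1,
# evaluated at this specific profile.
me_age = logistic_marginal_effect(beta, profile, k=1)
print(f"P(y=1) = {logistic_prob(beta, profile):.4f}, ME of covariate 1 = {me_age:.4f}")
```

Because the ME depends on p, it differs across profiles even though the coefficient is fixed, which is why it is reported for defined profiles or averaged over a sample.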
- Research Article
376
- 10.1016/0304-4076(87)90081-9
- May 1, 1987
- Journal of Econometrics
Monte Carlo evidence on the choice between sample selection and two-part models
- Research Article
3
- 10.3141/2231-05
- Jan 1, 2011
- Transportation Research Record: Journal of the Transportation Research Board
The ability to transfer national travel patterns to a local population is of interest for modeling either large areas that exceed the boundaries of a metropolitan planning organization or small regions with no available travel survey data. At the core of this research are questions about the connection between travel behavior and land use, urban form, and accessibility. To explore this relationship, the researchers selected a group of land use variables to define activity and travel patterns for individuals and households. The 2001 National Household Travel Survey (NHTS) participants were divided into categories consisting of a set of latent cluster models representing persons, travel, and land use. This set was compared with two sets of cluster models constructed for two local travel surveys. Mean statistical tests were compared to assess differences among socio-demographic groups residing in localities with similar land uses. The results showed that the NHTS and the local surveys shared mean population activity and travel characteristics. However, these similarities masked behavioral heterogeneity, which was seen when distributions of activity and travel behavior were examined. Therefore, data from a national household travel survey in combination with land use data cannot be used to model local population travel characteristics if the goal is to model the behavioral distributions and not just mean travel behavior characteristics.
- Research Article
- 10.46306/lb.v5i1.558
- Apr 30, 2024
- Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika
The linear regression model is a statistical tool used to model the relationship between a dependent variable and one or more independent (explanatory) variables. When the dependent variable is censored and sample selection may be present, the sample selection model offers an alternative for analyzing this relationship. In the Heckman sample selection model, independent variables may be endogenous, in which case they should be treated as endogenous rather than exogenous variables in both the outcome equation and the selection equation. As a result, including endogenous covariates turns the Heckman sample selection model into a system of more than one equation, that is, a system of simultaneous equations. For simultaneous equations, simple estimation methods such as the single-equation maximum likelihood estimator are no longer appropriate. In this study, we discuss the estimation of sample selection models with endogenous covariates using the full information maximum likelihood (FIML) approach. The sample selection model with endogenous covariates was then applied to the women's labor supply data from Thomas Mroz's research and compared with several models. Based on the MSE and SSE values obtained from the linear regression model, Tobit regression model, Heckman sample selection model, and sample selection model with endogenous covariates, it was concluded that the Heckman sample selection model fits the dataset best, since it yields the smallest MSE and SSE values.
- Research Article
24
- 10.2139/ssrn.200620
- Jun 6, 2000
- SSRN Electronic Journal
Sample Selection Model for Protest Votes in Contingent Valuation Analyses
- Research Article
33
- 10.1177/0022343314528200
- Apr 25, 2014
- Journal of Peace Research
Sample selection models, variants of which are the Heckman and Heckit models, are increasingly used by political scientists to accommodate data in which censoring of the dependent variable raises concerns of sample selectivity bias. Beyond demonstrating several pitfalls in the calculation of marginal effects and associated levels of statistical significance derived from these models, we argue that many of the empirical questions addressed by political scientists would – for both substantive and statistical reasons – be more appropriately addressed using an alternative but closely related procedure referred to as the two-part model (2PM). Aside from being simple to estimate, one key advantage of the 2PM is its less onerous identification requirements. Specifically, the model does not require the specification of so-called exclusion restrictions, variables that are included in the selection equation of the Heckit model but omitted from the outcome equation. Moreover, we argue that the interpretation of the marginal effects from the 2PM, which are in terms of actual outcomes, is more appropriate for the questions typically addressed by political scientists than the potential outcomes ascribed to the Heckit results. Drawing on data from the Correlates of War database, we present an empirical analysis of conflict intensity illustrating that the choice between the sample selection model and 2PM can bear fundamentally on the conclusions drawn.
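The contrast between the two approaches can be made concrete with their standard textbook mean functions. The sketch below is an assumption-laden illustration, not the authors' code: it assumes a probit first stage, a linear second stage, and normal errors, and the function names and index values (`xb`, `zg`, `rho`, `sigma`) are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def inverse_mills_ratio(index):
    """lambda(a) = phi(a) / Phi(a), the selectivity correction term."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def heckit_observed_mean(xb, zg, rho, sigma):
    """E[y | selected] under the Heckit model: xb + rho * sigma * lambda(zg)."""
    return xb + rho * sigma * inverse_mills_ratio(zg)


def two_part_mean(xb, zg):
    """E[y] under a two-part model: P(y > 0) * E[y | y > 0], probit first part."""
    return _STD_NORMAL.cdf(zg) * xb


# Hypothetical linear indices for one observation.
xb, zg = 2.0, 0.0
print("Heckit mean among selected:", heckit_observed_mean(xb, zg, rho=0.5, sigma=1.0))
print("Two-part unconditional mean:", two_part_mean(xb, zg))
```

The Heckit expression describes the outcome among selected units after correcting for correlated errors, whereas the two-part expression describes the actual outcome averaged over everyone. With `rho = 0` the correction term vanishes and the Heckit mean among selected units reduces to `xb`.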
- Single Report
3
- 10.2172/991662
- Mar 1, 2010
Policymakers rely on transportation statistics, including data on personal travel behavior, to formulate strategic transportation policies, and to improve the safety and efficiency of the U.S. transportation system. Data on personal travel trends are needed to examine the reliability, efficiency, capacity, and flexibility of the Nation's transportation system to meet current demands and accommodate future demands; to assess the feasibility and efficiency of alternative congestion-alleviating technologies (e.g., high-speed rail, magnetically levitated trains, intelligent vehicle and highway systems); to evaluate the merits of alternative transportation investment programs; and to assess the energy-use and air-quality impacts of various policies. To address these data needs, the U.S. Department of Transportation (USDOT) initiated an effort in 1969 to collect detailed data on personal travel. The 1969 survey was the first Nationwide Personal Transportation Survey (NPTS). The survey was conducted again in 1977, 1983, 1990, 1995, and 2001. Data on daily travel were collected in 1969, 1977, 1983, 1990 and 1995. Longer-distance travel was collected in 1977 and 1995. The 2001 National Household Travel Survey (NHTS) collected both daily and longer-distance trips in one survey. The 2001 survey was sponsored by three USDOT agencies: Federal Highway Administration (FHWA), Bureau of Transportation Statistics (BTS), and National Highway Traffic Safety Administration (NHTSA). The primary objective of the survey was to collect trip-based data on the nature and characteristics of personal travel so that the relationships between the characteristics of personal travel and the demographics of the traveler can be established. Commercial and institutional travel was not part of the survey. New York State participated in the 2001 NHTS by procuring an additional 12,000 sample households.
These additional sample households allowed New York State to address transportation planning issues pertinent to geographic areas that are significantly smaller than what the national NHTS data allowed. The final sample size for New York State was 13,423 usable households. In this report, Oak Ridge National Laboratory (ORNL) identifies and analyzes differences, if any, in travel patterns that are attributable to demographic characteristics (e.g., gender, age, race and ethnicity), household characteristics (e.g., low income households, zero and one car households), modal characteristics and geographic location. Travel patterns of those who work at home are examined and compared to those of conventional workers, as well as those who do not work. Focus is given to trip frequency, travel by time of day, trip purpose, and mode choice. For example, included in this analysis is the mobility of the elderly population in New York State. The American society is undergoing a major demographic transformation that is resulting in a greater percentage of older individuals in the population. In addition to demographic changes, recent travel surveys show that an increasing number of older individuals are licensed to drive and that they drive more than their same age cohort did a decade ago. Cohort differences in driving are particularly apparent - not only are more of today's elderly population licensed to drive than their age cohort two decades ago, they also drive more. Equally important are the increase in immigration and in racial and cultural diversity. This report also discusses vehicle availability, socioeconomic characteristics, travel trends (e.g., miles travelled, distance driven, commute patterns), and the transportation accessibility of these populations. 
Specifically, this report addresses in detail the travel behavior of the following special populations: (1) the elderly, defined as those who were 65 years old or older, (2) low-income households, (3) ethnic groups and immigrants, and (4) those who worked at home.
- Research Article
3
- 10.3141/2291-14
- Jan 1, 2012
- Transportation Research Record: Journal of the Transportation Research Board
The National Household Travel Survey (NHTS) provides important information for the development of local and regional models to support decision making related to climate change and sustainability goals. This paper documents the use of NHTS data in the development of the Greenhouse Gas Statewide Transportation Emissions Planning (GreenSTEP) model, which forecasts estimates of greenhouse gas emissions at county and urban area levels. The model was developed to be sensitive to a broad number of policy variables and other factors that were not addressed in existing models. Because there was a lack of local and current sources of information about individuals, households, and their vehicle ownership patterns and travel behavior, GreenSTEP made use of the information in the national sample of the 2001 NHTS to estimate several model modules. The NHTS data were useful specifically in the development of modules on (a) land use characteristics, (b) vehicle ownership, (c) vehicle use [daily vehicle miles traveled (DVMT)], (d) impacts of vehicle travel costs on DVMT, (e) lightweight vehicles (bicycles, mopeds, electric bicycles, etc.), and (f) vehicle fleets (type and age). The NHTS data were particularly important for modeling the adoption and use of limited-range electric vehicles, as the data enabled estimates of trip length distributions to be made. This paper highlights the utility of the NHTS data for this modeling framework, the modifications and augmentations that were necessary, the limitations that were encountered, and the potential for the wider dissemination and use of the GreenSTEP tool because the initial estimation was made with a national sample.
- Research Article
5
- 10.1080/00949655.2011.646277
- Jun 1, 2013
- Journal of Statistical Computation and Simulation
Over the past few decades, regression models have received considerable attention and have been shown to be successful when combined with other models. One of the most successful is the sample selection model, or selectivity model. However, uncertainties and ambiguities exist in these models, particularly in the relationship between the endogenous and exogenous variables. These weaken the model's ability to produce estimates that explain the actual situation of a phenomenon. Such questions and problems have yet to be explored and are the main aim of this study. A new framework for estimating the sample selection model using the concept of fuzzy modelling is introduced. In this approach, a flexible fuzzy concept is hybridized with the parametric sample selection model, yielding the fuzzy parametric sample selection model (FPSSM). The elements of vagueness and uncertainty are represented in the model construction as a way of increasing the available information to produce a more accurate model. This led to the development of a convergence theorem, presented in the form of triangular fuzzy numbers, to be used in the model. Consistency and efficiency of the proposed model are taken as indicators of its effectiveness and are assessed using Monte Carlo simulation. The error terms of the FPSSM are assumed to follow the normal and the chi-square distributions. Simulation results show that the FPSSM is consistent and efficient when its error distribution is normal. In contrast, the FPSSM with chi-square errors is found to be inconsistent.
- Research Article
89
- 10.1016/s0165-1765(97)00022-0
- Feb 1, 1997
- Economics Letters
Conditional independence in sample selection models
- Research Article
6
- 10.5705/ss.202021.0068
- Jan 1, 2023
- Statistica Sinica
Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulated results are presented to check the finite-sample behavior of the estimators and to verify the consequences of not considering varying sample selection bias and dispersion parameters. We show that the normal assumption for analyzing medical expenditure data is suitable and that the conclusions drawn using our approach are coherent with findings from prior literature. Moreover, we identify which covariates are relevant to explain the presence of sample selection bias in this important dataset.