Chapter 75 The Econometrics of Data Combination

Similar Papers
  • Dissertation
  • 10.4225/03/58a52f17f0421
Identification and estimation of microeconometric models
  • Feb 16, 2017
  • Bin Jiang

This thesis studies the identification of discrete choice models, the use of sampling schemes in the finite sample analysis of instrumental variables (IV) estimators, and the estimation of panel data with cross-sectional dependence, which are at the forefront of modern econometrics. These topics, which are extremely closely linked, correspond to three critical steps in econometric research. In fact, the identification analysis is always the primary concern when determining the conditions under which the population of interest can logically be deduced from the information contained in the economic data before estimation procedures are proposed for conducting inference, whilst sampling schemes need to be considered when designing numerical experiments which will ultimately support analytical results. Chapter 1 provides an introduction to the background of identification, sampling schemes in IV estimation and panel data models with factor structures. In addition, we also outline the structure of this thesis. Chapter 2 then reviews and summarizes the existing literature in relevant areas. Chapter 3 revisits the identification problem of the binary response model studied by Chamberlain (2010) from a partial identification perspective. When the support of the predictor variables is bounded, Chamberlain (2010) showed that point identification of this model fails if the distribution of the disturbance is not logistic. Under his setup, we calculate the identified sets for some commonly used non-logistic distributions, adopting a constructive algorithm inspired by Honoré and Tamer (2006). These calculations suggest that Chamberlain's (2010) model restricts the identified sets to be very small in these cases, meaning that the failure of point identification of this model may not be important in practice. 
Moreover, we find that the extent of the identified sets may be determined by the configuration of the support of the predictor variables and the functional form of the disturbance distribution. In particular, we examine the effects of distribution forms of the disturbance on the magnitude of the identified sets by studying the relevant convex hulls defined by Chamberlain (2010) from a geometric point of view. The exact distributions of the classical IV estimators have traditionally been studied based on a sampling scheme in which exogenous variables are kept fixed (e.g., Phillips (1983), and references therein). However, Kiviet and Niemczyk (2007), Kiviet and Niemczyk (2012) and Kiviet (2013) suggested that the existing results indicating disconcerting properties (e.g., bimodality) of the exact IV distributions may be fundamentally influenced by the use of this sampling scheme for the exogenous variables. In Chapter 4, we investigate their claims further in finite samples, focusing on the OLS, two-stage least squares (TSLS) and limited information maximum likelihood (LIML) estimators of the interest parameters in a structural equation model. The exact distributions of these estimators, conditional on the exogenous variables, are compared with their marginal distributions, and evidence is presented to show that the marginal and conditional distributions display the same properties, including bimodality, even for small sample sizes. Thus, the exact IV distributions exhibit the same finite sample properties under alternative sampling schemes. Chapter 5 considers a structural equations model with endogeneity (cf. Schmidt (1976)) and cross-sectional dependence captured by factor structures. We consider a short-term microeconomic panel where T is fixed, with the cross-sectional dependence being captured by various heterogeneous factor structures. The properties of the IV estimators, especially TSLS and LIML, are then investigated in this context. 
We show that the classical TSLS and LIML estimators are not consistent, due to cross-sectional dependence, and analyze potential remedies to regain this consistency. The transformation suggested by Peng (2013) proves to be an efficient candidate for dealing with cross-sectional dependence, in terms of various factor structures examined in this chapter.

  • Research Article
  • Cite Count: 6
  • 10.1002/sim.5904
Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data
  • Jul 30, 2013
  • Statistics in Medicine
  • Matthew Strand + 3 more

Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE 4 , outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory.

  • Research Article
  • Cite Count: 58
  • 10.1111/j.1435-5957.2008.00208.x
New spatial econometric techniques and applications in regional science
  • Aug 1, 2008
  • Papers in Regional Science
  • Giuseppe Arbia + 1 more

  • Research Article
  • 10.2139/ssrn.3123490
Instrumental Variable Estimation of Dynamic Linear Panel Data Models with Defactored Regressors and a Multifactor Error Structure
  • Jan 1, 2018
  • SSRN Electronic Journal
  • Milda Norkute + 2 more

This paper develops two instrumental variable (IV) estimators for dynamic panel data models with exogenous covariates and a multifactor error structure when both the cross-sectional and time series dimensions, N and T respectively, are large. Our approach initially projects out the common factors from the exogenous covariates of the model, and constructs instruments based on these defactored covariates. For models with homogeneous slope coefficients, we propose a two-step IV estimator: the first-step IV estimator is obtained using the defactored covariates as instruments. In the second step, the entire model is defactored by the factors extracted from the residuals of the first-step estimation, and the final IV estimator is then obtained. For models with heterogeneous slope coefficients, we propose a mean-group type estimator, which is the cross-sectional average of first-step IV estimators of cross-section specific slopes. It is noteworthy that our estimators do not require us to seek instrumental variables outside the model. Furthermore, our estimators are linear, hence computationally robust and inexpensive. Moreover, they require no bias correction, and they are not subject to the small sample bias of least squares type estimators. The finite sample performances of the proposed estimators and associated statistical tests are investigated, and the results show that the estimators and the tests perform well even for small N and T.
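
The mean-group idea described in this abstract (averaging unit-specific IV slopes over the cross-section) can be illustrated with a minimal sketch. It assumes one regressor and one instrument per unit and omits the defactoring step entirely; the function names `iv_slope` and `mean_group` and the toy data are hypothetical and are not the paper's estimator.

```python
from statistics import fmean

def iv_slope(y, x, z):
    """Just-identified IV slope for one unit: cov(z, y) / cov(z, x)."""
    y_bar, x_bar, z_bar = fmean(y), fmean(x), fmean(z)
    num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    den = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    if den == 0:
        raise ValueError("instrument is uncorrelated with the regressor in this sample")
    return num / den

def mean_group(units):
    """Cross-sectional average of the unit-specific IV slopes."""
    return fmean([iv_slope(y, x, z) for (y, x, z) in units])

# Toy data (hypothetical): two units with exact slopes 2 and 4.
x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 0.0, 2.0, 3.0]
units = [
    ([2.0 * xi for xi in x], x, z),
    ([4.0 * xi for xi in x], x, z),
]
mg_estimate = mean_group(units)
```

With noiseless toy data each unit's IV slope equals its true slope, so the mean-group value is simply their average.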

  • Peer Review Report
  • Cite Count: 4
  • 10.7554/elife.64188.sa2
Author response: Mendelian randomization analysis provides causality of smoking on the expression of ACE2, a putative SARS-CoV-2 receptor
  • Apr 13, 2021
  • Hui Liu + 3 more

Background: To understand a causal role of modifiable lifestyle factors in angiotensin-converting enzyme 2 (ACE2) expression (a putative severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2] receptor) across 44 human tissues/organs, and in coronavirus disease 2019 (COVID-19) susceptibility and severity, we conducted a phenome-wide two-sample Mendelian randomization (MR) study. Methods: More than 500 genetic variants were used as instrumental variables to predict smoking and alcohol consumption. Inverse-variance weighted approach was adopted as the primary method to estimate a causal association, while MR-Egger regression, weighted median, and MR pleiotropy residual sum and outlier (MR-PRESSO) were performed to identify potential horizontal pleiotropy. Results: We found that genetically predicted smoking intensity significantly increased ACE2 expression in thyroid (β=1.468, p=1.8×10^−8), and increased ACE2 expression in adipose, brain, colon, and liver with nominal significance. Additionally, genetically predicted smoking initiation significantly increased the risk of COVID-19 onset (odds ratio=1.14, p=8.7×10^−5). No statistically significant result was observed for alcohol consumption. Conclusions: Our work demonstrates an important role of smoking, measured by both status and intensity, in the susceptibility to COVID-19. Funding: XJ is supported by research grants from the Swedish Research Council (VR-2018–02247) and Swedish Research Council for Health, Working Life and Welfare (FORTE-2020–00884).
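
The inverse-variance weighted approach named in the Methods can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW formula applied to per-variant summary statistics, not the study's pipeline; the function name `ivw_estimate` and the three toy variants are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Each variant j contributes the ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2      # inverse variance of the variant-outcome effect
        num += w * bx * by
        den += w * bx * bx
    if den == 0:
        raise ValueError("no variant is associated with the exposure")
    return num / den, math.sqrt(1.0 / den)

# Toy summary statistics (hypothetical): outcome effect is 0.5 x exposure effect.
bx = [0.1, 0.2, 0.4]
by = [0.05, 0.10, 0.20]
se = [0.01, 0.02, 0.01]
ivw_beta, ivw_se = ivw_estimate(bx, by, se)
```

Because every toy variant implies the same ratio, the weighted estimate equals that ratio regardless of the weights; with real data the weights decide how much each variant counts.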

  • Peer Review Report
  • 10.7554/elife.64188.sa1
Decision letter: Mendelian randomization analysis provides causality of smoking on the expression of ACE2, a putative SARS-CoV-2 receptor
  • Jan 8, 2021
  • Houfeng Zheng + 1 more

  • Book Chapter
  • Cite Count: 31
  • 10.1007/978-1-4899-1292-3_7
Panel Analysis for Metric Data
  • Jan 1, 1995
  • Cheng Hsiao

A cross-sectional data set refers to observations on a number of individuals at a given time. A time-series data set refers to observations made over time on a given unit. A panel (or longitudinal or temporal cross-sectional) data set follows a number of individuals over time. In recent years empirical studies that use panel data have become common. This is partly because the cost of developing panel or longitudinal data sets is no longer prohibitive. In some cases, computerized matching of existing administrative records can produce inexpensive longitudinal information, such as the Social Security Administration’s Continuous Work History Sample (CWHS). In other cases, valuable longitudinal data bases can be generated by computerized matching of existing administrative and survey data, such as the University of Michigan’s Panel Study of Income Dynamics (PSID) and the U.S. Current Population Survey. Even in cases where the desired longitudinal information can be collected only by initiating new surveys, such as the series of negative income tax experiments in the United States and Canada, the advance of computerized data management systems has made longitudinal data development cost-effective in the last 20 years (Ashenfelter and Solon 1982).

  • Research Article
  • Cite Count: 2
  • 10.1111/obes.12453
2SLS and IV Estimation of Dynamic Panel Models with Heterogeneous Trend*
  • Jul 26, 2021
  • Oxford Bulletin of Economics and Statistics
  • Shiyun Cao + 2 more

In this paper, we consider two‐stage least squares (2SLS) and simple instrumental variable (IV) type estimation of dynamic panel data models with both individual‐specific effects and heterogeneous time trend when both N and T tend to infinity. We consider the forward orthogonal deviations (FOD) proposed by (Hayakawa, et al. Econometric Reviews, 2019. Vol. 38, pp. 1055–1088) and the double first difference (2FD) to remove both the individual‐specific effects and heterogeneous trend. As the main theoretical contribution, we establish the asymptotic properties of the 2SLS estimation of the lag coefficient and find that the 2SLS estimation using FOD and optimal 2SLS estimation using 2FD are asymptotically biased of order , while the 2SLS based on 2FD using non‐optimal weighting matrix is asymptotically biased of order . We also establish the asymptotic unbiasedness of the simple IV estimation using first differenced lagged dependent variable as instrument, and establish the invalidity of using level lagged dependent variable as instrument for the simple IV estimation. Monte Carlo simulations confirm our findings in this paper.
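
The double first difference (2FD) mentioned in this abstract can be illustrated with a short sketch. It assumes a single unit whose series contains an individual effect plus a unit-specific linear trend, and shows only why differencing twice removes both; the function names and toy values are hypothetical, and the FOD transformation and the estimators themselves are not reproduced here.

```python
def first_difference(series):
    """Return y_t - y_{t-1} for t = 1, ..., T-1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def double_first_difference(series):
    """Return y_t - 2*y_{t-1} + y_{t-2}, i.e. the first difference applied twice."""
    return first_difference(first_difference(series))

# Toy unit (hypothetical): individual effect a_i = 3.0 and trend slope b_i = 0.5,
# with no lagged dependent variable or noise, so only a_i + b_i * t is present.
a_i, b_i = 3.0, 0.5
y = [a_i + b_i * t for t in range(6)]
y_2fd = double_first_difference(y)
```

The first difference removes `a_i` and leaves the constant `b_i`; the second difference removes that constant, at the cost of two observations per unit.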

  • Research Article
  • 10.2139/ssrn.2965263
Instrumental Variable Estimation of Factor Models with Possibly Many Variables
  • Feb 5, 2018
  • SSRN Electronic Journal
  • Kazuhiko Hayakawa + 1 more

In this paper, we consider the instrumental variables (IV) estimation of factor models. In the psychometrics literature, although the two-stage least squares (2SLS) estimator is routinely used in IV estimation of factor models, alternative estimators have been proposed in the econometrics literature. Therefore, in this paper, we compare the performance of these alternative IV estimators in the context of factor models. Monte Carlo simulation results reveal that the HLIM/HFUL estimator by Hausman, Newey, Woutersen, Chao and Swanson (2012) outperforms the 2SLS estimator and performs best in many cases.

  • Research Article
  • Cite Count: 3
  • 10.1080/03610918.2018.1423690
Instrumental variable estimation of factor models with possibly many variables
  • Feb 5, 2018
  • Communications in Statistics - Simulation and Computation
  • Kazuhiko Hayakawa + 1 more

In this paper, we consider the instrumental variables (IV) estimation of factor models. In the psychometrics literature, although the two-stage least squares (2SLS) estimator is routinely used in IV estimation of factor models, alternative estimators have been proposed in the econometrics literature. Therefore, in this paper, we compare the performance of these alternative IV estimators in the context of factor models. Monte Carlo simulation results reveal that the HLIM/HFUL estimator by Hausman et al. (2012) outperforms the 2SLS estimator and performs best in many cases.

  • Dissertation
  • 10.31274/rtd-180813-13298
Econometric analysis of measurement error in panel data
  • Mar 2, 2015
  • Elizabeth Martha Paterno

Panel data consist of measurements taken from several individuals over time. Correlation among measurements taken from the same individual is often accounted for using random effect and random coefficient models. Panel data analysis that accounts for measurement error in the explanatory variables has not been thoroughly studied. This dissertation investigates statistical issues associated with two types of measurement error models for panel data. The first paper considers identification and estimation of a random effect model when some explanatory variables are measured with error. Here, individual heterogeneity is assumed to be manifested in intercepts that randomly differ across individuals. Identification of model parameters given the first two moments of observed variables is examined, and relatively unrestrictive sufficient conditions for identification are obtained. Estimation based on maximum normal likelihood is proposed. This method can be easily implemented using available computer packages that perform moment structure analysis. Compared to the only existing procedure based on instrumental variables, the new method is shown to be more efficient and to have much wider applicability. Standard error estimates and goodness-of-fit statistics obtained under the assumption of normally distributed observations are shown to be asymptotically valid for a broad class of non-normal observations. Simulation results demonstrating the efficiency and usefulness of the new procedure are presented. The second paper deals with the random coefficient model with measurement error, where all regression coefficients randomly differ across individuals. Two procedures are proposed for model fitting and estimation. The generalized least squares method is developed for the first two sample moments with a distribution-free estimate of the weight. 
Since this method tends to yield very variable estimates in small samples, an alternative method, the pseudo maximum normal likelihood procedure, is also developed. The latter, obtained by maximizing a hypothetical normal likelihood for the first two sample moments, produces relatively stable estimates in most samples. Asymptotic properties of the two procedures are derived and are used to obtain valid standard errors of the estimators. Numerical results showing the finite-sample properties of these estimators are also reported.

  • Research Article
  • 10.1093/ije/dyab168.381
595 ICE FALCON: a causation assessment method analogous to, but more powerful than, Mendelian Randomisation
  • Sep 1, 2021
  • International Journal of Epidemiology
  • Shuai Li + 2 more

  • Research Article
  • Cite Count: 6
  • 10.2139/ssrn.3642451
IV Estimation of Spatial Dynamic Panels with Interactive Effects: Large Sample Theory and an Application on Bank Attitude Toward Risk
  • Jan 1, 2020
  • SSRN Electronic Journal
  • Guowei Cui + 2 more

The present paper develops a new Instrumental Variables (IV) estimator for spatial, dynamic panel data models with interactive effects under large N and T asymptotics. For this class of models, the only approaches available in the literature are based on quasi-maximum likelihood estimation. The approach put forward in this paper is appealing from both a theoretical and a practical point of view for a number of reasons. Firstly, the proposed IV estimator is linear in the parameters of interest and it is computationally inexpensive. Secondly, the IV estimator is free from asymptotic bias. In contrast, existing QML estimators suffer from incidental parameter bias, depending on the magnitude of unknown parameters. Thirdly, the IV estimator retains the attractive feature of Method of Moments estimation in that it can accommodate endogenous regressors, so long as external exogenous instruments are available. The IV estimator is consistent and asymptotically normal as N, T go to infinity, with N/T^2 going to 0 and T/N^2 tending to 0. The proposed methodology is employed to study the determinants of risk attitude of banking institutions. The results of our analysis provide evidence that the more risk-sensitive capital regulation that was introduced by the Basel III framework in 2011 has succeeded in influencing banks’ behavior in a substantial manner.

  • Research Article
  • Cite Count: 4
  • 10.1097/ede.0000000000001697
Comparative Analysis of Instrumental Variables on the Assignment of Buprenorphine/Naloxone or Methadone for the Treatment of Opioid Use Disorder.
  • Jan 30, 2023
  • Epidemiology (Cambridge, Mass.)
  • Fahmida Homayra + 19 more

Instrumental variable (IV) analysis provides an alternative set of identification assumptions in the presence of uncontrolled confounding when attempting to estimate causal effects. Our objective was to evaluate the suitability of measures of prescriber preference and calendar time as potential IVs to evaluate the comparative effectiveness of buprenorphine/naloxone versus methadone for treatment of opioid use disorder (OUD). Using linked population-level health administrative data, we constructed five IVs: prescribing preference at the individual, facility, and region levels (continuous and categorical variables), calendar time, and a binary prescriber's preference IV in analyzing the treatment assignment-treatment discontinuation association using both incident-user and prevalent-new-user designs. Using published guidelines, we assessed and compared each IV according to the four assumptions for IVs, employing both empirical assessment and content expertise. We evaluated the robustness of results using sensitivity analyses. The study sample included 35,904 incident users (43.3% on buprenorphine/naloxone) initiated on opioid agonist treatment by 1585 prescribers during the study period. While all candidate IVs were strong (A1) according to conventional criteria, by expert opinion, we found no evidence against assumptions of exclusion (A2), independence (A3), monotonicity (A4a), and homogeneity (A4b) for prescribing preference-based IV. Some criteria were violated for the calendar time-based IV. We determined that preference in provider-level prescribing, measured on a continuous scale, was the most suitable IV for comparative effectiveness of buprenorphine/naloxone and methadone for the treatment of OUD. Our results suggest that prescriber's preference measures are suitable IVs in comparative effectiveness studies of treatment for OUD.

  • Book Chapter
  • Cite Count: 12
  • 10.1007/978-1-4757-2642-8_11
A Practical Comparison of Modeling Approaches for Panel Data
  • Jan 1, 1997
  • Mark Bradley

In contrast to cross-sectional data, panel data provide us with the ability to directly observe and model changes in behavior resulting from changes in causal variables. Explicitly modeling change should allow more accurate predictions, at least for the short term. Model estimation using panel data, however, requires us to sort out various types of within-person and cross-sectional effects, both observed and unobserved. This chapter begins with a discussion of the types of variables one might expect and their treatment in model estimation. Although quite complex statistical methods are required to isolate all types of cross-sectional and dynamic effects in panel data, a number of relatively simple model forms can be used to incorporate at least some aspects of dynamic transitions in choice behavior. These simple dynamic models are compared to their static counterparts in the context of a commuter “before and after” panel study in the Netherlands. The models are compared in terms of the estimation results and, more importantly, in terms of the predictions they provide in a dynamic application framework. The chapter thus provides an illustration and discussion of the use of dynamic panel models in forecasting, an issue which is rarely treated in the practical literature. Keywords: Travel Time; Panel Data; Mode Choice; Rail Line; Nest Logit Model
