Robust inference for the Two-Sample 2SLS estimator

David Pacini, Frank Windmeijer

doi:10.1016/j.econlet.2016.06.033

Abstract

The Two-Sample Two-Stage Least Squares (TS2SLS) data combination estimator is a popular estimator for the parameters in linear models when not all variables are observed jointly in one single data set. Although the limiting normal distribution has been established, the asymptotic variance formula has only been stated explicitly in the literature for the case of conditional homoskedasticity. By using the fact that the TS2SLS estimator is a function of reduced form and first-stage OLS estimators, we derive the variance of the limiting normal distribution under conditional heteroskedasticity. A robust variance estimator is obtained, which generalises to cases with more general patterns of variable (non-)availability. Stata code and some Monte Carlo results are provided in an Appendix. Stata code for a nonlinear GMM estimator that is identical to the TS2SLS estimator in just identified models and asymptotically equivalent to the TS2SLS estimator in overidentified models is also provided there.

Highlights

The Two-Sample Two-Stage Least Squares (TS2SLS) estimator was introduced by Klevmarken (1982) and applies in cases where one wants to estimate the effects of possibly endogenous explanatory variables x on outcome y, but where y and x are not observed in the same data set
The Two-Sample Two-Stage Least Squares (TS2SLS) data combination estimator is a popular estimator for the parameters in linear models when not all variables are observed jointly in one single data set
By using the fact that the TS2SLS estimator is a function of reduced form and first-stage OLS estimators, we derive the variance of the limiting normal distribution under conditional heteroskedasticity

Summary

Introduction

The Two-Sample Two-Stage Least Squares (TS2SLS) estimator was introduced by Klevmarken (1982) and applies in cases where one wants to estimate the effects of possibly endogenous explanatory variables x on outcome y, but where y and x are not observed in the same data set. The variance of the limiting normal distribution of the TS2SLS estimator is given in (10) below, and the formula for a robust estimator of the asymptotic variance is presented in (12). Neither has been derived or proposed in the literature before. The result in Inoue and Solon (2010) for the conditionally homoskedastic case is similar to our result for that case. They derive the limiting variance of the TS2SLS estimator from the optimal nonlinear GMM estimator. Knowledge of $\pi_y$ and $\Pi_{x1}$ identifies the structural parameters $\beta$, and the standard 2SLS estimator in a sample with $y_{1i}$, $x_{1i}$ and $z_{1i}$ all observed combines the information contained in the OLS estimators for $\pi_y$ and $\Pi_{x1}$, denoted by $\hat{\pi}_y$ and $\hat{\Pi}_{x1}$, as follows:

$$\hat{\beta}_{2sls} = \left( \hat{\Pi}_{x1}' Z_1' Z_1 \hat{\Pi}_{x1} \right)^{-1} \hat{\Pi}_{x1}' Z_1' Z_1 \hat{\pi}_y ,$$

with $Z_1$ the $n_1 \times k_z$ matrix with rows $z_{1i}'$. The result derived below can be seen as a generalisation of this to multiple regressors and overidentified settings.
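The combination formula above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the Stata code from the paper's Appendix. The simulated data-generating process, the sample sizes, and the helper names `ols`, `combine` and `simulate` are all assumptions made for illustration. It first confirms that combining the reduced-form and first-stage OLS estimates reproduces textbook 2SLS when y, x and z are observed together, and then forms a two-sample point estimate by taking the first-stage coefficients from a second sample in which only x and z are used. It does not compute the robust variance estimator.

```python
import numpy as np

def ols(Z, W):
    """OLS coefficients from regressing W (vector or matrix) on Z."""
    return np.linalg.lstsq(Z, W, rcond=None)[0]

def combine(Z1, pi_y, Pi_x):
    """beta = (Pi' Z1'Z1 Pi)^{-1} Pi' Z1'Z1 pi_y."""
    A = Z1.T @ Z1
    return np.linalg.solve(Pi_x.T @ A @ Pi_x, Pi_x.T @ A @ pi_y)

def simulate(n, rng, beta, Pi):
    """Hypothetical DGP: x = Z Pi + v, y = x beta + u, with u correlated with v."""
    kz, kx = Pi.shape
    Z = rng.normal(size=(n, kz))
    v = rng.normal(size=(n, kx))
    u = 0.5 * v.sum(axis=1) + rng.normal(size=n)  # makes x endogenous
    X = Z @ Pi + v
    y = X @ beta + u
    return y, X, Z

rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5])
Pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # kz = 3, kx = 2: overidentified

y1, X1, Z1 = simulate(5000, rng, beta, Pi)   # sample 1
_, X2, Z2 = simulate(5000, rng, beta, Pi)    # sample 2: y is discarded

# One-sample case: reduced form and first stage both estimated in sample 1.
pi_y = ols(Z1, y1)        # reduced-form coefficients of y on z
Pi_x1 = ols(Z1, X1)       # first-stage coefficients of x on z
beta_combined = combine(Z1, pi_y, Pi_x1)

# Textbook 2SLS using fitted values X1_hat = Z1 Pi_x1.
X1_hat = Z1 @ Pi_x1
beta_2sls = np.linalg.solve(X1_hat.T @ X1, X1_hat.T @ y1)

# Two-sample case: first stage comes from sample 2, reduced form from sample 1.
Pi_x2 = ols(Z2, X2)
beta_ts2sls = combine(Z1, pi_y, Pi_x2)

print("combined :", beta_combined)
print("2SLS     :", beta_2sls)
print("TS2SLS   :", beta_ts2sls)
```

In this sketch the first two estimates agree up to floating-point error, because $\hat{\Pi}_{x1}' Z_1' Z_1 \hat{\pi}_y = \hat{\Pi}_{x1}' Z_1' y_1$. The two-sample estimate differs only through the sampling error in the second-sample first stage.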

Limiting distribution and variance estimator

Generalising the result