This paper considers the estimation problem in linear regression when endogeneity is present, that is, when explanatory variables are correlated with the random error, and also addresses the question of a priori testing for potential endogeneity. We provide a survey of the latent instrumental variables (LIV) approach proposed by Ebbes (2004) and Ebbes et al. (2004, 2005, 2009) and examine its performance compared to ordinary least squares (OLS) and IV regression. The distinctive feature of Ebbes’ approach is that no observed instruments are required. Instead, ‘optimal’ instruments are estimated from the data and allow for endogeneity testing. Importantly, the resulting Hausman-LIV test is a simple tool for testing for potential endogeneity in regression analysis and for indicating when LIV regression should be performed instead of OLS regression. The LIV models considered comprise the standard one, where the latent variable is discrete with at least two fixed categories, and two interesting extensions: multilevel models and models in which a nonparametric Bayes algorithm completely determines the LIV’s distribution from data. This paper suggests that while Ebbes’ new method is a distinct contribution, its formulation is problematic in certain important respects. Specifically, the various publications of Ebbes and collaborators use three distinct and inequivalent statistical concepts interchangeably, treating them as one and the same. We clarify this and then discuss estimation of the returns to education in income based on data from three studies that Ebbes (2004) revisited, where ‘education’ is potentially endogenous due to omitted ‘ability.’ Relative to the LIV estimate, the OLS estimate exhibits a slight upward bias of 7%, 8%, and 6% for the three studies, respectively, whereas IV estimation leads to much larger biases of 93%, 40%, and -24%, with no consensus about the direction of the bias.
This provides one instance among many well-known applications where IVs introduced more substantial biases into the estimated causal effects than OLS, even though IVs were pioneered to overcome the endogeneity problem. In a second example, we scrutinize the results of Ferguson et al. (2015) on the estimated effect of campaign expenditures on the proportions of Democratic and Republican votes in US House and Senate elections between 1980 and 2014, where ‘campaign money’ is potentially endogenous in view of omitted variables such as ‘a candidate’s popularity.’ A nonparametric Bayesian spatial LIV regression model was adopted to incorporate identified spatial autocorrelation and account for endogeneity. The relative bias of the spatial regression estimate as compared to the spatial LIV estimate ranges from -17% to 18% for the House and from -25% to 7% for the Senate.