Decision Theory Applied to an Instrumental Variables Model

Gary Chamberlain

doi:10.1111/j.1468-0262.2007.00764.x

Abstract

This paper applies some general concepts in decision theory to a simple instrumental variables model. There are two endogenous variables linked by a single structural equation; k of the exogenous variables are excluded from this structural equation and provide the instrumental variables (IV). The reduced-form distribution of the endogenous variables conditional on the exogenous variables corresponds to independent draws from a bivariate normal distribution with linear regression functions and a known covariance matrix. A canonical form of the model has parameter vector (ρ, φ, ω), where φ is the parameter of interest and is normalized to be a point on the unit circle. The reduced-form coefficients on the instrumental variables are split into a scalar parameter ρ and a parameter vector ω, which is normalized to be a point on the (k - 1)-dimensional unit sphere; ρ measures the strength of the association between the endogenous variables and the instrumental variables, and ω is a measure of direction. A prior distribution is introduced for the IV model. The parameters φ, ρ, and ω are treated as independent random variables. The distribution for φ is uniform on the unit circle; the distribution for ω is uniform on the unit sphere with dimension k - 1. These choices arise from the solution of a minimax problem. The prior for ρ is left general. It turns out that given any positive value for ρ, the Bayes estimator of φ does not depend on ρ; it equals the maximum-likelihood estimator. This Bayes estimator has constant risk; because it minimizes average risk with respect to a proper prior, it is minimax. The same general concepts are applied to obtain confidence intervals. The prior distribution is used in two ways. The first way is to integrate out the nuisance parameter ω in the IV model. This gives an integrated likelihood function with two scalar parameters, φ and ρ.
Inverting a likelihood ratio test, based on the integrated likelihood function, provides a confidence interval for φ. This lacks finite sample optimality, but invariance arguments show that the risk function depends only on ρ and not on φ or ω. The second approach to confidence sets aims for finite sample optimality by setting up a loss function that trades off coverage against the length of the interval. The automatic uniform priors are used for φ and ω, but a prior is also needed for the scalar ρ, and no guidance is offered on this choice. The Bayes rule is a highest posterior density set. Invariance arguments show that the risk function depends only on ρ and not on φ or ω. The optimality result combines average risk and maximum risk. The confidence set minimizes the average (with respect to the prior distribution for ρ) of the maximum risk, where the maximization is with respect to φ and ω.
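
As a rough numerical illustration of the canonical form described in the abstract, the sketch below simulates data and computes a maximum-likelihood estimate of the parameter of interest (written φ here, with strength ρ and direction ω). It rests on assumptions that the abstract does not spell out: that after standardizing by the known covariance matrix the data form a k x 2 matrix Y with mean ρ ω φ' and independent unit-variance normal noise, and that φ is then identified only up to sign. Under those assumptions the maximum-likelihood estimate of φ is the principal eigenvector of the 2 x 2 matrix Y'Y. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def random_unit_vector(k, rng):
    """Draw a point uniformly on the unit sphere in R^k by normalizing Gaussians."""
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def simulate_canonical(rho, phi, omega, rng):
    """Assumed canonical form: k x 2 matrix with mean rho * omega * phi' plus N(0, 1) noise."""
    return [[rho * w * p + rng.gauss(0.0, 1.0) for p in phi] for w in omega]


def mle_phi(Y):
    """Principal eigenvector of the 2 x 2 matrix Y'Y, returned as a point on the unit circle.

    For a symmetric matrix [[a, b], [b, c]] the principal axis has angle
    0.5 * atan2(2b, a - c). The sign of the result is arbitrary.
    """
    a = sum(row[0] * row[0] for row in Y)
    b = sum(row[0] * row[1] for row in Y)
    c = sum(row[1] * row[1] for row in Y)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (math.cos(theta), math.sin(theta))


rng = random.Random(0)
k = 5                                     # number of excluded instruments
phi = (math.cos(0.7), math.sin(0.7))      # illustrative true value on the unit circle
omega = random_unit_vector(k, rng)        # direction, uniform on the unit sphere
Y = simulate_canonical(rho=50.0, phi=phi, omega=omega, rng=rng)

phi_hat = mle_phi(Y)
# Absolute inner product, because phi and -phi are not distinguished in this sketch.
alignment = abs(phi_hat[0] * phi[0] + phi_hat[1] * phi[1])

# Noise-free check: with Y exactly equal to omega * phi', the estimate recovers phi.
Y_exact = [[w * p for p in phi] for w in omega]
phi_exact = mle_phi(Y_exact)
alignment_exact = abs(phi_exact[0] * phi[0] + phi_exact[1] * phi[1])
```

Note that `mle_phi` never uses ρ, which is in line with the abstract's statement that the estimator of φ does not depend on ρ. The strength parameter only affects how tightly the estimate concentrates around the true direction.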
