Tuesday, January 23, 2018

Instrumental variables

While reading a research paper that included an extended statistical discussion of instrumental variables, I realized that I could not give myself a clean, articulate explanation of them. I have a reasonable intuitive sense of the idea, but I could not lay out the concept in detail.

Of course, that shouldn't be much of a surprise. While I use statistics a lot, it was never a deep field of study academically. Shade tree statistics as it were. It has been nearly forty years since my last class in statistics.

Wikipedia provides a precise description:
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IV is used when an explanatory variable of interest is correlated with the error term. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

Instrumental variable methods allow for consistent estimation when the explanatory variables (covariates) are correlated with the error terms in a regression model. Such correlation may occur when changes in the dependent variable change the value of at least one of the covariates ("reverse" causation), when there are omitted variables that affect both the dependent and independent variables, or when the covariates are subject to measurement error. Explanatory variables which suffer from one or more of these issues in the context of a regression are sometimes referred to as endogenous. In this situation, ordinary least squares produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation but is correlated with the endogenous explanatory variables, conditional on the value of other covariates. In linear models, there are two main requirements for using IV:
The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates. If this correlation is strong, then the instrument is said to have a strong first stage. A weak correlation may provide misleading inferences about parameter estimates and standard errors.

The instrument cannot be correlated with the error term in the explanatory equation, conditional on the other covariates. In other words, the instrument cannot suffer from the same problem as the original predicting variable. If this condition is met, then the instrument is said to satisfy the exclusion restriction.
That is a bit dense, as is the maths that follows. As is often the case, an example gets the idea across better.
Informally, in attempting to estimate the causal effect of some variable X on another Y, an instrument is a third variable Z which affects Y only through its effect on X. For example, suppose a researcher wishes to estimate the causal effect of smoking on general health. Correlation between health and smoking does not imply that smoking causes poor health because other variables may affect both health and smoking, or because health may affect smoking. It is at best difficult and expensive to conduct controlled experiments on smoking status in the general population. The researcher may attempt to estimate the causal effect of smoking on health from observational data by using the tax rate for tobacco products as an instrument for smoking. The tax rate for tobacco products is a reasonable choice for an instrument because the researcher assumes that it can only be correlated with health through its effect on smoking. If the researcher then finds tobacco taxes and state of health to be correlated, this may be viewed as evidence that smoking causes changes in health.
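To convince myself of the logic, here is a minimal simulation sketch of the smoking example. Everything in it is an assumption made for illustration: the coefficients, the variable names, and the unobserved confounder `u` are invented, and the data are synthetic. It compares a naive regression of health on smoking with the simplest instrumental variable estimate, which divides the effect of the tax on health by the effect of the tax on smoking.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = -2.0  # assumed causal effect of smoking on health (made up)

# Synthetic data. All coefficients below are illustrative assumptions.
u = rng.normal(size=n)                          # unobserved confounder
z = rng.normal(size=n)                          # instrument: tobacco tax rate
x = -0.5 * z + 1.0 * u + rng.normal(size=n)     # smoking: lowered by tax, raised by u
y = true_effect * x - 3.0 * u + rng.normal(size=n)  # health: harmed by smoking and by u


def slope(a, b):
    """Slope from a simple regression of b on a (with an intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


ols_estimate = slope(x, y)    # naive regression, contaminated by u
first_stage = slope(z, x)     # effect of the tax on smoking
reduced_form = slope(z, y)    # effect of the tax on health
iv_estimate = reduced_form / first_stage  # ratio (Wald) form of the IV estimate

print("true effect:  ", true_effect)
print("OLS estimate: ", round(ols_estimate, 3))
print("IV estimate:  ", round(iv_estimate, 3))
```

Because `u` pushes smoking up and health down, the naive regression overstates the harm from smoking. The tax is unrelated to `u` by construction and touches health only through smoking, so the ratio should land close to the assumed effect. In real data neither property can simply be built in, which is where the two requirements quoted above come in.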
There is a good paper, "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments" by Joshua D. Angrist and Alan B. Krueger, which explores instrumental variables and, in particular, the opportunity presented by natural experiments, where some exogenous event supplies a naturally occurring instrument.

I don't see natural experiments used much in the research literature, but they are impressively revealing when they do occur.
