Econometrics II
Department of Economics, WU Vienna
December 4, 2025
What are Instrumental Variables?
We have discussed why confounders are a source of endogeneity.
Imagine the confounder is unobserved. How can we fend off this threat to identification?
The basic idea: if we can find a so-called instrumental variable that explains the endogenous regressor but is otherwise unrelated to the outcome, we can use this variable to circumvent the issue. The same technique also helps with other sources of endogeneity, such as measurement error or simultaneity.
Intuitively, we can think of the depicted situation like this:
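There are two conditions our instrumental variable must fulfill. It must be relevant, meaning it is correlated with the endogenous regressor. It must also be exogenous, meaning it is uncorrelated with the error term. This second condition is the exclusion restriction: the instrument affects the outcome only through the endogenous regressor.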
Consider the following case where omitting a confounder is the source of endogeneity:
\[ \begin{aligned} \boldsymbol{y} = \boldsymbol{X\beta} + &\boldsymbol{u},\\ & \boldsymbol{u} = \boldsymbol{S\gamma}+\boldsymbol{\varepsilon}, \end{aligned} \]
where \(\mathrm{Cov}(\boldsymbol{X},\boldsymbol{S})\neq 0\), and thus (provided \(\boldsymbol{\gamma}\neq\boldsymbol{0}\)) \(\mathrm{Cov}(\boldsymbol{X},\boldsymbol{u})\neq 0\).
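We can think of the process as containing two steps. First, we regress \(\boldsymbol{X}\) on the instrument and keep the predicted values. These capture the part of \(\boldsymbol{X}\) that is driven by the instrument and is therefore uncorrelated with \(\boldsymbol{u}\). Second, we regress \(\boldsymbol{y}\) on these predicted values.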
This is the intuition behind what we will call the Two Stage Least Squares (2SLS) estimator.
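The two steps can be sketched in R on simulated data. All variable names and coefficients here are illustrative assumptions, not taken from the lecture. `S` plays the role of the unobserved confounder, `Z` is the instrument, and the true effect of `X` on `y` is 2.

```r
set.seed(1)
n <- 10000
S <- rnorm(n)                       # unobserved confounder
Z <- rnorm(n)                       # instrument: shifts X, unrelated to S and the error
X <- 0.8 * Z + S + rnorm(n)         # endogenous regressor (depends on S)
y <- 2 * X + 1.5 * S + rnorm(n)     # true effect of X on y is 2

# Naive OLS omits S and is biased upward, since Cov(X, S) > 0
b_ols <- unname(coef(lm(y ~ X))["X"])

# Step 1: keep only the part of X that is explained by Z
X_hat <- fitted(lm(X ~ Z))
# Step 2: regress y on that part
b_2step <- unname(coef(lm(y ~ X_hat))["X_hat"])

c(OLS = b_ols, two_step = b_2step)
```

The standard errors reported by the second `lm()` call are not valid. They are computed from residuals built with the predicted regressor rather than the actual one.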
Consider the following general model:
\[ \boldsymbol{y} = \boldsymbol{Q\beta}+\boldsymbol{u}, \]
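where \(\boldsymbol{Q}=[\boldsymbol{S\:X}]\), with \(\mathrm{Cov}(\boldsymbol{S},\boldsymbol{u})=0\) and \(\mathrm{Cov}(\boldsymbol{X},\boldsymbol{u})\neq 0\). That is, \(\boldsymbol{S}\) contains the exogenous regressors and \(\boldsymbol{X}\) contains the \(K\) endogenous regressors.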
Assume further that \(\boldsymbol{Z}\) contains \(M\) instrumental variables.
We can only identify the effect of the endogenous regressors if \(M \geq K\), that is, if there is at least one instrument per endogenous regressor (the order condition).
The concept behind the 2SLS estimator is similar to before. In the First Stage, we regress the endogenous regressors \(\boldsymbol{X}\) on the exogenous variables \(\boldsymbol{S}\) and the instruments \(\boldsymbol{Z}\).
Assume (for simplicity) that there are no exogenous regressors:
\[ \boldsymbol{X} = \boldsymbol{Z} \boldsymbol{\delta} + \boldsymbol{v}, \qquad \qquad \hat{\boldsymbol{\delta}} = (\boldsymbol{Z}' \boldsymbol{Z})^{-1} \boldsymbol{Z}' \boldsymbol{X}. \]
Using \(\hat{\boldsymbol{\delta}}\), we can now obtain a prediction \(\textcolor{var(--primary-color)}{\boldsymbol{\hat{X}}} = \textcolor{var(--secondary-color)}{\boldsymbol{Z}} \hat{\boldsymbol{\delta}}\) for the next stage.
We can express this in a very simple way using a projection matrix:
\[ \boldsymbol{P_Z} = \boldsymbol{Z} (\boldsymbol{Z}' \boldsymbol{Z})^{-1} \boldsymbol{Z}'. \]
The math behind projection matrices is out of scope for this class. We simply accept that pre-multiplying a variable by a matrix \(\boldsymbol{P}_\boldsymbol{Z}\) of this form yields its predictions (fitted values) from a regression on \(\boldsymbol{Z}\):
\[ \hat{\boldsymbol{X}} = \boldsymbol{P}_\boldsymbol{Z}\boldsymbol{X} \]
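Two nice features of projection matrices, which we need for the following derivations, are symmetry, \(\boldsymbol{P_Z}' = \boldsymbol{P_Z}\), and idempotency, \(\boldsymbol{P_Z}\boldsymbol{P_Z} = \boldsymbol{P_Z}\).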
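A quick numerical check in R can confirm these properties on simulated data. It verifies that \(\boldsymbol{P_Z}\) is symmetric and idempotent, and that \(\boldsymbol{P_Z}\boldsymbol{X}\) reproduces the fitted values of a regression of \(\boldsymbol{X}\) on \(\boldsymbol{Z}\). The sample size is kept small on purpose, because \(\boldsymbol{P_Z}\) is an \(N \times N\) matrix. The data and coefficients are arbitrary assumptions.

```r
set.seed(2)
n <- 50
Z <- cbind(1, rnorm(n), rnorm(n))            # simulated instruments incl. an intercept column
X <- drop(Z %*% c(1, 0.5, -0.3)) + rnorm(n)  # simulated regressor

P_Z <- Z %*% solve(crossprod(Z)) %*% t(Z)    # Z (Z'Z)^{-1} Z'

isTRUE(all.equal(P_Z, t(P_Z)))               # symmetric
isTRUE(all.equal(P_Z %*% P_Z, P_Z))          # idempotent
# P_Z X equals the fitted values from regressing X on Z
isTRUE(all.equal(drop(P_Z %*% X), unname(fitted(lm(X ~ Z - 1)))))
sum(diag(P_Z))                               # trace = number of columns of Z (here 3)
```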
In the Second Stage, we replace the endogenous variables with their prediction \(\boldsymbol{\hat{X}} = \boldsymbol{Z} (\boldsymbol{Z}' \boldsymbol{Z})^{-1} \boldsymbol{Z}'\boldsymbol{X}=\boldsymbol{P_Z} \boldsymbol{X}\). This allows us to obtain the 2SLS estimator:
\[ \begin{aligned} \boldsymbol{y} &= \boldsymbol{\hat{X}} \boldsymbol{\beta} + \boldsymbol{u}, \\ \hat{\boldsymbol{\beta}} &= (\boldsymbol{\hat{X}}' \boldsymbol{\hat{X}})^{-1} \boldsymbol{\hat{X}}' \boldsymbol{y} \\ &= (\boldsymbol{X}' \boldsymbol{P_Z}' \boldsymbol{P_Z} \boldsymbol{X})^{-1} \boldsymbol{X}' \boldsymbol{P_Z}' \boldsymbol{y} \\ &= (\boldsymbol{X}' \boldsymbol{P_Z} \boldsymbol{X})^{-1} \boldsymbol{X}' \boldsymbol{P_Z} \boldsymbol{y} \\ \beta_{2SLS} &= (\boldsymbol{X}' \boldsymbol{P_Z} \boldsymbol{X})^{-1} \boldsymbol{X}' \boldsymbol{P_Z} \boldsymbol{y}. \end{aligned} \]
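Under homoskedastic errors, the covariance matrix of the 2SLS estimator is \(\operatorname{Cov}(\beta_{2SLS}) = \sigma^2 (\boldsymbol{X}' \boldsymbol{P_Z} \boldsymbol{X})^{-1}\). Here \(\sigma^2\) is estimated from the residuals \(\boldsymbol{y}-\boldsymbol{X}\beta_{2SLS}\), which use \(\boldsymbol{X}\) and not \(\hat{\boldsymbol{X}}\).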
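Below is a minimal R sketch of the 2SLS formula and its homoskedastic covariance matrix on simulated data, which is an assumption made for illustration. The intercept plays the role of the exogenous regressor, so it appears in both `X` and `Z`. With two instruments for one endogenous regressor, the model is overidentified. The sketch avoids forming the \(N \times N\) matrix \(\boldsymbol{P_Z}\) by computing \(\boldsymbol{X}'\boldsymbol{P_Z} = \boldsymbol{X}'\boldsymbol{Z}(\boldsymbol{Z}'\boldsymbol{Z})^{-1}\boldsymbol{Z}'\) directly.

```r
set.seed(3)
n  <- 5000
S  <- rnorm(n)                               # unobserved confounder
z1 <- rnorm(n); z2 <- rnorm(n)               # two instruments for one endogenous regressor
x  <- 0.6 * z1 + 0.4 * z2 + S + rnorm(n)     # endogenous regressor
y  <- 1 + 2 * x + S + rnorm(n)               # true slope = 2

X <- cbind(1, x)           # regressors; the intercept is the exogenous part
Z <- cbind(1, z1, z2)      # instruments; the exogenous intercept instruments itself

# X' P_Z X and X' P_Z y without forming the n x n matrix P_Z
ZX  <- crossprod(Z, X)                       # Z'X
A   <- t(ZX) %*% solve(crossprod(Z))         # X'Z (Z'Z)^{-1}
XPX <- A %*% ZX
XPy <- A %*% crossprod(Z, y)
beta_2sls <- drop(solve(XPX, XPy))

# sigma^2 from residuals computed with the original X, not X_hat
res    <- drop(y - X %*% beta_2sls)
sigma2 <- sum(res^2) / (n - ncol(X))
se     <- sqrt(diag(sigma2 * solve(XPX)))

# The manual two-step approach gives the same point estimate
x_hat   <- fitted(lm(x ~ z1 + z2))
b_2step <- unname(coef(lm(y ~ x_hat))[2])

cbind(estimate = beta_2sls, se = se)
```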
When the coefficients are just identified (\(M = K\)), \(\boldsymbol{Z}'\boldsymbol{X}\) is a square matrix. Provided it has full rank, we can invert it directly and use the IV estimator:
\[ \boldsymbol{\beta}_{IV} = (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{y}. \]
We can derive it by pre-multiplying the standard model by \(\boldsymbol{Z}'\):
\[ \begin{aligned} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{u} \\ \boldsymbol{Z}' \boldsymbol{y} &= \boldsymbol{Z}' \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{Z}' \boldsymbol{u} \end{aligned} \]
Now, we can impose the moment condition \(\boldsymbol{Z}'(\boldsymbol{y}-\boldsymbol{X}\hat{\boldsymbol{\beta}}_{IV})=\boldsymbol{0}\), the sample analog of the exogeneity assumption \(\mathrm{E}(\boldsymbol{Z}'\boldsymbol{u})=0\),
\[ \begin{aligned} \boldsymbol{Z}' \boldsymbol{X} \boldsymbol{\beta}_{IV} &= \boldsymbol{Z}' \boldsymbol{y} \\ \boldsymbol{\beta}_{IV} &= (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{y}. \end{aligned} \]
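To illustrate the just-identified case, here is a small R sketch on simulated data, an assumption made for illustration. Counting the intercept as its own instrument, `Z` and `X` both have two columns. The IV formula \((\boldsymbol{Z}'\boldsymbol{X})^{-1}\boldsymbol{Z}'\boldsymbol{y}\) and the 2SLS formula then give the same estimate.

```r
set.seed(4)
n <- 2000
S <- rnorm(n)                          # unobserved confounder
z <- rnorm(n)                          # a single instrument
x <- 0.7 * z + S + rnorm(n)            # endogenous regressor
y <- 0.5 + 1.5 * x + S + rnorm(n)      # true slope = 1.5

Z <- cbind(1, z)                       # two columns
X <- cbind(1, x)                       # two columns: just identified

beta_iv <- drop(solve(crossprod(Z, X), crossprod(Z, y)))   # (Z'X)^{-1} Z'y

# 2SLS: (X' P_Z X)^{-1} X' P_Z y, with P_Z X computed as Z (Z'Z)^{-1} Z'X
PX <- Z %*% solve(crossprod(Z), crossprod(Z, X))
beta_2sls <- drop(solve(crossprod(PX, X), crossprod(PX, y)))

rbind(IV = beta_iv, `2SLS` = beta_2sls)
```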
We can easily sketch a proof for consistency of the IV estimator:
\[ \begin{aligned} \boldsymbol{\beta}_{IV} &= (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{y} \\ &= (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{X} \boldsymbol{\beta} + (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{u} \\ &= \boldsymbol{\beta} + (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{u}\\ &= \boldsymbol{\beta} + (\boldsymbol{Z}' \boldsymbol{X}N^{-1})^{-1} \boldsymbol{Z}' \boldsymbol{u}N^{-1} \end{aligned} \]
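From the exogeneity and relevance conditions we get
\[ \operatorname{plim}_{N\to\infty} N^{-1}\boldsymbol{Z}'\boldsymbol{u} = 0 \quad \text{(exogeneity)}, \qquad \operatorname{plim}_{N\to\infty} N^{-1}\boldsymbol{Z}'\boldsymbol{X} = c \ \text{ finite and nonsingular} \quad \text{(relevance)}. \]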
Thus, \(\boldsymbol{\beta}_{IV} \xrightarrow{p} \boldsymbol{\beta} + \tfrac{0}{c} = \boldsymbol{\beta}\) as \(N \to \infty\).
The IV estimator is consistent, but it is generally biased in finite samples:
\[ \begin{aligned} \boldsymbol{\beta}_{IV} &= \boldsymbol{\beta} + (\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{u}, \\ \mathbb{E}[\boldsymbol{\beta}_{IV}] &= \boldsymbol{\beta} + \mathbb{E}[(\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{u}]. \end{aligned} \]
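We cannot split the expectation of the second term into a product of expectations, because \(\boldsymbol{X}\) is correlated with \(\boldsymbol{u}\). Conditioning on \(\boldsymbol{Z}\) and \(\boldsymbol{X}\) does not help either:
\[ \begin{aligned} \mathbb{E}[\boldsymbol{\beta}_{IV}] &= \boldsymbol{\beta} + \mathbb{E}\left[\mathbb{E}\left[(\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \boldsymbol{u} \mid \boldsymbol{Z}, \boldsymbol{X}\right]\right] \\ &= \boldsymbol{\beta} + \mathbb{E}\left[(\boldsymbol{Z}' \boldsymbol{X})^{-1} \boldsymbol{Z}' \mathbb{E}[\boldsymbol{u} \mid \boldsymbol{Z}, \boldsymbol{X}]\right]. \end{aligned} \]
Since \(\boldsymbol{X}\) is endogenous, \(\mathbb{E}[\boldsymbol{u} \mid \boldsymbol{Z}, \boldsymbol{X}] \neq \boldsymbol{0}\). The second term therefore does not vanish in general, and \(\boldsymbol{\beta}_{IV}\) is biased in finite samples.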
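A small Monte Carlo sketch in R shows "consistent but biased" on simulated data. The design is our own assumption: three instruments with coefficient `strength`, a correlation `rho` between the structural and first-stage errors, and a true slope of 1. We use three instruments because with a single instrument the just-identified IV estimator has no finite mean, so a simulated average would be unstable. The average error is compared for a small and a large sample.

```r
set.seed(5)
# One draw of the 2SLS slope; strength = instrument coefficients, rho = endogeneity
tsls_slope <- function(n, strength = 0.2, rho = 0.8, beta = 1) {
  Z <- cbind(1, matrix(rnorm(n * 3), n, 3))    # intercept + three instruments
  v <- rnorm(n)                                # first-stage error
  u <- rho * v + sqrt(1 - rho^2) * rnorm(n)    # structural error with Cov(u, v) = rho
  x <- drop(Z[, -1] %*% rep(strength, 3)) + v  # endogenous regressor
  y <- beta * x + u
  X <- cbind(1, x)
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # first stage
  solve(crossprod(X_hat), crossprod(X_hat, y))[2]        # second-stage slope
}

# Average estimation error over 1,000 replications: small vs. large sample
bias <- sapply(c(small = 50, large = 5000),
               function(size) mean(replicate(1000, tsls_slope(size))) - 1)
round(bias, 3)
```

In the small sample the average estimate is pulled noticeably toward the (upward-biased) OLS estimate. In the large sample the bias essentially disappears.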
We now know what instrumental variables are, and how we can estimate \(\boldsymbol{\beta}\) in an instrumental variables setting. We know that instruments must be exogenous and relevant, and that we need at least one instrument per endogenous variable. Now consider the following example:
Say we want to find out the effect of education \(\text{X}\) on income \(\text{Y}\).
But we know that both parental education \(\text{PE}\) and parental income \(\text{PI}\) influence the level of education. We could control for these since we can observe them.
However, there are likely other background factors \(\text{BG}\) that influence parental education, education and income. These background factors are unobserved, meaning we cannot identify a causal effect.
Only if we find an instrument \(\text{IV}\) can we bypass this problem.
This example is from Angrist & Krueger (1991). They came up with a novel instrument for education: an individual's quarter of birth.
How does this work?
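In the United States, students must attend school from the calendar year in which they turn six until their 16th birthday. Since school entry happens once per year, students born early in the year start school older than those born late in the year. They therefore reach their 16th birthday with less completed schooling. Students who drop out as soon as they legally can thus create variation in education by quarter of birth.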
Is this instrument both exogenous and relevant?
A popular rule of thumb says that a good instrument should seem ridiculous at first glance. An instrument that appears unrelated to the outcome is more likely to fulfill the exclusion restriction.
The length of completed education shows a clear cyclical pattern when plotted against the quarter and year of birth.
Log weekly earnings also show a cyclical pattern.
| Dependent variable: log(weekly wage) | OLS | Instrumental variable |
|:--|:--:|:--:|
| | (1) | (2) |
| EDUC | 0.0711*** | 0.0891*** |
| | (0.0003) | (0.0161) |
| Observations | 329,509 | 329,509 |
| R² | 0.1177 | 0.1102 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 | |
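In the absence of endogeneity, we prefer OLS because it is more efficient than IV. Compare the standard errors in the table above (0.0003 vs. 0.0161). It therefore makes sense to test for endogeneity.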
The Durbin-Wu-Hausman test compares a consistent estimator (IV) to a more efficient but potentially inconsistent estimator (OLS) by following these three steps:
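Below is an R sketch of one common regression-based (control-function) version of the test. It assumes one endogenous regressor, homoskedastic errors and simulated data, and its steps need not match the three steps listed on the slides exactly. The first-stage residuals are added to the structural equation, and a significant coefficient on them indicates endogeneity.

```r
set.seed(6)
n <- 3000
S <- rnorm(n)                         # unobserved confounder
z <- rnorm(n)                         # instrument
x <- 0.8 * z + S + rnorm(n)           # endogenous by construction
y <- 1 + 2 * x + S + rnorm(n)

v_hat <- resid(lm(x ~ z))             # 1. first-stage residuals
aux   <- lm(y ~ x + v_hat)            # 2. add them to the structural equation
# 3. t-test on v_hat; H0: x is exogenous (coefficient on v_hat is zero)
p_dwh <- summary(aux)$coefficients["v_hat", "Pr(>|t|)"]
p_dwh

# Side result: the coefficient on x equals the 2SLS point estimate
b_2sls <- unname(coef(lm(y ~ fitted(lm(x ~ z))))[2])
all.equal(unname(coef(aux)["x"]), b_2sls)
```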
This test can justify the use of IV regression, but it cannot assess the quality of our instruments. In fact, it presupposes that they are valid.
How do we know whether our instruments are good? Recall that
\[ \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta} + (\textcolor{var(--secondary-color)}{\boldsymbol{Z}'\boldsymbol{X}})^{-1}\boldsymbol{Z}'\boldsymbol{u}, \]
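where \((\textcolor{var(--secondary-color)}{\boldsymbol{Z}'\boldsymbol{X}})^{-1}\boldsymbol{Z}'\boldsymbol{u}\) should vanish as \(N\rightarrow\infty\). If the instruments are only weakly correlated with \(\boldsymbol{X}\), then \(\boldsymbol{Z}'\boldsymbol{X}\) is close to zero and its inverse becomes very large. Even a slight correlation between \(\boldsymbol{Z}\) and \(\boldsymbol{u}\) is then strongly amplified.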
In such a case of weak instruments, we run into multiple problems:
One way to find out whether instruments are weak is to check their explanatory power in the first stage, using an F-test of their joint significance.
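Below is a minimal R sketch of this first-stage F-test on simulated data, with one strong and one weak set of instruments (coefficients chosen by us for illustration). It assumes there are no exogenous controls besides the intercept. With controls, they would enter both the restricted and the unrestricted regression. The result can be compared with the rule-of-thumb cutoff of 10.

```r
set.seed(7)
n <- 1000
Z <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("z1", "z2", "z3")))
x_strong <- drop(Z %*% c(0.5, 0.3, 0.2)) + rnorm(n)     # instruments explain x well
x_weak   <- drop(Z %*% c(0.02, 0.02, 0.02)) + rnorm(n)  # instruments barely explain x

# F-test of H0: all instrument coefficients are zero in the first stage
first_stage_F <- function(x, Z) {
  restricted   <- lm(x ~ 1)   # first stage without the instruments
  unrestricted <- lm(x ~ Z)   # first stage with the instruments
  anova(restricted, unrestricted)$F[2]
}

F_strong <- first_stage_F(x_strong, Z)
F_weak   <- first_stage_F(x_weak, Z)
c(strong = F_strong, weak = F_weak)
```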
If instruments are weak, we can, for example, use Anderson-Rubin confidence sets (Anderson & Rubin, 1949), which are robust to weak instruments.
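Anderson-Rubin confidence sets can be computed by test inversion. For each candidate value \(b_0\), we test whether the instruments explain \(\boldsymbol{y} - \boldsymbol{X}b_0\), and we keep every \(b_0\) that is not rejected. The R sketch below does this over a grid. The simulated data, the grid from 0 to 4 and the 5% level are our assumptions. The instruments are moderately strong here, so the set is a bounded interval. With weak instruments the set can be unbounded or consist of several pieces, which a finite grid cannot fully show.

```r
set.seed(8)
n <- 500
S <- rnorm(n)                                  # unobserved confounder
Z <- matrix(rnorm(n * 2), n, 2)                # two instruments
x <- drop(Z %*% c(0.5, 0.5)) + S + rnorm(n)    # endogenous regressor
y <- 1 + 2 * x + S + rnorm(n)                  # true slope = 2

# AR test of H0: beta = b0. Under H0, y - b0 * x is intercept + error and unrelated to Z
ar_pvalue <- function(b0) {
  e0 <- y - b0 * x
  anova(lm(e0 ~ 1), lm(e0 ~ Z))$`Pr(>F)`[2]
}

# Confidence set: all grid values not rejected at the 5% level
grid  <- seq(0, 4, by = 0.01)
p     <- sapply(grid, ar_pvalue)
ar_cs <- grid[p > 0.05]
if (length(ar_cs) > 0) range(ar_cs) else "empty on this grid"
```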
Andrews et al. (2019) provide a good review of weak instruments and how to respond to them.
If we have more instruments than endogenous regressors, we have overidentification.
In a setting of overidentification, we can use Sargan’s \(J\)-test to assess exogeneity of our instruments. The idea is to compare estimates using different instruments:
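A common way to compute Sargan's statistic is \(N \cdot R^2\) from regressing the 2SLS residuals on all instruments. Under valid instruments, it is asymptotically \(\chi^2\) with \(M - K\) degrees of freedom. The R sketch below applies it to simulated data twice. In the second run, the first instrument is made invalid by letting it affect \(\boldsymbol{y}\) directly. All coefficients are illustrative assumptions.

```r
set.seed(9)
n  <- 2000
S  <- rnorm(n)                                   # unobserved confounder
Zx <- matrix(rnorm(n * 3), n, 3)                 # three excluded instruments (M = 3, K = 1)
x  <- drop(Zx %*% c(0.5, 0.4, 0.3)) + S + rnorm(n)
y_valid   <- 1 + 2 * x + S + rnorm(n)            # all instruments valid
y_invalid <- y_valid + 0.5 * Zx[, 1]             # first instrument affects y directly

sargan <- function(y, x, Zx) {
  Z <- cbind(1, Zx)                              # the intercept instruments itself
  X <- cbind(1, x)
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta  <- solve(crossprod(X_hat), crossprod(X_hat, y))   # 2SLS
  u_hat <- drop(y - X %*% beta)                           # residuals with the original X
  J <- length(y) * summary(lm(u_hat ~ Zx))$r.squared      # N * R^2
  c(J = J, p = pchisq(J, df = ncol(Zx) - 1, lower.tail = FALSE))  # df = M - K
}

rbind(valid = sargan(y_valid, x, Zx), invalid = sargan(y_invalid, x, Zx))
```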
Unfortunately, the test does not tell us which instrument is invalid, and estimates may be similar or different purely by chance.
We get a first-stage F-statistic of about 4.9 for the case with 30 instruments. This is well below the rule-of-thumb cutoff of 10 and indicates that the instruments are weak.
Bound et al. (1995) reach the same conclusion and go a step further: they randomly generate an irrelevant instrument and show that it leads to similar results.
We get a J-statistic of about 25.4, which does not indicate a violation of exogeneity.
Even so, Buckles & Hungerman (2013) argue that exogeneity may be violated because mothers' characteristics vary with the season of birth. On average, women who give birth in winter are younger, less educated, and less likely to be married, which may affect the income of their children.
Say we want to learn how family size affects the labor supply of women, e.g. to better understand discrimination or to design policies for more equality.
Now consider the fact that mothers whose first two children are of the same gender work fewer hours than other mothers. Why might that be?
The gender composition of the first two children is probably not related to labor supply directly. But it may be related to family size, since parents may prefer children of both genders and therefore choose to have a third child.
Suppose we want to understand how the price of fish affects the quantity sold at a fish market.
However, on days after a period with especially high waves, prices on the fish market are usually higher. How and why?
When waves are high, it is more difficult to fish, so the quantity supplied to the fish market is lower and prices rise. Waves plausibly shift supply but not demand, so they can serve as an instrument for the price. Note, however, that we need to rely on the assumption that the kind of fish caught is not affected by waves.
Shift-Share Instruments, or Bartik Instruments after Bartik (1991), are instruments that use a national-level shock (the shift) in combination with local shares to instrument for a local shock.
Say we are interested in how immigration \(im\) in some municipality \(m\) affects wages \(y\) in that place (with \(t\) being a time index and \(\boldsymbol{x}\) being a vector of controls):
\[ y_{mt} = im_{mt}\beta + \boldsymbol{x}'\boldsymbol{\gamma} + u_{mt}. \]
The problem is that while immigration affects wages, wages likely affect immigration as well. National immigration changes, however, are credibly exogenous to local wage changes. We can thus use national-level immigration figures from different countries of origin as the shifts. Combined with the initial (at \(t=0\)) shares of different immigrant nationalities \(q=1,\dots,Q\) in the municipality, they form the Bartik instrument:
\[ B_{mt} = \sum^Q_{q=1}\textcolor{var(--secondary-color)}{\text{share}}_{mq,t=0}\times\textcolor{var(--primary-color)}{\text{shift}}_{qt} \]
Once we have constructed this instrument, we can use it like any other instrument.
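Constructing the instrument amounts to a matrix product of the share matrix (municipalities by origins) and the shift matrix (origins by periods). The R sketch below uses hypothetical toy numbers for three municipalities, two origin countries and two periods. None of these values come from real data.

```r
# Hypothetical initial shares share_{mq,t=0}: rows = municipalities, columns = origins
share0 <- matrix(c(0.10, 0.05,
                   0.02, 0.20,
                   0.08, 0.08),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("m", 1:3), c("origin_A", "origin_B")))
# Hypothetical national shifts shift_{qt}: rows = origins, columns = periods
shift <- matrix(c(1000, 500,
                  2000, 300),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("origin_A", "origin_B"), c("t1", "t2")))

# B[m, t] = sum over q of share0[m, q] * shift[q, t]
B <- share0 %*% shift
B
#    t1 t2
# m1 200 65
# m2 420 70
# m3 240 64
```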
There are two perspectives on what is needed for identification:
Autor et al. (2013) want to find out how competition from Chinese imports affects local labor markets in the U.S. They use an instrument like this:
\[ B_{it} = \sum_{j=1}^{J} \textcolor{var(--secondary-color)}{l_{ijt}}\times \textcolor{var(--primary-color)}{g_{jt}}, \]
where \(i\) are regions, \(t\) is a time index, and \(j\) are industries. \(\textcolor{var(--secondary-color)}{l_{ijt}}\) is the share of people working in (manufacturing) industry \(j\) in region \(i\) at time \(t\), and \(\textcolor{var(--primary-color)}{g_{jt}}\) is the growth of Chinese imports in industry \(j\) in a group of countries that are comparable to the U.S.
Nunn & Qian (2014) investigate the effect of U.S. food aid on conflict in non-OECD countries. To circumvent the endogeneity issue, they use the following instrument (simplified):
\[ B_{it} = \textcolor{var(--secondary-color)}{\overline{D}_{i}} \times \textcolor{var(--primary-color)}{P_{t-1}}, \]
where \(t=1,\dots,T\) are years and \(i=1,\dots,N\) are countries; \(\textcolor{var(--secondary-color)}{\overline{D}_{i}}\) is the share of years in which the country received aid, \(\textcolor{var(--secondary-color)}{\overline{D}_{i}}=T^{-1}\sum_{t=1}^TD_{it}\), and \(\textcolor{var(--primary-color)}{P_{t-1}}\) is U.S. wheat production the previous year.
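The simplified instrument can be built from an aid indicator matrix and a lagged production series. The R sketch below uses a hypothetical toy panel of three countries over years \(t = 1, \dots, 4\), with wheat production observed for years 0 to 4 so that \(P_{t-1}\) exists for every \(t\). All numbers are made up.

```r
# Hypothetical aid indicator D[i, t] = 1 if country i received U.S. food aid in year t
D <- matrix(c(1, 1, 0, 1,
              0, 0, 1, 0,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("c1", "c2", "c3"), as.character(1:4)))
# Hypothetical U.S. wheat production for years 0, ..., 4
P <- c("0" = 50, "1" = 60, "2" = 40, "3" = 55, "4" = 70)

D_bar <- rowMeans(D)                 # share of years with aid: T^{-1} sum_t D_it
P_lag <- P[as.character(0:3)]        # P_{t-1} for t = 1, ..., 4
B <- outer(D_bar, unname(P_lag))     # B[i, t] = D_bar_i * P_{t-1}
colnames(B) <- colnames(D)
B
#        1  2  3     4
# c1 37.5 45 30 41.25
# c2 12.5 15 10 13.75
# c3 50.0 60 40 55.00
```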
library(haven)   # read_dta() for Stata files
library(dplyr)

# Download and unpack the Angrist & Krueger (1991) census extract if it is not present.
# Unpacking calls the external `unrar` program, which must be installed.
if (!file.exists("NEW7080.dta")) {
  if (!file.exists("NEW7080_1.rar"))
    download.file("https://economics.mit.edu/sites/default/files/inline-files/NEW7080_1.rar",
                  "NEW7080_1.rar", mode = "wb")
  system("unrar x -y NEW7080_1.rar", ignore.stdout = TRUE)
}
df <- read_dta("NEW7080.dta")

# Give the generic Stata column names descriptive names
nm <- c("v4" = "EDUC", "v9" = "LWKLYWGE", "v16" = "CENSUS", "v18" = "QOB", "v27" = "YOB")
for (k in names(nm)) if (k %in% names(df)) names(df)[names(df) == k] <- nm[[k]]

# Keep the 1930-1939 birth cohort (YOB is stored as two digits, e.g. 30 = 1930)
df <- df %>% filter(YOB >= 30, YOB <= 39)

# Year-of-birth dummies (YR1930-YR1939); they sum to one, so drop one
# (or the intercept) when estimating
for (y in 1930:1939) {
  df[[paste0("YR", y)]] <- as.integer(df$YOB == (y - 1900))
}
# Quarter-of-birth dummies (QTR1-QTR3; QTR4 is the base category)
for (q in 1:3) df[[paste0("QTR", q)]] <- as.integer(df$QOB == q)
# Interactions QTR1-QTR3 x YR1930-YR1939 (30 variables)
for (q in 1:3) for (y in 1930:1939)
  df[[paste0("QTR", q, "_", y)]] <- df[[paste0("QTR", q)]] * df[[paste0("YR", y)]]

# Keep outcome, endogenous regressor, year dummies and quarter-of-birth variables, then export
keep <- c("LWKLYWGE", "EDUC",
          paste0("YR", 1930:1939),
          paste0("QTR", 1:3),
          unlist(lapply(1:3, function(q) paste0("QTR", q, "_", 1930:1939))))
df <- df[keep]
write.csv(df, "angrist1991.csv", row.names = FALSE)