Econometrics II
Department of Economics, WU Vienna
November 6, 2025
In order to assess the quality of causal inferences, it helps to think of the validity of a statistical analysis. Different concepts of validity include the following:
External validity: determines whether an insight can be generalized.
Statistical validity: the extent to which the analysis corresponds to the relevant aspects of the real world.
Internal validity: qualifies the causal interpretation of an inference.
External validity is the validity of an analysis outside its own context, telling us whether findings can be generalized across situations, people, times, regions, etc.
Example
Imagine we study cooperation using a lab experiment with students playing a public goods game. We tightly control the environment: same stakes, same instructions, no distractions. Result: we can confidently say “in this precise setting, people contribute 50% on average.”
What can we say about external validity?
E.g. Lab experiment on the dictator game with WEIRD (Western, Educated, Industrialized, Rich, Democratic) students
E.g. Meta-Analysis by McKenzie on the effect of training on management practices
E.g. Card & Krueger (1994) Minimum Wage Study: study comparing fast-food restaurants in New Jersey vs. Pennsylvania after New Jersey raised its minimum wage. Contrary to textbook predictions, employment did not fall in New Jersey relative to Pennsylvania.
Example
Sample: For decades, drug trials were conducted only on men (often young, white).
Problem: Findings about dosage, side effects, and efficacy were applied to women and older adults, despite metabolic and hormonal differences.
Solutions?
Example
The effects of studying on academic performance may also be (slightly) affected by: whether you eat breakfast, the type of breakfast, your diet, your social life, the incidence of an armed conflict abroad, a game being published, …
Which one would be relevant?
| Case | Original Insight | What Replication Found | What It Shows About External Validity |
|---|---|---|---|
| Sampson & Cohen (1988) | Proactive policing reduces robbery rates | Similar negative correlation across multiple U.S. cities, with further nuance | Suggests original insight holds across many U.S. cities (some generalizability) |
| Minneapolis Domestic Violence (1981) | Arrest reduces repeat domestic violence | Null, opposite, or smaller effects, varying by location, method, and measurement | Demonstrates that strong internal validity in one context does not guarantee generalizability across locations, times, or institutions |
Internal Validity
Internal validity is the validity of an analysis within its own context. It is the extent to which the analysis allows for causal inference.
Ordinary least-squares (OLS) estimation yields the best linear unbiased estimator (BLUE) under the following conditions.
The first four assumptions imply that \(\hat{\boldsymbol{\beta}}\) is unbiased, the last one implies that \(\hat\sigma^2\) is unbiased and, hence, that the estimate is efficient.
\[ \lim_{N \to \infty} \Pr\big( |\hat{\theta} - \theta| > \varepsilon \big) = 0 \quad \text{for every } \varepsilon > 0, \quad \text{i.e. } \operatorname{plim}_{N \to \infty} \hat{\theta} = \theta. \]
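To see this consistency property in action, the sketch below (with illustrative, made-up numbers, not from the slides) shows the absolute error of the sample mean shrinking as \(N\) grows:

```python
# Sketch: consistency in action -- the sample mean converges to the true mean.
# The distribution and sample sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(4)
theta = 3.0                               # true mean of an Exponential(3)
errors = []
for N in (100, 10_000, 1_000_000):
    sample = rng.exponential(scale=theta, size=N)
    errors.append(abs(sample.mean() - theta))
print(errors)                             # shrinks toward 0 as N grows
```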
\[ \mathbb{E}[\boldsymbol{y} \mid \boldsymbol{X}^{*}] - \mathbb{E}[\boldsymbol{y} \mid \boldsymbol{X}] = \beta_1 (\boldsymbol{x}^{*}_1 - \boldsymbol{x}_1) + \big( \mathbb{E}[\boldsymbol{u} \mid \boldsymbol{X}^{*}] - \mathbb{E}[\boldsymbol{u} \mid \boldsymbol{X}] \big). \]
\[ \mathbb{E}[\boldsymbol{y} \mid \boldsymbol{X}^{*}] - \mathbb{E}[\boldsymbol{y} \mid \boldsymbol{X}] = \beta_1 \,(\boldsymbol{x}^{*}_1 - \boldsymbol{x}_1) + \theta_1 \,(\boldsymbol{x}^{*}_1 - \boldsymbol{x}_1). \]
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e \]
What are the implications of estimating \(y=\beta_0 + \beta_1 x_1 + e\) instead ? (Econometrics 1)
Bias from a confounder is also called omitted variable bias; it occurs when:
From the previous slide, the bias is given by: \(\mathbb{E}[\hat{\beta}_1] = \beta_1 + \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}\,\beta_2\)
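A quick simulation sketch (all parameter values are illustrative, not from the slides) confirms the bias formula: the slope of the short regression lands on \(\beta_1 + \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}\beta_2\) rather than on \(\beta_1\):

```python
# Sketch: omitted-variable bias; illustrative parameter values.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000
beta1, beta2 = 1.0, 2.0

x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)      # confounder correlated with x1
y = beta1 * x1 + beta2 * x2 + rng.normal(size=N)

# Short regression: y on x1 only (variables are mean-zero, so no intercept)
beta1_hat = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Theoretical bias: beta1 + Cov(x1, x2)/Var(x1) * beta2
predicted = beta1 + np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1) * beta2
print(beta1_hat, predicted)             # both close to 1 + 0.5*2 = 2
```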
Practice task
Imagine a simple OLS regression of Y on a single explanatory variable X. What could be the issue in the following cases?
The effect of plasma donation centers on crime rates?
The effect of social media on mental health?
To use a proxy variable to identify a causal effect, it must:
Correlate with the omitted variable: \(\theta_1 \neq 0\)
Not correlate with the other explanatory variables: \(\mathrm{Cov}(\boldsymbol{X}, \boldsymbol{e}) = 0\)
Have no direct impact on the dependent variable: \(\mathrm{Cov}(\boldsymbol{z}, \boldsymbol{u}) = 0\)
Condition 1 calls for an edge from the proxy to the confounder, while conditions 2 and 3 imply a lack of other (relevant) edges.
We will revisit another type of proxy variable (instrumental variables) later.
Examples:
Data may be subject to various issues, due to errors in collection, which may affect our ability to analyse it.
Consider a true \(f\) describing a population of size \(N\), of which we only observe \(M < N\) units. Can we learn something using our subset?
We can differentiate between types of selection bias:
We need to account for endogenous sample selection to guarantee internal validity; exogenous selection limits external validity.
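The distinction can be illustrated with a small simulation (hypothetical data-generating process): selecting the sample on the regressor leaves the OLS slope intact, while selecting on the outcome biases it.

```python
# Sketch: exogenous vs. endogenous sample selection; illustrative setup.
import numpy as np

rng = np.random.default_rng(5)
N = 200_000
x = rng.normal(size=N)
y = 1.0 * x + rng.normal(size=N)        # true slope = 1

def slope(x, y):
    """OLS slope of y on x (mean-zero variables)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

exo = x < 0.5                            # selection on the regressor
endo = y < 0.5                           # selection on the outcome

s_exo = slope(x[exo], y[exo])
s_endo = slope(x[endo], y[endo])
print(s_exo, s_endo)
# selection on x keeps the slope near 1; selection on y attenuates it
```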
If there is a pattern to the missingness, we may have to account for it to avoid bias, or we may benefit from modelling it explicitly.
Outliers are observations that are very different from the rest, and may stem from:
Outliers may have a large impact on estimates, i.e., high influence.
For \(\hat{\boldsymbol{\beta}}_{OLS}\), an influential observation \(i\) combines a high residual \(e_i\) with high leverage \(h_i = [\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}']_{ii}\).
Its influence is given by: \(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(i)} = \frac{(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{x}_{i}\, e_i}{1 - h_i}\)
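The leverage and influence formulas can be checked numerically. The sketch below (hypothetical data with one planted high-leverage point) verifies the closed-form leave-one-out influence against a brute-force re-fit without observation \(i\):

```python
# Sketch: leverage and exact leave-one-out influence for OLS; illustrative data.
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
X[0, 1] = 8.0                            # plant a high-leverage point
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta                          # residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the hat matrix

# Closed-form influence of observation i: beta - beta_(i)
i = 0
influence = (XtX_inv @ X[i] * e[i]) / (1 - h[i])

# Brute-force check: re-fit without observation i
X_d, y_d = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.inv(X_d.T @ X_d) @ X_d.T @ y_d
print(np.allclose(beta - beta_i, influence))   # True
```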
Anscombe’s quartet — four different datasets with equal means, variances, and regression lines (Anscombe, 1973).
Consider a weaker version of ignorability of the treatment — we want \(\mathrm{Cov}(\boldsymbol{x}, \boldsymbol{e}) = 0\)
With the measurement error in \(\boldsymbol{x}\), we estimate \(\boldsymbol{y} = \beta \boldsymbol{z} + \boldsymbol{a}\) and find that
\[ \mathrm{Cov}(\boldsymbol{z}, \boldsymbol{a}) = \mathrm{Cov}(\boldsymbol{z}, \boldsymbol{e} - \beta \boldsymbol{u}) = \mathrm{Cov}(\boldsymbol{x} + \boldsymbol{u}, \boldsymbol{e} - \beta \boldsymbol{u}) \neq 0 \]
We may assume:
But \(\mathrm{Cov}(\boldsymbol{u}, -\beta \boldsymbol{u}) = - \beta \mathbb{E}[\boldsymbol{u}^2]\).
Here the bias is given by:
\[ \mathbb{E}[\hat{\beta}] = \beta \frac{\sigma^2_{\boldsymbol{x}}}{\sigma^2_{\boldsymbol{x}} + \sigma^2_{\boldsymbol{u}}} \]
The estimate is biased toward 0: measurement error attenuates the size of the estimated effect.
We can show the attenuation bias from estimating \(\boldsymbol{y} = \beta \boldsymbol{x} + \boldsymbol{e}\) with \(\boldsymbol{z} = \boldsymbol{x} + \boldsymbol{u}\)
\[ \begin{aligned} \boldsymbol{y} = \beta (\boldsymbol{z} - \boldsymbol{u}) + \boldsymbol{e} &= \beta \boldsymbol{z} + \boldsymbol{e} - \beta \boldsymbol{u} = \beta \boldsymbol{z} + \tilde{\boldsymbol{e}},\\ \hat{\beta} &= (\boldsymbol{z}'\boldsymbol{z})^{-1} \boldsymbol{z}'\boldsymbol{y} = \beta + (\boldsymbol{z}'\boldsymbol{z})^{-1} \boldsymbol{z}'\tilde{\boldsymbol{e}},\\ \hat{\beta} &= \beta + (\boldsymbol{z}'\boldsymbol{z})^{-1} \boldsymbol{z}'\boldsymbol{e} - (\boldsymbol{z}'\boldsymbol{z})^{-1} \boldsymbol{z}'\beta \boldsymbol{u},\\ \hat{\beta} &= \beta + 0 - \beta (\boldsymbol{z}'\boldsymbol{z})^{-1} \boldsymbol{z}'\boldsymbol{u},\\ \hat{\beta} &= \beta - \beta \left[ (\boldsymbol{x} + \boldsymbol{u})'(\boldsymbol{x} + \boldsymbol{u}) \right]^{-1} (\boldsymbol{x} + \boldsymbol{u})'\boldsymbol{u},\\ \mathbb{E}[\hat{\beta}] &= \beta \left( 1 - \frac{\operatorname{Cov}(\boldsymbol{x}, \boldsymbol{u}) + \mathbb{V}(\boldsymbol{u})} {\mathbb{V}(\boldsymbol{x}) + 2\operatorname{Cov}(\boldsymbol{x}, \boldsymbol{u}) + \mathbb{V}(\boldsymbol{u})} \right), \end{aligned} \]
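A short simulation (with illustrative variances) reproduces the attenuation factor \(\sigma^2_{\boldsymbol{x}} / (\sigma^2_{\boldsymbol{x}} + \sigma^2_{\boldsymbol{u}})\):

```python
# Sketch: attenuation bias from classical measurement error; illustrative numbers.
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
beta = 2.0
sigma_x2, sigma_u2 = 1.0, 0.5

x = rng.normal(scale=np.sqrt(sigma_x2), size=N)
u = rng.normal(scale=np.sqrt(sigma_u2), size=N)  # measurement error
z = x + u                                        # mismeasured regressor
y = beta * x + rng.normal(size=N)

beta_hat = (z @ y) / (z @ z)                     # OLS of y on z
predicted = beta * sigma_x2 / (sigma_x2 + sigma_u2)
print(beta_hat, predicted)                       # both near 2 * (1/1.5) ~ 1.33
```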
The causal effect of interest, i.e., \(X \rightarrow Y\), is not always as straightforward as we would like. Instead, we may encounter:
With pure reverse causality, the issue is determining the direction of causation.
Example: Can the civil tribunals’ ineffectiveness in enforcing contracts explain the presence of the Italian Mafia today in Italy? (Braccioli, 2025)
It is not clear whether:
OR
With simultaneity, we need to disentangle the effects. Consider the following supply and demand functions, both driven by the price \(p\): \[ d = \beta^d p + u^d \]
\[ s = \beta^s p + u^s \]
We can’t observe supply and demand separately, but we observe the quantity sold, \(q\), at equilibrium (\(q = d = s\)):
\[ q = \beta^d p + u^d = \beta^s p + u^s \]
In this setting it is impossible to differentiate between the effect of price on supply or demand.
To see why the parameters \(\beta^d\) and \(\beta^s\) are unidentified, we can solve for \(p\).
\[\beta^d p + u^d = \beta^s p + u^s,\]
\[\beta^d p = \beta^s p + u^s - u^d, \]
\[\beta^d p - \beta^s p = u^s - u^d, \] \[p(\beta^d - \beta^s) = u^s - u^d, \]
\[p = \frac{u^s - u^d}{\beta^d - \beta^s}.\]
The equilibrium price \(p\) is a function of the errors only, and is thus correlated with both \(u^d\) and \(u^s\). If we regress \(q\) on \(p\), we can’t tell whether the effect stems from demand or supply.
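The identification failure is easy to see in a simulation (illustrative parameters): regressing \(q\) on the equilibrium \(p\) recovers neither \(\beta^d\) nor \(\beta^s\).

```python
# Sketch: simultaneity bias in a supply-demand system; illustrative parameters.
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
beta_d, beta_s = -1.0, 1.0          # demand falls, supply rises with price

u_d = rng.normal(size=N)            # demand shocks
u_s = rng.normal(size=N)            # supply shocks

# Equilibrium price from d = s, then the observed quantity
p = (u_s - u_d) / (beta_d - beta_s)
q = beta_d * p + u_d                # identical to beta_s * p + u_s

beta_hat = (p @ q) / (p @ p)        # OLS of q on p
print(beta_hat)                     # a mixture of the two; here ~0, not -1 or 1
```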
Consider the following structural equations:
\[ y = \beta_1 z + \beta_2 x_1 + u \]
\[ z = \theta_1 y + \theta_2 x_2 + v \]
We can derive a reduced form equation by solving for z:
\[ z = \gamma_1 x_1 + \gamma_2 x_2 + \varepsilon \]
Where: \[ \gamma_1 = \frac{\theta_1 \beta_2}{1 - \theta_1 \beta_1}; \quad \gamma_2 = \frac{\theta_2}{1 - \theta_1 \beta_1}; \quad \varepsilon = \frac{\theta_1 u + v}{1 - \theta_1 \beta_1} \]
The reduced form of our structural equations makes two issues clear:
In this reduced form, the error term is
\[ \varepsilon = \frac{\theta_1 u+v}{1-\theta_1\beta_1} \]
where the correlation between \(\theta_1 u\) and the structural regressor \(y\) causes bias in
\[ z = \theta_1 y + \theta_2 x_2 + v \]
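As a numerical check (illustrative parameter values), solving the system for \(z\) and regressing it on \(x_1\) and \(x_2\) recovers the reduced-form coefficients \(\gamma_1\) and \(\gamma_2\), not the structural parameters:

```python
# Sketch: verifying the reduced-form coefficients by simulation; illustrative values.
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
beta1, beta2, theta1, theta2 = 0.5, 1.0, 0.4, 2.0

x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
u = rng.normal(size=N)
v = rng.normal(size=N)

# Solve the two structural equations simultaneously for z
denom = 1 - theta1 * beta1
z = (theta1 * beta2 * x1 + theta2 * x2 + theta1 * u + v) / denom

# Reduced-form OLS of z on (x1, x2) matches (gamma_1, gamma_2)
X = np.column_stack([x1, x2])
gamma_hat = np.linalg.lstsq(X, z, rcond=None)[0]
print(gamma_hat, [theta1 * beta2 / denom, theta2 / denom])
```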
Now that we have seen what can go wrong, we will start seeing how we can make it right.
This includes: Instrumental variable models, simultaneous equations models, matching procedures, flexible estimation methods, and quasi-experiments. However, there are many threats to internal validity we did not mention but that are very relevant:
You want to estimate whether time spent walking your dog improves mental health, and you have access to a GPS tracker attached to the dog’s collar.
What potential issues could arise with the following questions: