Econometrics II
Department of Economics, WU Vienna
January 15, 2026
We already know cross-sectional data well. Cross-sectional data covers many individuals at one point in time:
\[ x_i \]
In Econometrics III / Applied Econometrics, we will learn about time series data. Time series data covers one individual at many different points in time:
\[ x_t \]
Today, we will talk briefly about panel data because it is very useful for answering causal questions. In a panel, we follow many individuals over many time periods:
\[ x_{it} \]
Panel data is especially useful because it allows us to control for some unobserved effects without actually observing them.
This is an example of a panel dataset. You can see that we have data on two individuals for two points in time.
| Individual | Date | Income | Age | Education |
|---|---|---|---|---|
| A | 2020 | 1200 | 20 | medium |
| A | 2021 | 1300 | 21 | medium |
| B | 2020 | 1800 | 24 | medium |
| B | 2021 | 2600 | 25 | high |
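As a small aside, such a panel is typically stored in "long" format, with one row per individual-period combination. Here is a minimal sketch in Python that simply reproduces the table above (the column names mirror the table and are otherwise arbitrary):

```python
import pandas as pd

# The example panel from the table above, in long format:
# one row per individual-period observation.
panel = pd.DataFrame({
    "individual": ["A", "A", "B", "B"],
    "date":       [2020, 2021, 2020, 2021],
    "income":     [1200, 1300, 1800, 2600],
    "age":        [20, 21, 24, 25],
    "education":  ["medium", "medium", "medium", "high"],
})

# Indexing by (individual, date) makes the panel structure explicit.
panel = panel.set_index(["individual", "date"])
print(panel)
```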
Panel data is not as uncommon as you might imagine. Examples of panel data include:
Panel data and models have some useful advantages, such as:
However, there are also some potential issues, including:
The simplest model can be obtained by stacking cross-sectional models like this:
\[ \begin{aligned} \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_T \end{pmatrix} &= \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_T \end{pmatrix} \boldsymbol{\beta} + \begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_T \end{pmatrix}. \end{aligned} \]
Alternatively, we can write down the model for a single cross-sectional unit like this:
\[ y_{it}=\boldsymbol{x}_{it}'\boldsymbol{\beta} + u_{it}. \]
This is what we call pooled cross-sections. Coefficients are assumed constant across time and individuals.
In this setting, (pooled) OLS is consistent as long as the assumption \(\mathrm{E}(u_{it}\mid x_{it})=0\) holds.
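To make this concrete, here is a minimal pooled OLS sketch in Python with simulated data (the data, variable names, and the use of statsmodels are illustrative, not part of the lecture): all individual-period observations are stacked and a single regression is run, with one coefficient vector for everyone.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
N, T = 200, 5  # individuals, time periods

# Simulate a simple panel satisfying E(u_it | x_it) = 0:
# y_it = 1 + 2 * x_it + u_it
df = pd.DataFrame({
    "individual": np.repeat(np.arange(N), T),
    "period":     np.tile(np.arange(T), N),
})
df["x"] = rng.normal(size=N * T)
df["y"] = 1 + 2 * df["x"] + rng.normal(size=N * T)

# Pooled OLS: one regression on all stacked observations,
# coefficients held constant across individuals and periods.
pooled = smf.ols("y ~ x", data=df).fit()
print(pooled.params)  # intercept close to 1, slope close to 2
```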
In many cases, this exogeneity assumption is problematic. Let’s start discussing this by splitting the error term into two components, time-invariant heterogeneity between individuals \(\mu_i\) and a time-varying error \(\varepsilon_{it}\):
\[ u_{it} = \mu_i+\varepsilon_{it}. \]
There are now two problems with the exogeneity assumption \(\mathrm{E}(u_{it}\mid x_{it})=0\), which in the panel setting is typically strengthened to the strict exogeneity condition
\[ \mathrm{E}(u_{it}\mid x_{i1},x_{i2},\dots,x_{iT})=0\text{ for } i= 1,2,\dots,N \text{ and all } t. \]
We can circumvent this problem by considering individual fixed effects explicitly:
\[ y_{it}=\boldsymbol{x}_{it}'\boldsymbol{\beta} + \mu_i+\varepsilon_{it}. \]
This is equivalent to estimating the model with dummy variables for each individual. These fixed effects capture unobserved individual heterogeneity.
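A minimal sketch of this equivalence with simulated data (names and numbers are illustrative): the individual effect \(\mu_i\) is built to be correlated with \(x_{it}\), so pooled OLS is biased, while individual dummies and the within (demeaning) transformation both recover \(\boldsymbol{\beta}\).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
N, T = 100, 5

ind = np.repeat(np.arange(N), T)
mu = rng.normal(size=N)               # time-invariant individual effect mu_i
x = mu[ind] + rng.normal(size=N * T)  # x is correlated with mu_i
y = 2 * x + mu[ind] + rng.normal(size=N * T)
df = pd.DataFrame({"individual": ind, "x": x, "y": y})

# Pooled OLS ignores mu_i and overstates the slope here (around 2.5).
print(smf.ols("y ~ x", data=df).fit().params["x"])

# Fixed effects via dummy variables for each individual (LSDV).
print(smf.ols("y ~ x + C(individual)", data=df).fit().params["x"])

# Equivalent within estimator: demean y and x within each individual.
demeaned = df[["x", "y"]] - df.groupby("individual")[["x", "y"]].transform("mean")
print(smf.ols("y ~ x - 1", data=demeaned).fit().params["x"])
```

The last two estimates coincide (both close to 2), which is exactly the dummy-variable equivalence mentioned above.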
There are two things that these estimators cannot do:
If we have panel data, we can use a difference-in-differences (DiD) approach. We divide our data into four groups (control/treatment × before/after) and estimate:
\[ y_{i t} = \alpha + x_{\text{after}} \phi + x_{\text{treated}} \theta + x_{\text{interacted}} \delta + \ldots, \]
where \(x_{\text{interacted}} = x_{\text{after}} \times x_{\text{treated}}\).
| | Before | After | Difference |
|---|---|---|---|
| Control | \(\alpha\) | \(\alpha + \phi\) | \(\phi\) |
| Treatment | \(\alpha + \theta\) | \(\alpha + \theta + \phi + \delta\) | \(\phi + \delta\) |
| Difference | \(\theta\) | \(\theta + \delta\) | \(\delta\) |
The difference between the two before-after differences (one for the treatment group, one for the control group) directly gives us the treatment effect, \(\hat{\delta}\).
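The same 2×2 logic can be run as a single regression with an interaction term. A minimal sketch with simulated data (variable names and numbers are illustrative): the coefficient on the interaction is the DiD estimate of \(\delta\).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000

treated = rng.integers(0, 2, n)   # treatment-group dummy
after = rng.integers(0, 2, n)     # post-treatment-period dummy
alpha, phi, theta, delta = 1.0, 0.5, 0.3, 2.0

y = (alpha + phi * after + theta * treated
     + delta * treated * after + rng.normal(size=n))
df = pd.DataFrame({"y": y, "treated": treated, "after": after})

# "treated * after" expands to treated + after + treated:after;
# the interaction coefficient is the DiD estimate of delta.
did = smf.ols("y ~ treated * after", data=df).fit()
print(did.params["treated:after"])  # close to the true delta of 2.0
```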
A natural experiment is a study where an experimental setting is induced by nature or other factors outside our control.
U.S. Representative Alexander Pirnie of New York drawing the first capsule in the Vietnam War draft lottery.
In the 1800s, London (as well as many other places) was repeatedly hit by waves of a cholera epidemic.
Of course, running an experiment is not an option in this context. It would require randomizing households and allocating clean water to only a subset of them, which would be both logistically infeasible and ethically questionable.
In 1849, the following happened:
One water company moved its pipes further upstream, to a location that incidentally was upstream of the main sewage discharge facility. Suddenly, households in the same neighborhoods had access to different qualities of water.
There were a few other factors that made this situation a natural experiment:
Photograph by Hisgett (2015).
In the end, John Snow collected very convincing evidence for his theory and went on to identify a particular contaminated water pump (the Broad Street pump). The theory, however, was deemed politically unpleasant and was thus not accepted until long after Snow’s death.
A Regression Discontinuity Design (RDD) is another type of quasi-experimental design.
We make use of a sharp cutoff in some running variable and compare values immediately below and immediately above the cutoff.
The size of the discontinuity in outcomes gives us the local treatment effect.
RDDs are commonly used where there is some kind of artificial cutoff, e.g. test scores exceeding a minimal threshold for admission to a program. But they are not limited to that.
We (Vashold et al., 2026) made use of a discontinuity in space: Mines pollute water flows, but only in one direction. We found that vegetation is less healthy downstream of a mine.
For an ideal RDD, we need a few things:
In practice, these requirements are hard to check.
A common problem is “fabricating” a discontinuity by overfitting the data to both sides of the cutoff.
In the illustrated example, there is obviously no discontinuity, yet with a sufficiently flexible fit on each side of the cutoff we can make one appear.
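A minimal sharp-RDD sketch with simulated data (cutoff, bandwidth, and variable names are illustrative): instead of a high-order polynomial, we use a local linear fit on each side of the cutoff within a bandwidth, which avoids the overfitting trap described above; the coefficient on the treatment dummy is the jump at the cutoff.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, cutoff, bandwidth = 2000, 0.0, 0.5

running = rng.uniform(-1, 1, n)            # running variable
treated = (running >= cutoff).astype(int)  # treatment switches on at the cutoff
y = 1 + 0.8 * running + 2.0 * treated + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"y": y, "running": running, "treated": treated})
df["centered"] = df["running"] - cutoff

# Keep only observations close to the cutoff and allow separate linear
# trends on each side; the coefficient on `treated` is the discontinuity,
# i.e. the local treatment effect.
local = df[df["centered"].abs() <= bandwidth]
rdd = smf.ols("y ~ treated * centered", data=local).fit()
print(rdd.params["treated"])  # close to the true jump of 2.0
```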
Recall the fundamental problem of causal inference: We cannot observe the counterfactual to our treatment. What we can do is find treated observations that are very similar to untreated observations. We call this procedure matching. It works like this:
This procedure allows us to create a sample with balanced confounders, emulating the balance induced by completely randomized or blocked experiments.
Propensity score matching uses each observation’s estimated probability of being treated, i.e. its propensity score (see the sketch below).
Distance matching uses some measure of distance between observations.
Coarsened exact matching sorts variables into bins and then matches exactly on the binned (coarsened) values.
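A minimal propensity score matching sketch with simulated data (names, the logit specification, and the 1:1 nearest-neighbour rule with replacement are illustrative): estimate each observation's propensity score, match treated to untreated observations on that score, and compare outcomes within matched pairs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

x = rng.normal(size=n)  # confounder: affects both treatment and outcome
treated = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 1 + 2 * treated + 1.5 * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x, "treated": treated})

# Step 1: estimate the propensity score P(treated = 1 | x) with a logit.
df["pscore"] = smf.logit("treated ~ x", data=df).fit(disp=0).predict()

# Step 2: match each treated unit to the untreated unit with the
# closest propensity score (1:1 nearest neighbour, with replacement).
treat = df[df["treated"] == 1]
ctrl = df[df["treated"] == 0]
match_idx = np.abs(
    treat["pscore"].values[:, None] - ctrl["pscore"].values[None, :]
).argmin(axis=1)
matched_y = ctrl["y"].values[match_idx]

# Step 3: the mean treated-minus-matched-control difference estimates
# the average treatment effect on the treated.
print((treat["y"].values - matched_y).mean())  # close to the true effect of 2
```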
The basic idea of the synthetic control estimator: we have one treated unit and multiple untreated units, none of which individually matches the characteristics, or trajectory, of the treated unit. So, what we do is compute a weighted average of untreated units that does match the treated unit, using a data-driven approach.
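A minimal sketch of the weighting step with simulated data (the donor pool, pre-treatment length, and the outcome-only fit are illustrative simplifications; in practice the weights are also chosen to match predictor variables): we pick non-negative weights summing to one so that the weighted average of untreated units tracks the treated unit's pre-treatment path as closely as possible.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_donors, n_pre = 10, 20  # untreated "donor" units, pre-treatment periods

# Pre-treatment outcomes: rows are periods, columns are donor units.
Y0 = rng.normal(size=(n_pre, n_donors)).cumsum(axis=0)
true_w = np.array([0.6, 0.4] + [0.0] * (n_donors - 2))
y1 = Y0 @ true_w + rng.normal(scale=0.1, size=n_pre)  # treated unit's path

# Choose weights w >= 0 with sum(w) = 1 minimizing the pre-treatment fit error.
def loss(w):
    return np.sum((y1 - Y0 @ w) ** 2)

res = minimize(
    loss,
    x0=np.full(n_donors, 1 / n_donors),
    bounds=[(0, 1)] * n_donors,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    method="SLSQP",
)
print(np.round(res.x, 2))  # weights load mostly on the first two donors
```

The post-treatment gap between the treated unit and this weighted average is then read as the treatment effect.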
In the first study to use this design, Abadie & Gardeazabal (2003) were researching the economic cost of the terrorist activity in the Basque Country in the 1970s.
They construct the synthetic control from other Spanish regions. The mix they end up with is 85.1% Catalonia, 14.9% Madrid, and 0% of all other regions.
Let us look at one more example. Andersson (2019) investigated the effects of a carbon tax and a fuel-specific value added tax on CO₂ emissions from Sweden’s transport sector.
The synthetic control is constructed from other OECD countries. The final mix is 38.4% Denmark, 19.5% Belgium, 17.7% New Zealand, 9% Greece, 8.8% U.S., 6.1% Switzerland, and 0.1% each of Australia, Iceland, and Poland.