Module 2: Simple Linear Regression

Econometrics I

Max Heinze (mheinze@wu.ac.at)

Department of Economics, WU Vienna

Based on a slide set by Simon Heß

March 6, 2025

 

 

 

Motivation

The Bivariate Linear Model

An Estimator

Properties of the OLS Estimator

What do these headlines have in common?




Conditional Expectation of \(y\)

The statements on the previous slide all concern the conditional expectation of a dependent variable \(y\), given an explanatory variable \(x\).

  • Some statements are still nonsense.
  • We will learn how to show why.

Conditional expectations are an important measure that relates a dependent variable \(y\) to an explanatory variable \(x\), for example like this:

\[ \mathrm{E}\left(\textcolor{var(--primary-color)}{y}\mid\textcolor{var(--secondary-color)}{x}\right) = 0.4 + 0.5\textcolor{var(--secondary-color)}{x} \]

In this way, we can divide variation in the dependent variable \(y\) into two components:

  • Variation that stems from the explanatory variable \(x\), and
  • Variation that is random or caused by unobserved factors.
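
To make the two components concrete, here is a minimal simulation sketch in Python. It assumes the example function above, \(\mathrm{E}(y\mid x) = 0.4 + 0.5x\), together with an error term drawn independently of \(x\). The normal distributions, the sample size, and the helper `variance` are assumptions made purely for illustration.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data-generating process built around E(y | x) = 0.4 + 0.5 x.
n = 20_000
x = [random.gauss(0.0, 2.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]  # unobserved factors, independent of x
cef = [0.4 + 0.5 * xi for xi in x]              # part of y that stems from x
y = [ci + ui for ci, ui in zip(cef, u)]

var_y = variance(y)
var_explained = variance(cef)   # variation that stems from x
var_unexplained = variance(u)   # variation from random or unobserved factors

print(f"Var(y)              = {var_y:.3f}")
print(f"Var(E(y|x))         = {var_explained:.3f}")
print(f"Var(u)              = {var_unexplained:.3f}")
print(f"Sum of both parts   = {var_explained + var_unexplained:.3f}")
```

In this sketch the two parts add up to the total variance only approximately, because the sample covariance between \(x\) and the simulated error is close to, but not exactly, zero.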

Evaluation of Policy Measures

When we evaluate policy measures, we are often interested in differences between groups.

Two examples:

  • Effects of a drug on patients’ health in a randomized double-blind study
    \[ \mathrm{E}\left(\textcolor{var(--primary-color)}{\mathrm{Health}}\mid\textcolor{var(--secondary-color)}{\mathrm{Drug}=1}\right) - \mathrm{E}\left(\textcolor{var(--primary-color)}{\mathrm{Health}}\mid\textcolor{var(--secondary-color)}{\mathrm{Drug}=0}\right) \]
  • Gender pay gap for a certain education level
    \[ \mathrm{E}\left(\mathrm{log}(\textcolor{var(--primary-color)}{\mathrm{Wage}})\mid\textcolor{var(--secondary-color)}{\mathrm{Male}=1},\dots\right) - \mathrm{E}\left(\mathrm{log}(\textcolor{var(--primary-color)}{\mathrm{Wage}})\mid\textcolor{var(--secondary-color)}{\mathrm{Male}=0},\dots\right) \]

In both cases we are examining the average treatment effect (ATE): the average effect of a “treatment” relative to no “treatment”.
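
The drug example can be sketched in Python as a difference between two group means. All numbers here are hypothetical: the baseline health score of 50, the effect of 5 points, the noise level, and the helper `group_mean` are assumptions for illustration, not results from any study.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

TRUE_EFFECT = 5.0  # hypothetical effect of the drug on the health score

# Randomized assignment: each patient receives the drug with probability 1/2.
patients = []
for _ in range(10_000):
    drug = random.randint(0, 1)
    health = 50.0 + TRUE_EFFECT * drug + random.gauss(0.0, 10.0)
    patients.append((drug, health))


def group_mean(data, treated):
    """Sample analogue of E(Health | Drug = treated)."""
    values = [health for drug, health in data if drug == treated]
    return sum(values) / len(values)


# Sample analogue of E(Health | Drug = 1) - E(Health | Drug = 0).
ate_estimate = group_mean(patients, 1) - group_mean(patients, 0)
print(f"Estimated average treatment effect: {ate_estimate:.2f}")
```

Because assignment is randomized in this sketch, the difference in group means should land close to the assumed effect. Without randomization, the same difference could also pick up other differences between the groups.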

Predictions

We might also be interested in predicting an outcome for a specific initial situation.

Suppose we know the distribution of class size and test scores. For a new district, we only know the class size. What is the best prediction for the test scores in the new district?

  • The conditional mean?
  • The conditional median?
  • The conditional mode?
  • Something else?

If we minimize a quadratic loss function, our best prediction will be the conditional mean.
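
A small numerical check of this claim, as a sketch: take hypothetical test scores from districts that all share the same class size, and compare the average quadratic loss of different predictions. The skewed score distribution, the grid of candidate predictions, and the helper `mean_squared_loss` are assumptions for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the illustration is reproducible

# Hypothetical test scores for districts with one given class size.
# The distribution is skewed on purpose so that mean and median differ.
scores = [650.0 + random.expovariate(1 / 30.0) for _ in range(5_000)]


def mean_squared_loss(prediction, outcomes):
    """Average quadratic loss of predicting the same value for every outcome."""
    return sum((y - prediction) ** 2 for y in outcomes) / len(outcomes)


sample_mean = sum(scores) / len(scores)
sample_median = statistics.median(scores)

loss_mean = mean_squared_loss(sample_mean, scores)
loss_median = mean_squared_loss(sample_median, scores)

# Search over integer predictions from 600 to 800 for the lowest loss.
grid = [600 + k for k in range(201)]
best_on_grid = min(grid, key=lambda c: mean_squared_loss(c, scores))

print(f"mean   = {sample_mean:.2f}, loss = {loss_mean:.2f}")
print(f"median = {sample_median:.2f}, loss = {loss_median:.2f}")
print(f"best integer prediction on the grid: {best_on_grid}")
```

Under a different loss function the ranking can change. With absolute loss, for instance, the median is the minimizer.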

 

 


Conditional Expectation Function

We now want to model the conditional expectation function of a random variable \(y\) given another random variable \(x\).

The simplest way to do that: we assume a linear function.

\[ \mathrm{E}(\textcolor{var(--primary-color)}{y_i}\mid\textcolor{var(--secondary-color)}{x_i}) = \beta_0 + \beta_1 \textcolor{var(--secondary-color)}{x_i}, \]

where

  • \(\beta_0\) and \(\beta_1\) are the parameters of the function (intercept and slope),
  • \(i\) is an index for observations,
  • \(\textcolor{var(--primary-color)}{y_i}\) is the dependent variable, also called the explained variable, outcome variable, or regressand,
  • \(\textcolor{var(--secondary-color)}{x_i}\) is the explanatory variable, also called the independent variable, regressor, …

This function gives us information about the expected value of \(y_i\) for a given value \(x_i\), and only that.

  • We cannot infer the actual value of \(y_i\) for a specific \(x_i\).
  • We also gain no information about the distribution of \(y_i\) and \(x_i\) beyond the conditional expectation.

Suppose the conditional expectation function for test scores given a certain class size is

\[ \mathrm{E}(\textcolor{var(--primary-color)}{\text{TestScores}_i}\mid\textcolor{var(--secondary-color)}{\text{ClassSize}_i}) = 720 - 0.6 \times \textcolor{var(--secondary-color)}{\text{ClassSize}_i}, \]

what can we then say about test scores in a new district with a class size of 20?

  • The expected value for the test scores is 708 points.
  • The actual test scores can be higher or lower: there is some error, or an unobserved component.
  • On average, we expect this error term to have a value of 0: \(u_i := \textcolor{var(--primary-color)}{y_i}-\mathrm{E}(\textcolor{var(--primary-color)}{y_i}\mid\textcolor{var(--secondary-color)}{x_i}) = \textcolor{var(--primary-color)}{y_i}- \beta_0 - \beta_1 \textcolor{var(--secondary-color)}{x_i},\qquad\mathrm{E}(u_i\mid\textcolor{var(--secondary-color)}{x_i})=0.\)
  • We also assume that its expected value is independent of \(x_i\): \(\mathrm{E}(u_i\mid \textcolor{var(--secondary-color)}{x_i})=\mathrm{E}(u_i)=0\) (the zero conditional mean assumption).
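
The prediction and the role of the error term can be sketched in Python. The coefficients 720 and -0.6 come from the example above. Everything else is a hypothetical assumption for illustration: the range of class sizes, the normally distributed error, and the function name `expected_test_score`.

```python
import random

random.seed(3)  # fixed seed so the illustration is reproducible

BETA_0 = 720.0  # intercept of the example function
BETA_1 = -0.6   # slope of the example function


def expected_test_score(class_size):
    """E(TestScores | ClassSize) under the assumed linear function."""
    return BETA_0 + BETA_1 * class_size


prediction = expected_test_score(20)  # 720 - 0.6 * 20
print(f"Expected test score for a class size of 20: {prediction:.1f}")

# Hypothetical districts: the error u is drawn independently of class size,
# so the zero conditional mean assumption holds by construction.
errors_by_size = {}
for _ in range(20_000):
    class_size = random.randint(15, 30)
    u = random.gauss(0.0, 15.0)
    score = expected_test_score(class_size) + u
    errors_by_size.setdefault(class_size, []).append(
        score - expected_test_score(class_size)
    )

# The average error should be close to zero for every class size.
mean_error = {
    size: sum(errors) / len(errors)
    for size, errors in sorted(errors_by_size.items())
}
for size, avg in mean_error.items():
    print(f"class size {size}: mean error = {avg:+.2f}")
```

Individual districts in this sketch deviate from their expected score by up to several dozen points, while the average deviation within each class size stays near zero.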

Visualization of the Conditional Expectation Function

In blue we see ou