Econometrics I
slide set by Max Heinze (mheinze@wu.ac.at)
March 6, 2025
Welcome to the course Econometrics I!
In this course, we will explore how we can use data to provide evidence for hypotheses and find answers to the questions we pose.
To do this, we need solid mathematical and statistical basics. Broadly speaking, these are things already covered in school and repeated in the statistics and mathematics lectures in the CBK.
This self-study slide set is meant to help you refresh these basics. If something is unclear, there will be enough time to ask questions in the course, but you should have a basic understanding of this slide set.
The slide set also contains an introduction to R with interactive code examples.
At various points in this course, not least in assignments, we will need software to perform statistical calculations. Which software you use is up to you. Our recommendation is R, since it is the language in which we usually discuss example code.
A comfortable way to use R is with the integrated development environment RStudio. RStudio is the interface we use to write code in R; R itself is a separate program that executes our code and provides results.
Installing R and RStudio
An installation guide and the download for R can be found at cran.r-project.org.
An installation guide and the download for RStudio can be found at posit.co/download/rstudio-desktop/
Having statistical software (e.g., R) installed is a prerequisite for the course.
Introduction to R with RStudio
A typical RStudio window looks like this:

(Screenshot of the RStudio interface. The default layout is slightly different from the one shown, and the default theme is light; both can be changed in the settings.)
In the console, we can type our first command, for example 1+1, and press Enter to confirm.
Base R comes with many useful functions, but sometimes we will need functions for specific econometric purposes that are not included in Base R. These are often included in packages created by developers and then (ideally) published in the Comprehensive R Archive Network (CRAN). We can install these additional packages as follows:
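The interactive code block from the original slides is not reproduced here; a minimal sketch (the package name fixest is purely illustrative):

```r
install.packages("fixest")  # install a package from CRAN (quotation marks required)
library(fixest)             # load the installed package
update.packages()           # update all installed packages
```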
The above functions install, load, and update packages (Note: install.packages() requires quotation marks). We can access a function's documentation with a ?:
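For example, to open the documentation of the mean() function:

```r
?mean          # opens the help page for mean()
help("mean")   # equivalent
```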
Functions can be recognized by the parentheses. They may or may not contain arguments.
With <- (or =), we can assign a value to a variable name. This can either be a scalar, a vector, or something else (more on that later). We can print a variable with print(). It is also sufficient to just write the variable name (without print()).
The code on the right is interactive and can be modified and executed.
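Since the interactive block is not reproduced here, a minimal sketch of an assignment (the names x and v are illustrative):

```r
x <- 5            # assign a scalar to x
v <- c(1, 2, 3)   # assign a vector to v
print(x)          # print explicitly
v                 # typing the name alone also prints the value
```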
We can also use R as a calculator. The interactive code on the right demonstrates various mathematical operations.
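A few operations such an interactive block might contain (a sketch, not the original code):

```r
2 + 3        # addition
7 - 4        # subtraction
6 * 7        # multiplication
10 / 4       # division
2^10         # exponentiation
sqrt(16)     # square root
log(exp(1))  # natural logarithm
```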
We can also define matrices and perform calculations with them.
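A short sketch of matrix calculations in R (the matrix entries are illustrative):

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)  # 2x2 matrix, filled column by column
B <- matrix(c(5, 6, 7, 8), nrow = 2)
A + B      # element-wise addition
A %*% B    # matrix multiplication
t(A)       # transpose
solve(A)   # inverse (A is invertible: det(A) = -2)
```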
Of course, R is primarily a statistical programming language. So let us simulate 100 dice rolls:
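A sketch of how the simulation might look (the seed and the object name rolls are assumptions, not necessarily the original code):

```r
set.seed(1)                                       # for reproducible results
rolls <- sample(1:6, size = 100, replace = TRUE)  # 100 rolls of a fair die
```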
Execute this code so we can proceed.
What is the mean of the rolls?
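Assuming the rolls are stored in the vector rolls, as in the sketch above:

```r
mean(rolls)   # arithmetic mean of the 100 rolls
```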
We can also calculate other measures:
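For example (again using the vector rolls from above):

```r
median(rolls)   # median
var(rolls)      # sample variance
sd(rolls)       # sample standard deviation
```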
Other functions include median(), min(), max(), length(), var(), sd(), sum(), …
We use different functions depending on the file format to read in data: read.csv() for CSV, readRDS() for RDS, …
Some datasets are already available in R, which is especially convenient for practice purposes. One example is mtcars. We can view the first rows using head().
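For example:

```r
head(mtcars)   # first six rows of the built-in mtcars dataset
```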
When reading data, e.g., as CSV, we must assign it to a name, e.g., through my_data <- read.csv("data.csv"). Incidentally, we can export data in a similar way: write.csv(my_data, "my_data.csv").
The structure in which data is stored is called a dataframe. The rows of a dataframe correspond to individual observations, and the columns correspond to variables. With View(), we can view the dataset in a separate window. We can also find out, e.g., the number of columns and rows:
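A brief sketch using the built-in mtcars data:

```r
View(mtcars)   # open the dataset in a separate viewer window
nrow(mtcars)   # number of rows (observations)
ncol(mtcars)   # number of columns (variables)
dim(mtcars)    # both at once
```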
Using square brackets, we can access specific rows and columns. mtcars[1,] is the first row of mtcars, mtcars[,1] is the first column. We can also access individual variables using the following notation: mtcars$mpg.
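A sketch of these indexing operations:

```r
mtcars[1, ]    # first row (first observation)
mtcars[, 1]    # first column (first variable)
mtcars$mpg     # the variable mpg as a vector
```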
What happens when we execute this code?
We can also use TRUE and FALSE to filter values:
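A minimal filtering sketch (the threshold of 25 is illustrative, not necessarily the original example):

```r
mtcars$mpg > 25             # logical vector: TRUE where mpg exceeds 25
mtcars[mtcars$mpg > 25, ]   # keep only the rows for which the condition is TRUE
```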
We can also combine multiple functions:
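For instance (a hypothetical combination; the original example may differ):

```r
mean(mtcars$mpg[mtcars$cyl == 4])   # average mpg among four-cylinder cars
```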
Or draw graphs:
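For example, a histogram (a sketch; the original plot may differ):

```r
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
```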
Often, we want to draw a scatterplot to show the relationship between two variables.
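A sketch of such a scatterplot, using the mtcars variables referred to below:

```r
plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower (hp)", ylab = "Miles per gallon (mpg)")
```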
Try adding the line abline(lm(mpg~hp, data=mtcars), col="red") to draw a red regression line on the plot!
How strongly are mpg and hp correlated?
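We can compute the correlation with cor():

```r
cor(mtcars$mpg, mtcars$hp)   # Pearson correlation; negative: more horsepower goes along with lower mpg
```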
We denote matrices with bold uppercase letters (\(\boldsymbol{X}\)) and (column) vectors with bold lowercase letters (\(\boldsymbol{x}\)):
\[ \boldsymbol{X} = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1k} \\ x_{21} & \dots & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nk} \\ \end{pmatrix} ,\quad \boldsymbol{x} = \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \\ \end{pmatrix} \]
Bold is not strictly necessary. If we want to clarify in handwriting that something is a matrix or vector, we can instead underline them: \(\underline{X}\) or \(\underline{x}\).
On the previous slide, \(\boldsymbol{X}\) had dimensions \(n\times k\), and \(\boldsymbol{x}\) had dimension \(n\). \(\boldsymbol{x}\) was written as a column vector, but we can also write \(\boldsymbol{x}\) as a row vector. To do this, we must transpose the vector: Simply put, rows become columns, and columns become rows. We denote the transposition with a small apostrophe (or optionally a superscript T):
\[ \boldsymbol{x}' = (x_1, x_2, \dots, x_n) \]
We can also transpose matrices. A matrix that previously had dimensions \(n\times k\) will have dimensions \(k\times n\) after transposition.
If we transpose a transposed matrix, we get:
\[ (\boldsymbol{X}')' = \boldsymbol{X} \]
Furthermore, \((\boldsymbol{XZ})'=\textcolor{red}{\boldsymbol{Z}'\boldsymbol{X}'}\).
In econometrics, we encounter various special matrices:
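One example that appears later in these slides is the identity matrix \(\boldsymbol{I}\): a square matrix with ones on the main diagonal and zeros everywhere else (the original list of special matrices may contain further examples):

\[ \boldsymbol{I} = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} \]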
The rank of a matrix is defined as the dimension of the vector space spanned by the columns of a matrix and is denoted by \(\mathrm{rank}(\boldsymbol{X})\). Simply put, the rank of a matrix corresponds to the number of its linearly independent columns. A column is linearly independent of the others if it cannot be expressed as a linear combination of them (i.e., as a sum of multiples of the other columns). Consider the following matrix:
\[ \boldsymbol{X} = \begin{pmatrix} 12 & 2 & 10 \\ 3 & 1 & 2 \\ 7 & 4 & 3 \\ 8 & 6 & 2 \end{pmatrix} \]
This matrix has rank 2. It has three columns, but the third column is a linear combination of the first two: \(x_{i3} = x_{i1} + (-1) \cdot x_{i2}\).
If a matrix has the maximum possible rank for its dimensions, it is said to have full rank.
Matrix addition occurs element by element:
\[ \small \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} + \begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1k} \\ z_{21} & z_{22} & \cdots & z_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nk} \end{pmatrix} = \begin{pmatrix} x_{11} + z_{11} & x_{12} + z_{12} & \cdots & x_{1k} + z_{1k} \\ x_{21} + z_{21} & x_{22} + z_{22} & \cdots & x_{2k} + z_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} + z_{n1} & x_{n2} + z_{n2} & \cdots & x_{nk} + z_{nk} \end{pmatrix} \]
Multiplication of a matrix by a scalar also occurs element by element:
\[ \small \alpha \cdot \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} = \begin{pmatrix} \alpha x_{11} & \alpha x_{12} & \cdots & \alpha x_{1k} \\ \alpha x_{21} & \alpha x_{22} & \cdots & \alpha x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha x_{n1} & \alpha x_{n2} & \cdots & \alpha x_{nk} \end{pmatrix} \]
The multiplication of two matrices is somewhat more complex. Let \(\boldsymbol{X}\) be a \(3\times 2\) matrix and \(\boldsymbol{Z}\) be a \(2\times 3\) matrix. Then we can multiply the matrices as follows:
\[ \small \begin{pmatrix} \textcolor{red}{x_{11}} & \textcolor{red}{x_{12}} \\ \textcolor{orange}{x_{21}} & \textcolor{orange}{x_{22}} \\ \textcolor{purple}{x_{31}} & \textcolor{purple}{x_{32}} \end{pmatrix} \cdot \begin{pmatrix} \textcolor{blue}{z_{11}} & \textcolor{green}{z_{12}} & \textcolor{teal}{z_{13}} \\ \textcolor{blue}{z_{21}} & \textcolor{green}{z_{22}} & \textcolor{teal}{z_{23}} \end{pmatrix} = \begin{pmatrix} \textcolor{red}{x_{11}}\textcolor{blue}{z_{11}} + \textcolor{red}{x_{12}}\textcolor{blue}{z_{21}} & \textcolor{red}{x_{11}}\textcolor{green}{z_{12}} + \textcolor{red}{x_{12}}\textcolor{green}{z_{22}} & \textcolor{red}{x_{11}}\textcolor{teal}{z_{13}} + \textcolor{red}{x_{12}}\textcolor{teal}{z_{23}} \\ \textcolor{orange}{x_{21}}\textcolor{blue}{z_{11}} + \textcolor{orange}{x_{22}}\textcolor{blue}{z_{21}} & \textcolor{orange}{x_{21}}\textcolor{green}{z_{12}} + \textcolor{orange}{x_{22}}\textcolor{green}{z_{22}} & \textcolor{orange}{x_{21}}\textcolor{teal}{z_{13}} + \textcolor{orange}{x_{22}}\textcolor{teal}{z_{23}} \\ \textcolor{purple}{x_{31}}\textcolor{blue}{z_{11}} + \textcolor{purple}{x_{32}}\textcolor{blue}{z_{21}} & \textcolor{purple}{x_{31}}\textcolor{green}{z_{12}} + \textcolor{purple}{x_{32}}\textcolor{green}{z_{22}} & \textcolor{purple}{x_{31}}\textcolor{teal}{z_{13}} + \textcolor{purple}{x_{32}}\textcolor{teal}{z_{23}} \end{pmatrix} \]
The following table helps visualize the process:
\[ \small \begin{array}{c|ccc} & \textcolor{blue}{z_{11}, z_{21}} & \textcolor{green}{z_{12}, z_{22}} & \textcolor{teal}{z_{13}, z_{23}} \\ \hline \textcolor{red}{x_{11}, x_{12}} & \textcolor{red}{x_{11}}\textcolor{blue}{z_{11}} + \textcolor{red}{x_{12}}\textcolor{blue}{z_{21}} & \textcolor{red}{x_{11}}\textcolor{green}{z_{12}} + \textcolor{red}{x_{12}}\textcolor{green}{z_{22}} & \textcolor{red}{x_{11}}\textcolor{teal}{z_{13}} + \textcolor{red}{x_{12}}\textcolor{teal}{z_{23}} \\ \textcolor{orange}{x_{21}, x_{22}} & \textcolor{orange}{x_{21}}\textcolor{blue}{z_{11}} + \textcolor{orange}{x_{22}}\textcolor{blue}{z_{21}} & \textcolor{orange}{x_{21}}\textcolor{green}{z_{12}} + \textcolor{orange}{x_{22}}\textcolor{green}{z_{22}} & \textcolor{orange}{x_{21}}\textcolor{teal}{z_{13}} + \textcolor{orange}{x_{22}}\textcolor{teal}{z_{23}} \\ \textcolor{purple}{x_{31}, x_{32}} & \textcolor{purple}{x_{31}}\textcolor{blue}{z_{11}} + \textcolor{purple}{x_{32}}\textcolor{blue}{z_{21}} & \textcolor{purple}{x_{31}}\textcolor{green}{z_{12}} + \textcolor{purple}{x_{32}}\textcolor{green}{z_{22}} & \textcolor{purple}{x_{31}}\textcolor{teal}{z_{13}} + \textcolor{purple}{x_{32}}\textcolor{teal}{z_{23}} \end{array} \]
It is easy to see that matrices can only be multiplied if the number of columns in the left matrix equals the number of rows in the right matrix.
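We can check this conformability requirement directly in R (a small sketch with illustrative entries):

```r
X <- matrix(1:6, nrow = 3)   # 3 x 2 matrix
Z <- matrix(1:6, nrow = 2)   # 2 x 3 matrix
X %*% Z                      # conformable: 2 columns of X match 2 rows of Z; result is 3 x 3
# X %*% X                    # error: non-conformable arguments (2 columns vs. 3 rows)
```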
A square matrix \(\boldsymbol{X}\) is called invertible if there exists a matrix \(\boldsymbol{X}^{-1}\) such that:
\[ \boldsymbol{XX}^{-1}=\boldsymbol{X}^{-1}\boldsymbol{X}=\boldsymbol{I}. \]
In this case, \(\boldsymbol{X}^{-1}\) is called the inverse of \(\boldsymbol{X}\). If no such matrix exists, \(\boldsymbol{X}\) is called singular or non-invertible. If an inverse exists, it is unique.
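In R, solve() computes the inverse of an invertible matrix (the entries are illustrative):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # a square, invertible matrix (det = 5)
A_inv <- solve(A)                       # the inverse of A
A %*% A_inv                             # gives the 2 x 2 identity matrix (up to rounding)
```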
Suppose we observe a random event, such as a coin toss or the roll of a die. A random variable is a variable that takes on a value depending on the observed event. We denote it with a capital letter:
\[ X \]
We denote all possible outcomes with the corresponding lowercase letter:
\[ x_i \]
A discrete random variable is a random variable that can have only a finite or countably infinite number of possible outcomes. If the variable is called \(X\), we denote the outcomes as \(x_i\) and the corresponding probabilities as \(p_i\). Note that the sum of all probabilities \(\sum_i p_i\) must equal 1.
An example of a discrete random variable would be the roll of two dice. The possible outcomes are \(\{2,3,4,5,6,7,8,9,10,11,12\}\), and the corresponding probabilities are \(\{\tfrac{1}{36},\tfrac{2}{36},\tfrac{3}{36},\tfrac{4}{36},\tfrac{5}{36},\tfrac{6}{36},\tfrac{5}{36},\tfrac{4}{36},\tfrac{3}{36},\tfrac{2}{36},\tfrac{1}{36}\}\). To the right, the probability mass function (PMF) is illustrated:
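A sketch of how such a PMF could be drawn in R (the original figure is not reproduced here):

```r
outcomes <- 2:12
probs <- c(1:6, 5:1) / 36   # probabilities for the sum of two dice
barplot(probs, names.arg = outcomes,
        xlab = "Sum of two dice", ylab = "Probability")
```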
A Bernoulli variable is a discrete random variable that can only take on two outcomes, such as a coin toss.
A continuous random variable is a random variable that can take on an uncountably infinite number of different outcomes.
We know that there are infinitely many outcomes and that the sum of all of these equals 1. It follows that the probability of any single outcome is zero. Therefore, there is no probability mass function as with discrete random variables.
What we can do, however, is draw a probability density function (PDF). The area under the PDF over an interval gives the probability that the outcome falls within that interval, and the total area under the PDF is equal to 1.
An example of such a variable would be the height of a person. It would make no sense to ask for the probability that a person is exactly 1.734681092536 meters tall. This probability is zero. But we can look at the PDF and determine how likely it is that the person’s height lies between 1.73 and 1.74 meters:
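A sketch in R, assuming for illustration that heights are normally distributed with mean 1.70 m and standard deviation 0.10 m (these parameters are assumptions, not from the slides):

```r
pnorm(1.74, mean = 1.70, sd = 0.10) - pnorm(1.73, mean = 1.70, sd = 0.10)
# probability that the height lies between 1.73 and 1.74 meters
```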
Whether an infinite set of numbers is countable or uncountable can be answered intuitively. The natural numbers \(\mathbb{N}\) are countably infinite: we can specify a clear way to count them (start at 0, then 1, then 2, then 3, …); we just never reach the end, because the list is infinitely long. The real numbers \(\mathbb{R}\), however, are uncountably infinite: there is no way to list them all. Suppose we start at 0, then 0.001; what about all the numbers in between? And all the numbers between those numbers? There is no way to count them all.
In addition to the probability density function, we can draw the cumulative distribution function (CDF). It represents the probability that the outcome is less than or equal to a certain value. The function is monotonically increasing (it never decreases):
The dashed line shows how to read the plot: the value of the cumulative distribution function at \(X=1.74\) represents the probability that a randomly selected person is at most 1.74 meters tall.
Analysis of a Random Variable
Let’s return to our example of rolling a die. The outcome is a discrete random variable with the following outcomes and associated probabilities:
Outcome | Probability |
---|---|
\(1\) | \(\tfrac{1}{6}\) |
\(2\) | \(\tfrac{1}{6}\) |
\(3\) | \(\tfrac{1}{6}\) |
\(4\) | \(\tfrac{1}{6}\) |
\(5\) | \(\tfrac{1}{6}\) |
\(6\) | \(\tfrac{1}{6}\) |
The expected value (or expectation) is a concept that allows us to summarize, in a single number, what value we can expect when rolling the die. We calculate it as the weighted arithmetic mean of the outcomes, with their respective probabilities as weights. We denote the expected value with a capital \(\mathrm{E}\):
\[ \mathrm{E}(X) \equiv \sum_{i=1}^n x_i p_i \]
The expected value of a fair die is 3.5. If we draw more and more outcomes from this distribution, i.e., roll the die many times, the average of all rolls will approach the expected value more and more. As long as we work with discrete variables, all of this is relatively simple to interpret. For continuous variables, it becomes more challenging, but the general intuition remains the same.
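We can verify both statements numerically in R (the simulation size is arbitrary):

```r
outcomes <- 1:6
probs <- rep(1/6, 6)
sum(outcomes * probs)                    # expected value of a fair die: 3.5
mean(sample(1:6, 1e5, replace = TRUE))   # average of many simulated rolls: close to 3.5
```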
In econometrics, we work a lot with expected values, so it is useful to know some rules for handling them.
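For example, expectations are linear; for constants \(a, b\) and random variables \(X, Y\) (the original list of rules may differ in detail):

\[ \mathrm{E}(aX + b) = a\,\mathrm{E}(X) + b, \qquad \mathrm{E}(X + Y) = \mathrm{E}(X) + \mathrm{E}(Y) \]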
Often, the expected value alone is not sufficient to analyze a distribution. Imagine you own a company that manufactures screws. You have two machines producing them. You advertise that your screws are all 35 millimeters long, but in reality, the length of the screws is randomly distributed: The expected value of the screw length is 35 mm for both machines. However, machine \(A\) mostly produces screws very close to the desired length, while machine \(B\) sometimes produces screws as short as 33 mm or as long as 37 mm. What is the difference between these two machines with identical expected values?
The answer is variance. Simply put: The expected value shows us the “center” of a distribution. Variance, on the other hand, indicates how far the outcomes tend to deviate from this expectation. We denote it as \(\mathrm{Var}(X)\) and calculate it as follows:
\[ \mathrm{Var}(X) \equiv \mathrm{E}\left((X - \mu)^2\right), \]
where \(\mu = \mathrm{E}(X)\).
It is evident that the variance of any constant is zero. Additionally, the following rule applies for a random variable \(X\) and constants \(a, b\):
\[ \mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X) + \mathrm{Var}(b) = a^2 \mathrm{Var}(X) \]
The standard deviation, denoted as \(\mathrm{sd}(X)\), is simply the square root of the variance.
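A quick numerical check of the variance rule in R, using the simulated rolls from before (the constants 2 and 5 are arbitrary):

```r
var(2 * rolls + 5)   # equals 2^2 * var(rolls); adding the constant 5 changes nothing
4 * var(rolls)
sd(rolls)^2          # the squared standard deviation equals the variance
```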
Suppose \(X\) and \(Y\) are two discrete random variables. In addition to their individual distributions, we can describe their joint distribution using a joint probability function:
\[ f_{X,Y}(x,y) = P(X=x, Y=y) \]
This function simply specifies the probability for each combination of \(X\) and \(Y\). If \(X\) and \(Y\) are independent, then:
\[ f_{X,Y}(x,y) = f_X(x)f_Y(y), \]
where \(f_X(x)\) and \(f_Y(y)\) are the probability functions of \(X\) and \(Y\), respectively. Two random variables are independent if the outcome of \(X\) does not affect the probabilities of the possible outcomes of \(Y\).
Another important concept is the conditional distribution. The conditional probability density function describes how the outcome of \(X\) affects that of \(Y\):
\[ f_{Y|X}(y|x) = P(Y=y|X=x) = \frac{f_{X,Y}(x,y)}{f_{X}(x)}, \text{ for all } f_{X}(x) > 0 \]
If \(X\) and \(Y\) are independent, the outcome of \(X\) does not affect \(Y\), and thus \(f_{Y|X}(y|x) = f_{Y}(y)\).
Covariance is similar to a “two-variable version” of variance. It allows us to analyze two distributions together. It is defined as follows and denoted as \(\mathrm{Cov}(X,Y)\):
\[ \mathrm{Cov}(X,Y) \equiv \mathrm{E}\left((X-\mu_X)(Y-\mu_Y)\right), \]
where \(\mu_X = \mathrm{E}(X)\) and \(\mu_Y = \mathrm{E}(Y)\).
The sign of the covariance can be interpreted intuitively. If the covariance is positive, we expect \(Y\) to be above its mean when \(X\) is as well. If the covariance is negative, we expect \(Y\) to be below its mean when \(X\) is above its mean. Simply put, a positive covariance indicates that two variables are positively associated, and vice versa. A covariance of 0 means that there is no linear relationship. If \(X\) and \(Y\) are independent, the covariance is always 0.
An association in this sense does not necessarily imply causation, but more on that in the course :)
The following rules apply for covariance:
\[ \mathrm{Cov}(X,Y) = \mathrm{E}(XY) - \mathrm{E}(X)\mathrm{E}(Y) \]
For constants \(a, b, c, d\):
\[ \mathrm{Cov}(aX +b, cY +d) = a\cdot c \cdot \mathrm{Cov}(X,Y) \]
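A numerical check of the scaling rule in R, using two mtcars variables (the constants are arbitrary):

```r
x <- mtcars$mpg
y <- mtcars$hp
cov(3 * x + 1, -2 * y + 4)   # equals 3 * (-2) * cov(x, y)
-6 * cov(x, y)
```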
Suppose we have two random variables \(X\) and \(Y\) that are somehow connected. We want to know the expectation of \(Y\), given that \(X\) takes on a specific value. This is called the conditional expectation and is denoted as \(\mathrm{E}(Y|X=x)\). The following rules apply for conditional expectations: