Econometrics I
Department of Economics, WU Vienna
May 8, 2025
In Module 2 (and 3), we took a closer look at what it means for our OLS estimator to be a random variable. We determined the expected value and variance of the estimator and also conducted the following simulation:
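A minimal sketch of such a simulation in Python (the model, parameter values, and sample size below are illustrative assumptions, not the ones used in class):

```python
import numpy as np

rng = np.random.default_rng(42)
N, reps = 100, 5000                    # sample size and number of Monte Carlo replications
beta0, beta1, sigma = 1.0, 0.5, 2.0    # illustrative population parameters

x = rng.uniform(0, 10, size=N)         # regressor held fixed across replications
beta1_hat = np.empty(reps)

for r in range(reps):
    u = rng.normal(0, sigma, size=N)   # error term drawn anew in every replication
    y = beta0 + beta1 * x + u
    # OLS slope in the simple regression: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    beta1_hat[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(beta1_hat.mean())                          # close to beta1 (unbiasedness)
print(beta1_hat.var())                           # close to the theoretical sampling variance:
print(sigma**2 / np.sum((x - x.mean())**2))      # Var(beta1_hat) = sigma^2 / SST_x
```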
For everything we discuss in this chapter, we need more than just the two moments of expected value and variance. We must ask ourselves: What is the sampling distribution of the OLS estimator?
Why do we need information about this distribution? In Module 1 we said:
In order to test a hypothesis using data, we need data and a hypothesis.
But how do we test a hypothesis? Suppose we want to know whether the parameter \(\beta_1\) is not equal to zero, i.e., whether the corresponding variable \(x_1\) has an effect on \(y\).
What we discussed on the previous slide is called a hypothesis test. A bit more formally:
Why is our null hypothesis \(\beta_1=0\) and not \(\beta_1\neq 0\)?
In any case, for this testing procedure we need information about the sampling distribution of \(\hat{\beta}_1\), so we will first deal with that before returning to hypothesis tests.
Using the assumptions MLR.1 through MLR.5, we were able to make statements about the expected value and variance of the OLS estimator.
The error term of the population is independent of the explanatory variables \(x_1, \dots, x_K\) and is normally distributed with mean 0 and variance \(\sigma^2\):
\[ u\sim\mathrm{N}\left(0,\sigma^2\right) \]
We can summarize the CLM assumptions about the population as follows:
\[ y\mid\boldsymbol{x}\sim\mathrm{N}\left(\boldsymbol{x}'\boldsymbol{\beta},\sigma^2\right). \]
The Central Limit Theorem (CLT) states:
Let \(\{X_1, X_2, \dots, X_N\}\) be a sequence of independently and identically distributed random variables with mean \(\mu\) and variance \(\sigma^2\). Then the distribution function of the standardized random variable
\[ Z_N=\frac{\bar{X}_N-\mu}{\sigma/\sqrt{N}}, \]
where \(\bar{X}_N=\frac{1}{N}\sum^N_{i=1}X_i\), converges in distribution to the distribution function of the standard normal distribution.
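A small illustration (not from the slides): even for a strongly skewed variable such as an Exponential(1), the standardized sample mean behaves like a standard normal variable once \(N\) is reasonably large.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 500, 10_000
mu, sigma = 1.0, 1.0                                 # mean and std. dev. of an Exponential(1) variable

X = rng.exponential(scale=1.0, size=(reps, N))       # reps samples of size N
Z = (X.mean(axis=1) - mu) / (sigma / np.sqrt(N))     # standardized sample means

print(Z.mean(), Z.std())       # approximately 0 and 1
print(np.mean(Z > 1.96))       # approximately 0.025, the N(0,1) upper-tail probability
```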
Under the CLM assumptions MLR.1 through MLR.6, the OLS estimator, given the sample values of the independent variables, is normally distributed:
\[ \hat{\beta}_k\sim\mathrm{N}(\beta_k,\mathrm{Var}(\hat{\beta}_k)), \]
where \(\mathrm{Var}(\hat{\beta}_k)=\frac{\sigma^2}{\sum^N_{i=1}(x_{ik}-\bar{x}_k)^2}\times\frac{1}{1-R^2_k},\) and \(R^2_k\) is the \(R^2\) from a regression of \(x_{k}\) on all other regressors \(x_j,j\neq k\).
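As an illustration, the sketch below computes the standard error of one coefficient both via this formula (using \(\hat{\sigma}^2\) and the \(R^2_k\) from the auxiliary regression) and via a standard OLS routine; the simulated data and package calls are assumptions for the example, not part of the lecture.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 200
x1 = rng.normal(size=N)
x2 = 0.6 * x1 + rng.normal(size=N)       # correlated with x1, so R2_k > 0
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=N)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Auxiliary regression of x1 on the other regressors (here: a constant and x2)
aux = sm.OLS(x1, sm.add_constant(x2)).fit()
R2_k = aux.rsquared

sigma2_hat = fit.ssr / (N - 2 - 1)       # estimate of sigma^2 with K = 2 regressors
var_beta1 = sigma2_hat / np.sum((x1 - x1.mean())**2) * 1 / (1 - R2_k)

print(np.sqrt(var_beta1))                # matches the standard error reported by statsmodels
print(fit.bse[1])
```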
In this section, we deal with testing hypotheses about one parameter of the population model:
\[ y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + u \]
Under the CLM assumptions MLR.1 through MLR.6, the following holds:
\[ (\hat{\beta}_k-\beta_k)/\mathrm{se}(\hat{\beta}_k) \sim \mathrm{t}_{N-K-1} \]
We specify the following null hypothesis:
\[ H_0:\beta_k=0 \]
After accounting for all \(x_j,j\neq k\), \(x_k\) has no effect on \(y\).
We can test this null hypothesis with the following test statistic:
\[ t_{\hat{\beta}_k}=\frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}. \]
This particular test statistic is called the t-statistic.
Under the null hypothesis, \(\beta_k=0\) and the t-statistic is
\[ t_{\hat{\beta}_k}=\frac{\hat{\beta}_k}{\mathrm{se}(\hat{\beta}_k)}. \]
This t-statistic follows a t-distribution with \(N-K-1\) degrees of freedom, which is symmetric around 0.
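A minimal sketch of computing this t-statistic and its two-sided p-value by hand (simulated data; the statsmodels fit is used only to obtain \(\hat{\beta}_k\) and \(\mathrm{se}(\hat{\beta}_k)\)):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
N, K = 100, 2
x = rng.normal(size=(N, K))
y = 1.0 + 0.8 * x[:, 0] + 0.0 * x[:, 1] + rng.normal(size=N)

fit = sm.OLS(y, sm.add_constant(x)).fit()

k = 1                                               # test H0: beta_1 = 0
t_stat = fit.params[k] / fit.bse[k]                 # t = beta_hat_k / se(beta_hat_k)
p_val = 2 * stats.t.sf(abs(t_stat), df=N - K - 1)   # two-sided p-value

print(t_stat, p_val)          # agree with fit.tvalues[k] and fit.pvalues[k]
```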
Under the CLM assumptions, we can also calculate a confidence interval for a population parameter \(\beta_k\). We’ll discuss this using a 95% confidence interval as an example.
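For reference, the 95% interval has the familiar form
\[ \hat{\beta}_k \pm c_{0.975}\cdot\mathrm{se}(\hat{\beta}_k), \]
where \(c_{0.975}\) is the 97.5th percentile of the \(\mathrm{t}_{N-K-1}\) distribution, so that the interval covers the true \(\beta_k\) in 95% of repeated samples.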
With the t-test, we were able to impose a single restriction on our model, e.g.
\[ \beta_1=0, \]
and test this restriction.
But what if we want to test multiple restrictions jointly? For example, we may be interested in whether a particular group of independent variables collectively has no effect on \(y\):
\[ \beta_1=0,\beta_2=0,\beta_3=0. \]
To test such restrictions, we need a different test: the F-test.
The null and alternative hypotheses in this case are:
\[ H_0:\beta_1=0,\beta_2=0,\beta_3=0;\qquad H_A:H_0\text{ is not true}. \]
We start by writing down our full (unrestricted) model:
\[ y = \beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5+u. \]
Then we apply all restrictions and obtain the restricted model:
\[ y = \beta_0 + \beta_4x_4+\beta_5x_5+u. \]
How can we compare these models? Imposing the restrictions can only increase the sum of squared residuals, so we need a test statistic that tells us whether the increase from \(\mathrm{SSR}_{ur}\) to \(\mathrm{SSR}_r\) is too large to be attributed to chance. Such a test statistic is
\[ F = \frac{(\mathrm{SSR}_r-\mathrm{SSR}_{ur})/q}{\mathrm{SSR}_{ur}/(N-K-1)}, \]
where \(q\) is the number of restrictions imposed.
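A sketch of this computation on simulated data (five regressors, three of which are truly irrelevant; all names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
N, K, q = 200, 5, 3
X = rng.normal(size=(N, K))
y = 1.0 + 0.4 * X[:, 3] - 0.2 * X[:, 4] + rng.normal(size=N)   # x1..x3 truly irrelevant

ur = sm.OLS(y, sm.add_constant(X)).fit()            # unrestricted: all five regressors
r = sm.OLS(y, sm.add_constant(X[:, 3:])).fit()      # restricted: only x4 and x5

F = ((r.ssr - ur.ssr) / q) / (ur.ssr / (N - K - 1))
p_val = stats.f.sf(F, q, N - K - 1)
print(F, p_val)

# Same result via statsmodels' built-in joint test of beta_1 = beta_2 = beta_3 = 0
print(ur.f_test("x1 = 0, x2 = 0, x3 = 0"))
```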
When we run a regression, the statistics software typically tests a particular set of restrictions:
\[ H_0:\beta_1=0,\beta_2=0,\dots,\beta_K=0, \]
i.e., that all independent variables jointly do not contribute to explaining \(y\).
The F-statistic for this case can be written as
\[ F=\frac{R^2/K}{(1-R^2)/(N-K-1)}. \]
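This is just the SSR form of the statistic specialized to this null hypothesis: the restricted model contains only an intercept, so \(\mathrm{SSR}_r=\mathrm{SST}\) and \(q=K\), and with \(R^2=1-\mathrm{SSR}_{ur}/\mathrm{SST}\) we get
\[ F=\frac{(\mathrm{SST}-\mathrm{SSR}_{ur})/K}{\mathrm{SSR}_{ur}/(N-K-1)}=\frac{R^2/K}{(1-R^2)/(N-K-1)}. \]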
For both this “global” F-statistic and all other F-statistics, statistics software provides a p-value, which makes interpretation easier—just like with t-statistics.
We use the baseball dataset from the Wooldridge textbook to take a look at everything using a practical example.
None of bavg, hrunsyr, and rbisyr were significant on their own.
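A sketch of this example in Python, assuming the mlb1 data are available through the third-party wooldridge package (the variable names follow the textbook):

```python
import statsmodels.formula.api as smf
import wooldridge   # assumed: third-party package that ships the textbook datasets

mlb1 = wooldridge.data('mlb1')

# Unrestricted model: log(salary) on years, games per year, and three performance measures
ur = smf.ols('lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr', data=mlb1).fit()
print(ur.summary())                      # individual t-statistics for each coefficient

# Joint F-test of H0: beta_bavg = beta_hrunsyr = beta_rbisyr = 0
print(ur.f_test('bavg = 0, hrunsyr = 0, rbisyr = 0'))
```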
Under assumptions MLR.1 through MLR.5, the t-statistic is asymptotically normally distributed:
\[ \frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}\:\overset{\mathrm{d}}{\rightarrow}\mathrm{N}(0,1)\qquad\text{or}\qquad \frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}\:\overset{\mathrm{d}}{\rightarrow}\mathrm{t}_{N-K-1}. \]
The Lagrange Multiplier Test (LM test) is an alternative to the F-test in large samples.
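A sketch of the LM test for the three exclusion restrictions from above, on simulated data: estimate the restricted model, regress its residuals on all regressors, and compare \(N\cdot R^2\) from that auxiliary regression with a \(\chi^2_q\) distribution.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
N, K, q = 200, 5, 3
X = rng.normal(size=(N, K))
y = 1.0 + 0.4 * X[:, 3] - 0.2 * X[:, 4] + rng.normal(size=N)

# Step 1: estimate the restricted model (only x4 and x5) and keep its residuals
res_r = sm.OLS(y, sm.add_constant(X[:, 3:])).fit()
u_tilde = res_r.resid

# Step 2: regress the residuals on ALL regressors and record the R^2
aux = sm.OLS(u_tilde, sm.add_constant(X)).fit()

# Step 3: LM = N * R^2 is asymptotically chi-squared with q degrees of freedom under H0
LM = N * aux.rsquared
p_val = stats.chi2.sf(LM, df=q)
print(LM, p_val)
```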