Module 4: Testing and Inference

Econometrics I

Max Heinze (mheinze@wu.ac.at)

Department of Economics, WU Vienna

May 8, 2025

 

 

 

Introduction
Motivation

In Module 2 (and 3), we took a closer look at what it means for our OLS estimator to be a random variable. We determined the expected value and variance of the estimator and also illustrated its behavior as a random variable with a simulation.

Motivation

For everything we discuss in this chapter, we need more than just the two moments of expected value and variance. We must ask ourselves: What is the sampling distribution of the OLS estimator?

Why do we need information about this distribution? In Module 1 we said:

In order to test a hypothesis using data, we need data and a hypothesis.

  • Ideally, we have a theory from which we can derive a falsifiable hypothesis.
  • Then we can try to empirically test this hypothesis.

Hypothesis Tests

But how do we test a hypothesis? Suppose we want to know whether the parameter \(\beta_1\) is not equal to zero, i.e., whether the corresponding variable \(x_1\) has an effect on \(y\).

  • First idea: We estimate our model using OLS and check whether the absolute value of the estimate \(|\hat{\beta}_1|>0\).
    • This is a bad idea.
    • Intuition: We know that there is some uncertainty in our estimate. If our estimate is, for example, close to zero and/or the uncertainty is large — how “sure” can we be that our estimate is not just randomly different from zero?
  • Better idea: We assume that the true \(\beta_1\) equals zero and try to find out what the probability is that we still obtain the estimate we actually got.
    • If this probability is small, we can say that such an estimate is unlikely to occur if the true parameter is \(\beta_1=0\).

Hypothesis Tests

What we discussed on the previous slide is called a hypothesis test. A bit more formally:

  1. We formulate a so-called null hypothesis: \[ H_0:\beta_1=0. \] From this, we also derive an alternative hypothesis: \[ H_A:\beta_1\neq 0. \]
  2. We assume that the null hypothesis is true, and calculate the probability of obtaining the estimate \(\hat{\beta}_1\) in this case.
  3. If this probability is sufficiently low, we reject the null hypothesis.

Null and Alternative Hypothesis

Why is our null hypothesis \(\beta_1=0\) and not \(\beta_1\neq 0\)?

  • On one hand, classical statistical tests only allow us to test whether \(\beta_1\) equals a specific value, for example 0.
    • We said we assume the null hypothesis is true, and then calculate the probability of obtaining a specific \(\hat{\beta}_1\) under this null hypothesis.
    • If the null hypothesis is \(\beta_1=0\), this is meaningful and intuitive. If \(\beta_1=0\), then it’s more likely to obtain \(\hat{\beta}_1=1\) than \(\hat{\beta}_1=5\).
    • If we had \(\beta_1\neq 0\) as the null hypothesis, such reasoning would be impossible: the probability in question would be completely different for \(\beta_1=12\), \(\beta_1=0.000000001\), and \(\beta_1=-10^6\).

Rejecting Doesn’t Mean Confirming the Opposite

Why is our null hypothesis \(\beta_1=0\) and not \(\beta_1\neq 0\)?

  • Furthermore, with statistical tests, we can never confirm a hypothesis, only reject it.
    • We want to find out whether \(x_1\) has an effect on \(y\).
    • If our null hypothesis is that it has no effect (\(\beta_1=0\)), then rejecting this hypothesis gives us an important clue that the variable may have an effect.
    • But we can never confirm that a variable has an effect — we can only reject the hypothesis that it has no effect.
    • This is because when rejecting a hypothesis, we settle for a sufficiently small probability, but this probability is never 0.

In any case, for this testing procedure we need information about the sampling distribution of \(\hat{\beta}_1\), so we will first deal with that before returning to hypothesis tests.

 

 

Small Samples

Moments vs. Distribution

Using the assumptions MLR.1 through MLR.5, we were able to make statements about the expected value and variance of the OLS estimator.

  • But this is not enough to make statements about the distribution.
  • One example: The four distributions on the left all have a mean of 0 and a variance of 1.
  • Even under the Gauss-Markov assumptions, the distribution of \(\hat{\beta}_1\) can take very different shapes.
  • We therefore need an additional assumption.

(MLR.6) Normality

The error term of the population is independent of the explanatory variables \(x_1, \dots, x_K\) and is normally distributed with mean 0 and variance \(\sigma^2\):

\[ u\sim\mathrm{N}\left(0,\sigma^2\right) \]

  • This assumption implies assumptions MLR.4 and MLR.5. We still refer to MLR.1 through MLR.6 to make clear that we assume MLR.6 “in addition.”
  • This assumption is an extremely strong assumption. More on that will follow.
  • We refer to MLR.1 through MLR.6 collectively as the Classical Linear Model assumptions (CLM assumptions).
  • Under the CLM assumptions, OLS is not only BLUE, but also BUE (not limited to linear estimators).

(MLR.6) Normality

We can summarize the CLM assumptions about the population as follows:

\[ y\mid\boldsymbol{x}\sim\mathrm{N}\left(\boldsymbol{x}'\boldsymbol{\beta},\sigma^2\right). \]

  • The graph on the left illustrates this fact for the bivariate case (so the subscripts are \(i\), not \(k\)).
  • Under the CLM assumptions, the \(y\) for an observation \(i\) are normally distributed with
    • mean \(\boldsymbol{x}'\boldsymbol{\beta}\) (in the bivariate case on the left, \(\beta_0+\beta_1x\)), and
    • constant variance \(\sigma^2\).

Does MLR.6 Make Sense?

  • As mentioned earlier, \(u\sim\mathrm{N}\left(0,\sigma^2\right)\) is a very strong assumption. Can we justify this assumption?
  • One argument: The error term \(u\) is a sum of many unobserved factors that affect \(y\). Therefore, the central limit theorem (next slide) can be applied, and \(u\) is approximately normally distributed.
    • However, the various factors in \(u\) may have very different distributions, which worsens the approximation.
    • Also, nothing guarantees that the individual factors appear additively in the error term. This is an even bigger issue.
  • Later, we will discuss why non-normality of the errors is not a big problem in larger samples. For now, we will simply assume normality.
    • Sometimes, in smaller samples, transformations (e.g., taking logarithms) are used to make the \(y\) values resemble a normal distribution more closely.

Central Limit Theorem

The Central Limit Theorem (CLT) states:

Let \(\{X_1, X_2, \dots, X_N\}\) be a sequence of independently and identically distributed random variables with mean \(\mu\) and variance \(\sigma^2\). Then the distribution function of the standardized random variable

\[ Z_N=\frac{\bar{X}_N-\mu}{\sigma/\sqrt{N}}, \]

where \(\bar{X}_N=\frac{1}{N}\sum^N_{i=1}X_i\), converges to the distribution function of the standard normal distribution.

  • \(Z_N\) is a standardized version of the sample mean.
  • Intuitively: As \(N\) increases, the distribution of the sample mean of the \(X_i\) converges to a normal distribution.
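
To see the CLT at work, here is a minimal simulation sketch in Python (using numpy; the exponential population and all parameter values are chosen purely for illustration): as \(N\) grows, the standardized sample mean behaves like a standard normal random variable.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population: exponential with scale 1, so mu = 1 and sigma = 1
mu, sigma = 1.0, 1.0

def standardized_mean(N, reps=100_000):
    """Draw `reps` samples of size N and return the standardized sample means Z_N."""
    X = rng.exponential(scale=1.0, size=(reps, N))
    return (X.mean(axis=1) - mu) / (sigma / np.sqrt(N))

for N in (5, 30, 500):
    Z = standardized_mean(N)
    # Under the CLT, P(Z_N <= 1.96) should approach the standard normal value 0.975
    print(N, np.mean(Z <= 1.96).round(3))
```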

Distribution of the OLS Estimator, Part 1

Under the CLM assumptions MLR.1 through MLR.6, the OLS estimator, given the sample values of the independent variables, is normally distributed:

\[ \hat{\beta}_k\sim\mathrm{N}(\beta_k,\mathrm{Var}(\hat{\beta}_k)), \]

where \(\mathrm{Var}(\hat{\beta}_k)=\frac{\sigma^2}{\sum^N_{i=1}(x_{ik}-\bar{x}_k)^2}\times\frac{1}{1-R^2_k},\) and \(R^2_k\) is the \(R^2\) from a regression of \(x_{k}\) on all other regressors \(x_j,j\neq k\).

  • \(\hat{\beta}_k\) is normally distributed, any linear combination of the \(\hat{\beta}_k\) is also normally distributed, and the joint distribution of a subset of the \(\hat{\beta}_j\) is a multivariate normal distribution.
  • The standardized coefficient \((\hat{\beta}_k-\beta_k)/\mathrm{sd}(\hat{\beta}_k)\) follows a standard normal distribution.
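
As a sanity check of the variance formula above, the following Python sketch (simulated data, illustrative names and values) computes \(\mathrm{Var}(\hat{\beta}_1)\) once from the standard matrix expression \(\sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}\) (not derived on these slides) and once via \(\sigma^2/\big(\mathrm{SST}_1(1-R^2_1)\big)\); both routes give the same number.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 200, 2.0

# Two correlated regressors plus an intercept
x1 = rng.normal(size=N)
x2 = 0.6 * x1 + rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])

# Conditional variance of beta_hat (known sigma^2): sigma^2 * (X'X)^{-1}
V = sigma2 * np.linalg.inv(X.T @ X)

# Same quantity via the slide's formula for k = 1 (the coefficient on x1):
# regress x1 on the other regressors (intercept and x2) to obtain R^2_1
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(N), x2]), x1, rcond=None)
resid = x1 - np.column_stack([np.ones(N), x2]) @ coef
R2_1 = 1 - resid.var() / x1.var()
var_b1 = sigma2 / (np.sum((x1 - x1.mean()) ** 2) * (1 - R2_1))

print(V[1, 1], var_b1)   # the two numbers agree
```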

 

t-Test

Hypotheses About the OLS Estimator

In this section, we deal with testing hypotheses about one parameter of the population model:

\[ y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + u \]

  • As before: We don’t know the \(\beta_k\). We can only estimate them.
  • But we can formulate hypotheses about the \(\beta_k\).
  • Next, we can use statistical inference to test these hypotheses.

Distribution of the OLS Estimator, Part 2

Under the CLM assumptions MLR.1 through MLR.6, the following holds:

\[ (\hat{\beta}_k-\beta_k)/\mathrm{se}(\hat{\beta}_k) \sim \mathrm{t}_{N-K-1} \]

  • If we replace \(\mathrm{sd}(\cdot)\) in the standardized coefficient with \(\mathrm{se}(\cdot)\) (i.e., replace \(\sigma\) with \(\hat{\sigma}\)), it no longer follows a standard normal distribution, but a t-distribution with \(N-K-1\) degrees of freedom.

  • The t-distribution looks very similar to a standard normal distribution, but has fatter tails. The more degrees of freedom the distribution has, the closer it can be approximated by a normal distribution.

Null Hypothesis and t-Statistic

We specify the following null hypothesis:

\[ H_0:\beta_k=0 \]

After accounting for all \(x_j,j\neq k\), \(x_k\) has no effect on \(y\).

We can test this null hypothesis with the following test statistic:

\[ t_{\hat{\beta}_k}=\frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}. \]

This particular test statistic is called the t-statistic.

Under the null hypothesis, \(\beta_k=0\) and the t-statistic is

\[ t_{\hat{\beta}_k}=\frac{\hat{\beta}_k}{\mathrm{se}(\hat{\beta}_k)}. \]

This t-statistic follows a t-distribution with \(N-K-1\) degrees of freedom (a distribution centered at 0).
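
A minimal sketch of how these quantities can be computed by hand (simulated data, plain numpy plus scipy; in practice statistical software reports them automatically). Note that \(\hat{\sigma}^2\) replaces \(\sigma^2\) in the standard error, as discussed above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, K = 100, 2

# Simulated data: y = 1 + 0.5*x1 + 0*x2 + u
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])
y = 1 + 0.5 * X[:, 1] + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K - 1)          # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_stat = beta_hat / se                            # t-statistic for H0: beta_k = 0, per coefficient
p_val = 2 * stats.t.sf(np.abs(t_stat), df=N - K - 1)
print(np.round(t_stat, 2), np.round(p_val, 3))
```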

Two-Sided Hypothesis Tests


  • We are testing the null hypothesis, \(\beta_k=0\), against a two-sided alternative, \(\beta_k\neq 0.\)
  • In the plot on the left, a \(t\)-distribution with 25 degrees of freedom is shown.
  • If the null hypothesis is true, then the t-statistics of the estimators we get should be distributed as shown on the left.
  • The idea is: If our actual t-statistic is so “extreme” (i.e., so large or so small) that it falls into the blue rejection regions of this distribution, then we consider it unlikely that the null hypothesis is true and reject it.

When Do We Reject the Null Hypothesis?


  • In the graph on the left, the rejection region, where we reject the null hypothesis, has a total area of 0.05. We call 0.05 the significance level.
  • For a t-distribution with 25 degrees of freedom, this yields critical values of -2.06 and 2.06.
  • So if the absolute value of the t-statistic is greater than 2.06, we reject the null hypothesis.
  • The threshold of 2.06 depends on:
    • the number of degrees of freedom, and
    • the significance level.

What Is a Significance Level?

  • We chose a significance level of 0.05 (or 5%). This means that
    • if the null hypothesis is true, we will falsely reject it in 5% of cases;
    • because under the null hypothesis, there is a 5% chance that the t-statistic is greater than 2.06 in absolute value; and in these cases we always reject the null.
    • This is called a type 1 error, or false positive. We set the probability for this error ourselves via the significance level.
    • The probability of a type 2 error, a false negative (we do not reject the null although it is false), is harder to determine.
  • 0.05 is the most commonly used significance level; other frequently used levels are 0.10, 0.025, 0.01, 0.001, …
    • The critical values for a given level and number of degrees of freedom can be obtained from a table or statistical software.
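
For example, the critical values mentioned above can be looked up with scipy (a minimal sketch):

```python
from scipy import stats

df = 25
for alpha in (0.10, 0.05, 0.01):
    # two-sided test: critical value is the (1 - alpha/2) quantile of t_df
    c = stats.t.ppf(1 - alpha / 2, df)
    print(f"alpha = {alpha:.2f}: reject if |t| > {c:.3f}")
# alpha = 0.05 reproduces the 2.06 threshold from the slide
```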

Testing More Specific Hypotheses


  • We can also perform a one-sided t-test, e.g. with \[ H_0:\beta_k\geq 0, \qquad H_A:\beta_k<0. \]
  • In a one-sided test, the entire rejection region is on one side, and the critical value for the same significance level and degrees of freedom is different.
  • We can also perform both one- and two-sided tests with other null hypotheses, e.g. \(H_0:\beta_k=1\). The distribution of the t-statistic does not change.
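
A short sketch of both variations (illustrative numbers; scipy provides the quantiles):

```python
from scipy import stats

df, alpha = 25, 0.05

# One-sided test H0: beta_k >= 0 vs H_A: beta_k < 0:
# reject if t < c_low, where c_low is the alpha-quantile of t_df
c_low = stats.t.ppf(alpha, df)
print(f"reject H0 if t < {c_low:.3f}")        # about -1.708

# Testing H0: beta_k = 1 (two-sided): the t-statistic centers on 1 instead of 0
beta_hat, se = 1.35, 0.20                     # illustrative numbers
t_stat = (beta_hat - 1) / se
print(t_stat, abs(t_stat) > stats.t.ppf(1 - alpha / 2, df))
```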

p-Values


  • Suppose we conduct a test as before \((\alpha=0.05,\mathrm{df}=25)\) and obtain a t-statistic of \(t=2.5\).
  • In the plot on the left, the region with “more extreme” t-statistics than 2.5 (i.e., \(|t|>2.5\)) is marked in pink.
  • The probability of obtaining a “more extreme” t-statistic than 2.5 is 0.019. This is the total area of both pink regions.
  • We call this value the p-value. The p-value makes interpretation easier: we don’t need to know a critical value, just check whether the p-value is smaller than the significance level. If it is, we reject the null hypothesis.
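
The p-value from this example can be reproduced directly (a minimal scipy sketch):

```python
from scipy import stats

t_obs, df = 2.5, 25
# two-sided p-value: probability of a t-statistic more extreme than |2.5|
p_value = 2 * stats.t.sf(abs(t_obs), df)
print(round(p_value, 3))   # about 0.019, the total area of the two pink regions
```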

p-Values and Significance: Interpretation

  • If the p-value is smaller than the chosen significance level, we reject the null hypothesis.
  • Suppose the p-value for \(\beta_2\) is 0.03 and the significance level is 0.05. Then we can say:
    • \(x_2\) is statistically significant at a 0.05 (or 5%) significance level.
    • \(\beta_2\) is statistically significantly different from zero at a 0.05 significance level.
    • We reject the null hypothesis at a 0.05 significance level.
    • At a significance level of 3%, the test would be indifferent between rejection and non-rejection.
  • The following statements are false and we cannot say them:
    • We accept the alternative hypothesis.
    • The probability that the null hypothesis is true is 3%.
    • […] at a 0.95 significance level.
    • We are 97% confident that \(x_2\) has an effect.

Statistical vs. “Economic” Significance

  • So far, we have focused on whether a variable is statistically significant.
    • Statistical significance depends solely on the t-statistic associated with a coefficient.
  • Another important concept for interpretation is economic or practical significance.
    • The idea: Not every statistically significant variable is also an important factor affecting \(y\).
    • We begin by checking statistical significance.
    • If a variable is statistically significant, we can next check the magnitude of the coefficient.
    • If the coefficient is very close to zero, the variable has little effect on \(y\), even if it is statistically significant.
    • A variable that is statistically significant and has a meaningfully large effect can be interpreted as “statistically and economically significant.”
  • So for interpretation, it is always important to also consider the size of the coefficient.

Confidence Intervals

Under the CLM assumptions, we can also calculate a confidence interval for a population parameter \(\beta_k\). We’ll discuss this using a 95% confidence interval as an example.

  • We can interpret a 95% confidence interval as follows: If we repeatedly draw samples and compute the confidence interval, it will contain the true parameter in 95% of cases.
  • We cannot say that the parameter (of the population) falls within the interval in 95% of cases, since the confidence interval changes, and not the parameter.
  • The 95% confidence interval for a parameter \(\beta_k\) is: \[ \left[\hat{\beta}_k-c\times\mathrm{se}(\hat{\beta}_k),\quad \hat{\beta}_k+c\times\mathrm{se}(\hat{\beta}_k)\right], \] where \(c\) is the 97.5th percentile of a \(t_{N-K-1}\) distribution.
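
A minimal sketch of the computation (illustrative estimate and standard error; scipy provides the percentile \(c\)):

```python
from scipy import stats

beta_hat, se = 0.42, 0.15          # illustrative estimate and standard error
N, K = 100, 3
c = stats.t.ppf(0.975, N - K - 1)  # 97.5th percentile of t_{N-K-1}
ci = (beta_hat - c * se, beta_hat + c * se)
print(tuple(round(b, 3) for b in ci))
```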

F-Test

 

How Many Restrictions Do We Want to Test?

With the t-test, we were able to impose a single restriction on our model, e.g.

\[ \beta_1=0, \]

and test this restriction.

But what if we want to test multiple restrictions jointly? For example, we may be interested in whether a particular group of independent variables collectively has no effect on \(y\):

\[ \beta_1=0,\beta_2=0,\beta_3=0. \]

To test such restrictions, we need a different test: the F-test.

Multiple Restrictions: Hypotheses

\[ \beta_1=0,\beta_2=0,\beta_3=0. \]

The null and alternative hypotheses in this case are:

\[ H_0:\beta_1=0,\beta_2=0,\beta_3=0;\qquad H_A:H_0\text{ is not true}. \]

  • In this case, we are testing three exclusion restrictions, i.e., we are testing multiple hypotheses simultaneously.
  • Since we are testing the hypotheses simultaneously, we can’t rely on the separate t-statistics for each parameter.
  • Therefore, we need a different test statistic, whose distribution we know, to carry out such a test.

Unrestricted and Restricted Model

We start by writing down our full (unrestricted) model:

\[ y = \beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5+u. \]

Then we apply all restrictions and obtain the restricted model:

\[ y = \beta_0 + \beta_4x_4+\beta_5x_5+u. \]

How can we compare these models?

  • One approach: We consider the sum of squared residuals (SSR) of both models.
  • Since the SSR can never decrease when we remove variables from a model, we seek a test statistic that evaluates how large the relative increase in SSR is when we impose our restrictions.

F-Statistic

Such a test statistic is

\[ F = \frac{(\mathrm{SSR}_r-\mathrm{SSR}_{ur})/q}{\mathrm{SSR}_{ur}/(N-K-1)}, \]

where \(q\) is the number of restrictions imposed, \(\mathrm{SSR}_r\) is the sum of squared residuals of the restricted model, and \(\mathrm{SSR}_{ur}\) that of the unrestricted model.

  • Under the CLM assumptions, this test statistic, the F-statistic, follows an F-distribution with \(q\) degrees of freedom in the numerator and \((N-K-1)\) degrees of freedom in the denominator.
  • The test statistic is never negative, since \(\mathrm{SSR}_r\geq\mathrm{SSR}_{ur}\).
  • An alternative form of the F-statistic using \(R^2\) instead of SSR is \[ F=\frac{(R^2_{ur}-R^2_r)/q}{(1-R^2_{ur})/(N-K-1)}. \]
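
A minimal numerical sketch of the F-test (illustrative SSR values and sample sizes; scipy supplies the p-value from the \(F_{q,\,N-K-1}\) distribution):

```python
from scipy import stats

# Illustrative numbers: restricted vs. unrestricted sum of squared residuals
SSR_r, SSR_ur = 230.0, 198.0
N, K, q = 120, 5, 3                 # K regressors in the unrestricted model, q restrictions

F = ((SSR_r - SSR_ur) / q) / (SSR_ur / (N - K - 1))
p_value = stats.f.sf(F, q, N - K - 1)   # upper-tail probability of F_{q, N-K-1}
print(round(F, 2), round(p_value, 4))
```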

F-Distribution


  • The graph shows an F-distribution with 3 and 50 degrees of freedom. The critical value at the 5% significance level is 2.798; we reject the null hypothesis if we obtain an F-statistic greater than this value (one-sided test).
  • If we cannot reject the null hypothesis, we say the variables are jointly insignificant.
  • If we reject the null hypothesis, we say the variables are jointly significant.
  • An F-test for only one restriction gives the same result as the corresponding t-test. However, several individually insignificant variables can be jointly significant.

F-Statistic for Overall Significance

When we run a regression, statistical software typically reports a test of one particular set of restrictions:

\[ H_0:\beta_1=0,\beta_2=0,\dots,\beta_K=0, \]

i.e., that all independent variables jointly do not contribute to explaining \(y\).

The F-statistic for this case can be written as

\[ F=\frac{R^2/K}{(1-R^2)/(N-K-1)}. \]

For both this “global” F-statistic and all other F-statistics, statistics software provides a p-value, which makes interpretation easier—just like with t-statistics.
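
The same computation for the "global" F-statistic, sketched with illustrative numbers:

```python
from scipy import stats

R2, N, K = 0.28, 150, 4                       # illustrative R^2, sample size, regressors
F = (R2 / K) / ((1 - R2) / (N - K - 1))
p_value = stats.f.sf(F, K, N - K - 1)         # overall significance of the regression
print(round(F, 2), p_value)
```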

Interpretation of Regression Tables

 

 

Huh, a Baseball Dataset ⚾

We use the baseball dataset from the Wooldridge (2020) textbook to work through all of this in a practical example.

We’re Eagerly Estimating Regressions

Significant or Not Significant, That Is the Question

  • None of the three variables bavg, hrunsyr, and rbisyr were significant on their own.
  • But when we test whether the three variables are jointly significant, we can reject the null hypothesis.
  • The p-value of the F-statistic for the entire regression model was very small in both models.
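
The Python sketch below mimics this situation with simulated data rather than the actual baseball data (the variable names x1, x2, x3 are made up): three nearly collinear regressors are individually insignificant but jointly significant, tested via statsmodels' f_test.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
N = 80

# Three highly correlated regressors: each carries little *separate* information,
# mimicking bavg, hrunsyr and rbisyr in the baseball example
z = rng.normal(size=N)
df = pd.DataFrame({
    "x1": z + 0.1 * rng.normal(size=N),
    "x2": z + 0.1 * rng.normal(size=N),
    "x3": z + 0.1 * rng.normal(size=N),
})
df["y"] = 1 + 0.5 * z + rng.normal(size=N)

fit = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(fit.pvalues.round(3))                       # individual t-tests: often insignificant
print(fit.f_test("x1 = 0, x2 = 0, x3 = 0"))       # joint F-test: typically rejects
```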

Large Samples

 

 

 

What Is a Large Sample?

  • All properties of the OLS estimator we’ve discussed so far apply to all finite samples, no matter how large or small \(N\) is.
    • This includes the unbiasedness of the OLS estimator, the Gauss-Markov theorem, etc.
    • It also includes everything we discussed about the sampling distribution of OLS estimators, t- and F-tests as long as we assume MLR.6.
  • In addition to these, OLS has certain large sample properties.
    • These refer to properties that arise when \(N\) approaches infinity, so they do not necessarily hold for a particular sample size \(N\) (or even for all possible \(N\)).
    • Some properties hold in large samples even when certain assumptions are not satisfied.

What Happens Without Assumption MLR.6?

  • Without assuming MLR.6, the t-statistic does not necessarily follow a t-distribution, and the F-statistic does not necessarily follow an F-distribution. We cannot test hypotheses about parameters as we did before.
  • However, we also discussed that MLR.6 is unrealistic. MLR.6 implies that \(y\mid\boldsymbol{x}\) is normally distributed, which clearly doesn’t make sense for certain \(y\) variables.
  • Conveniently, using the Central Limit Theorem, the following can be shown for large samples:

Under assumptions MLR.1 through MLR.5, the t-statistic is asymptotically normally distributed:

\[ \frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}\:\overset{\mathrm{d}}{\rightarrow}\mathrm{N}(0,1)\qquad\text{or}\qquad \frac{\hat{\beta}_k-\beta_k}{\mathrm{se}(\hat{\beta}_k)}\:\overset{\mathrm{d}}{\rightarrow}\mathrm{t}_{N-K-1}. \]


  • Since the t-distribution approaches the standard normal distribution as the degrees of freedom increase, we can use either the left or the right expression.
  • This means we can use the t-statistic just like with MLR.6, as long as our sample size is large enough.
  • The asymptotic normality of OLS estimators also implies that the F-statistic is asymptotically F-distributed in large samples.
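
A small Monte Carlo sketch of this result (simple regression with deliberately skewed, non-normal errors; all parameter values are illustrative): under a true null hypothesis, the empirical rejection rate of the usual 5% t-test approaches the nominal level as \(N\) grows, even though MLR.6 fails.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(N, reps=2_000, alpha=0.05):
    """Share of samples in which H0: beta_1 = 0 (true here) is rejected,
    with strongly non-normal (centered exponential) errors."""
    rejections = 0
    crit = stats.t.ppf(1 - alpha / 2, N - 2)
    for _ in range(reps):
        x = rng.normal(size=N)
        u = rng.exponential(size=N) - 1.0         # mean 0, but skewed: MLR.6 fails
        y = 1.0 + 0.0 * x + u
        X = np.column_stack([np.ones(N), x])
        b = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ b
        s2 = resid @ resid / (N - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += abs(b[1] / se) > crit
    return rejections / reps

for N in (10, 50, 500):
    print(N, rejection_rate(N))   # should approach the nominal 0.05 as N grows
```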

LM Statistic

The Lagrange Multiplier Test (LM test) is an alternative to the F-test in large samples.

  • The LM statistic is asymptotically \(\chi^2_q\)-distributed under assumptions MLR.1 through MLR.5 (its small-sample distribution is unknown).
  • We obtain the LM statistic as follows:
    1. Estimate only the restricted model.
    2. Take the residuals from this regression and regress them on all \(K\) independent variables from the full model.
    3. Compute the LM statistic as \(LM = N R^2\), where \(R^2\) is from the regression in step (2).
  • The idea: Can the additional explanatory variables explain the residuals from the restricted model?
  • The LM test and the F-test rarely lead to different results.
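
A numerical sketch of the three steps above (simulated data, two exclusion restrictions, plain numpy; all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N = 200

# Full model has regressors x1, x2, x3; we test H0: beta_2 = beta_3 = 0 (q = 2)
x1, x2, x3 = rng.normal(size=(3, N))
y = 1 + 0.8 * x1 + 0.3 * x2 + 0.0 * x3 + rng.normal(size=N)

# Step 1: estimate the restricted model (y on x1 only)
Xr = np.column_stack([np.ones(N), x1])
resid_r = y - Xr @ np.linalg.solve(Xr.T @ Xr, Xr.T @ y)

# Step 2: regress the restricted residuals on ALL regressors of the full model
Xf = np.column_stack([np.ones(N), x1, x2, x3])
fitted = Xf @ np.linalg.solve(Xf.T @ Xf, Xf.T @ resid_r)
R2_aux = 1 - np.sum((resid_r - fitted) ** 2) / np.sum((resid_r - resid_r.mean()) ** 2)

# Step 3: LM = N * R^2 from the auxiliary regression, asymptotically chi^2 with q df
LM = N * R2_aux
print(round(LM, 2), stats.chi2.sf(LM, df=2))
```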

References


Wooldridge, J. M. (2020). Introductory Econometrics: A Modern Approach (7th ed.). Cengage. https://permalink.obvsg.at/wuw/AC15200792