Econometrics I
June 5, 2025
What is Heteroskedasticity?
Let’s recall assumption MLR.5 about homoskedasticity:
\[ \mathrm{Var}(u_i\mid x_{i1},\dots,x_{iK}) = \mathrm{Var}(u_i) = \sigma^2. \]
If our error term is heteroskedastic, then the variance depends on \(i\):
\[ \mathrm{Var}(u_i\mid x_{i1},\dots,x_{iK}) = \mathrm{E}(u_i^2\mid x_{i1},\dots,x_{iK}) = \sigma^2_i \neq \sigma^2, \]
\[ \mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \mathrm{E}(\boldsymbol{uu}'\mid\boldsymbol{X}) = \mathrm{diag}(\sigma^2_1, \dots, \sigma^2_N) \neq \sigma^2\boldsymbol{I}. \]
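To make this concrete, here is a minimal simulation sketch (all names are illustrative; the later code examples reuse this `x` and `y`): the standard deviation of \(u_i\) grows with \(x_i\), so the error variance differs across observations.

```python
# Simulate data with heteroskedastic errors: Var(u_i | x_i) depends on x_i,
# violating MLR.5. All parameter values here are illustrative.
import numpy as np

rng = np.random.default_rng(42)
N = 500
x = rng.uniform(0, 10, size=N)

# sd of u_i depends on x_i  ->  Var(u_i | x_i) = (0.5 + 0.3 * x_i)^2, not constant
u = rng.normal(0, 0.5 + 0.3 * x)
y = 1.0 + 2.0 * x + u   # true beta_0 = 1, beta_1 = 2
```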
Let’s summarize: We often deal with heteroskedastic errors. In that case, the OLS estimator is no longer efficient, and we can no longer compute the variance of \(\hat{\boldsymbol{\beta}}\) with the usual formula.
Robust Standard Errors
Originally, we assumed that \(\mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \sigma^2\boldsymbol{I}_N.\) Under this assumption, the variance of the OLS estimator was:
\[ \mathrm{Var}\left(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X}\right)=\sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
We now make a less restrictive assumption:
\[ \mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \mathrm{E}(\boldsymbol{uu}'\mid\boldsymbol{X}) = \mathrm{diag}(\sigma_1^2,\dots,\sigma_N^2) =:\boldsymbol{\Omega} \]
Under this assumption, the variance of the OLS estimator is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
Practice Question
What happens to this formula if \(\boldsymbol{\Omega}=\sigma^2\boldsymbol{I}\)?
We have a problem with this equation:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
We don’t know \(\boldsymbol{\Omega}\). However, we can plug in \(\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\): although this matrix does not estimate \(\boldsymbol{\Omega}\) consistently entry by entry, the resulting variance estimator below is consistent (White, 1980).
With this estimator, we can construct the following estimator for the variance of \(\hat{\boldsymbol{\beta}}\):
\[ \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\,\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\,\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
This estimator is sometimes called the sandwich estimator: the outer factors \((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\) and \(\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\) are the “bread” around the “meat” \(\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\). The square roots of its diagonal elements are the (heteroskedasticity-)robust standard errors.
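To see the formula at work, here is a minimal sketch (continuing with the simulated `x` and `y` from above) that computes the sandwich estimator by hand and checks it against statsmodels’ HC0 option:

```python
# Sandwich estimator by hand vs. statsmodels' built-in HC0 covariance.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)                     # N x (K+1) design matrix
fit = sm.OLS(y, X).fit()
u_hat = fit.resid

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ np.diag(u_hat**2) @ X         # X' diag(u_hat^2) X
sandwich = XtX_inv @ meat @ XtX_inv        # bread * meat * bread
se_by_hand = np.sqrt(np.diag(sandwich))

# statsmodels computes the same HC0 covariance matrix directly:
fit_robust = sm.OLS(y, X).fit(cov_type="HC0")
print(se_by_hand, fit_robust.bse)          # the two should agree
```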
Tests for Heteroskedasticity
We can test whether specific forms of heteroskedasticity are present (although we typically don’t know its exact form).
The first approach we discuss is the Breusch-Pagan test from Breusch & Pagan (1979). With this LM test, we check whether \(\sigma^2_i\) depends linearly on the regressors:
\[ \sigma^2_i = \delta_0 + \delta_1x_{i1} + \dots + \delta_Kx_{iK} + \text{error}. \]
The null hypothesis of the test is:
\[ H_0:\delta_1=\dots=\delta_K=0 \]
In large samples, the LM statistic of this test is \(\chi^2\)-distributed with \(K\) degrees of freedom under the null hypothesis.
We conduct the Breusch-Pagan test as follows (a code sketch follows the list):
1. Estimate the original model by OLS and save the residuals \(\hat{u}_i\).
2. Regress \(\hat{u}_i^2\) on a constant and \(x_{i1},\dots,x_{iK}\); save the \(R^2\) of this auxiliary regression.
3. Compute the LM statistic \(LM = N\cdot R^2\).
4. Reject \(H_0\) if \(LM\) exceeds the critical value of the \(\chi^2_K\) distribution (equivalently, if the p-value is below the chosen significance level).
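A sketch of the test in Python, assuming the simulated data from above; statsmodels ships an implementation, and we also compute the \(N\cdot R^2\) statistic by hand:

```python
# Breusch-Pagan test: statsmodels' implementation alongside the
# N * R^2 version computed by hand.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid

# auxiliary regression: u_hat^2 on a constant and the regressors
aux = sm.OLS(u_hat**2, X).fit()
lm_by_hand = len(y) * aux.rsquared          # LM = N * R^2 ~ chi^2(K) under H0

lm, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(u_hat, X)
print(lm_by_hand, lm, lm_pvalue)            # the two LM values should agree
```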
The White test from White (1980) is a variant of the Breusch-Pagan test with a more flexible specification: it also includes all possible squared terms and interactions of the regressors. We conduct it as follows:
\[ \begin{aligned} \hat{u}_i^2=\delta_0+&\delta_1x_{i1}+\dots+\delta_Kx_{iK}+\\ &\delta_{K+1}x_{i1}^2+\dots+\delta_{2K}x_{iK}^2+\\ &\delta_{2K+1}x_{i1}x_{i2}+\dots+\delta_{K(K+3)/2}\,x_{i,K-1}x_{iK}+\text{error}, \end{aligned} \]
This regression has \(K(K+3)/2\) regressors (\(K\) levels, \(K\) squares, and \(K(K-1)/2\) interactions). That’s a lot of regressors. If \(K\) is large and \(N\) is small, it might even be too many.
An alternative version of the White test is:
\[ \hat{u}_i^2 = \delta_0 + \delta_1\hat{y}_i + \delta_2\hat{y}_i^2+\text{error}. \]
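Both variants are easy to run; the sketch below (again on the simulated data) uses statsmodels’ `het_white` for the full test and computes the simplified version by hand. Under \(H_0\), the simplified LM statistic is \(\chi^2\)-distributed with 2 degrees of freedom.

```python
# White test: het_white adds all squares and interactions automatically;
# the simplified version regresses u_hat^2 on y_hat and y_hat^2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)   # full White test

# simplified version: u_hat^2 on fitted values and their squares
Z = sm.add_constant(np.column_stack([fit.fittedvalues, fit.fittedvalues**2]))
aux = sm.OLS(fit.resid**2, Z).fit()
lm_simple = len(y) * aux.rsquared            # ~ chi^2(2) under H0
print(lm, lm_pvalue, lm_simple)
```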
Weighted Least Squares
Assume we have heteroskedasticity, but we know the \(\sigma^2_i\). We want to estimate the following regression:
\[ y_i = \beta_0+\beta_1x_{i1}+\dots+\beta_Kx_{iK}+u_i, \]
but we know that OLS is inefficient.
However, with the error variances \(\sigma^2_i\), we can construct an efficient estimator. To do this, we divide the regression by \(\sigma_i=\sqrt{\sigma^2_i}\):
\[ \frac{y_i}{\sigma_i}=\beta_0\frac{1}{\sigma_i}+\beta_1\frac{x_{i1}}{\sigma_i}+\dots+\beta_K\frac{x_{iK}}{\sigma_i}+\frac{u_i}{\sigma_i} \]
Why do we do this? Because dividing by \(\sigma_i\) scales the error so that its variance is the same for all \(i\):
\[ \mathrm{Var}\!\left(\frac{u_i}{\sigma_i}\,\middle|\,\boldsymbol{X}\right)=\frac{\mathrm{Var}(u_i\mid\boldsymbol{X})}{\sigma_i^2}=\frac{\sigma_i^2}{\sigma_i^2}=1. \]
We weight observations with higher variance less than those with lower variance — hence the name weighted least squares. In matrix notation:
\[ \tilde{\boldsymbol{y}} = \tilde{\boldsymbol{X}}\boldsymbol{\beta}_{\mathrm{WLS}}+\tilde{\boldsymbol{u}}, \]
where \(\tilde{\boldsymbol{y}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{y}\), \(\tilde{\boldsymbol{X}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{X}\), and \(\tilde{\boldsymbol{u}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{u}\); \(\boldsymbol{\Omega}=\mathrm{diag}(\sigma_1^2,\dots,\sigma_N^2)\).
The WLS estimator in this case is:
\[ \hat{\boldsymbol{\beta}}_{\mathrm{WLS}} = (\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{X}})^{-1}\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{y}}=(\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{y}. \]
This WLS estimator is a special case of the generalized least squares estimator (GLS). GLS can be used with any variance-covariance matrix \(\boldsymbol{\Omega}\), not just the diagonal one above.
The variance of the WLS estimator is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}_{\mathrm{WLS}}\mid\boldsymbol{X})=(\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{X}})^{-1} = (\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1} \]
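In our simulation the \(\sigma_i^2\) are actually known, so we can try WLS directly; here is a sketch that compares statsmodels’ WLS (which takes weights \(1/\sigma_i^2\)) with the matrix formula above:

```python
# WLS with known error variances (true in our simulation: sd_i = 0.5 + 0.3*x_i).
import numpy as np
import statsmodels.api as sm

sigma2 = (0.5 + 0.3 * x) ** 2               # known error variances
X = sm.add_constant(x)

fit_wls = sm.WLS(y, X, weights=1.0 / sigma2).fit()

# equivalent by hand: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.diag(1.0 / sigma2)
beta_wls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(fit_wls.params, beta_wls)             # should coincide
```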
We can estimate this variance using \(\hat{\boldsymbol{\Omega}}\), which gives us standard errors for tests. The variance of the WLS estimator is lower (in the positive semidefinite sense) than that of the OLS estimator (proof omitted), which under heteroskedasticity is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}_{\mathrm{OLS}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X} (\boldsymbol{X}'\boldsymbol{X})^{-1} \]
The problem is: We cannot compute this estimator in practice. GLS (WLS) requires us to know the \(\sigma_i^2\), but we don’t.
If we want to apply feasible generalized least squares (FGLS), we can proceed as follows (one common specification; a code sketch follows the list):
1. Estimate the original model by OLS and save the residuals \(\hat{u}_i\).
2. Regress \(\log(\hat{u}_i^2)\) on a constant and \(x_{i1},\dots,x_{iK}\); save the fitted values \(\hat{g}_i\). This imposes the variance form \(\sigma_i^2 = \sigma^2\exp(\delta_0+\delta_1x_{i1}+\dots+\delta_Kx_{iK})\).
3. Set \(\hat{h}_i = \exp(\hat{g}_i)\).
4. Estimate the original equation by WLS with weights \(1/\hat{h}_i\).
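A sketch of these steps on the simulated data (the exponential variance specification in step 2 is one common choice, not the only one):

```python
# FGLS following the steps above: estimate the variance function from the
# OLS residuals, then reweight.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid            # step 1

# steps 2-3: model log(u_hat^2) as linear in the regressors, exponentiate
g_hat = sm.OLS(np.log(u_hat**2), X).fit().fittedvalues
h_hat = np.exp(g_hat)                       # estimated variance proxies

# step 4: WLS with the estimated weights
fit_fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print(fit_fgls.params, fit_fgls.bse)
```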
One remaining problem: We don’t know the “true” functional form of the heteroskedasticity; we’ve only assumed one possible form.
Appendix: Derivation of the OLS Variance under Heteroskedasticity
For reference, here is the derivation of the sandwich formula used above:
\[ \begin{aligned} \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X}) &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'(\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{u})\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left(\boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X})\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1} \\ &= (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1} \end{aligned} \]