Econometrics I
June 5, 2025
What is Heteroskedasticity?
Let’s recall assumption MLR.5 about homoskedasticity:
\[ \mathrm{Var}(u_i\mid x_{i1},\dots,x_{iK}) = \mathrm{Var}(u_i) = \sigma^2. \]
If our error term is heteroskedastic, then the variance depends on \(i\):
\[ \mathrm{Var}(u_i\mid x_{i1},\dots,x_{iK}) = \mathrm{E}(u_i^2\mid x_{i1},\dots,x_{iK}) = \sigma^2_i \neq \sigma^2, \]
\[ \mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \mathrm{E}(\boldsymbol{uu}'\mid\boldsymbol{X}) = \mathrm{diag}(\sigma^2_1, \dots, \sigma^2_N) \neq \sigma^2\boldsymbol{I}. \]
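To make this concrete, here is a minimal simulation sketch (all names are illustrative; the later code examples reuse this `x` and `y`): the standard deviation of \(u_i\) grows with \(x_i\), so the error variance differs across observations.

```python
# Simulate data with heteroskedastic errors: Var(u_i | x_i) depends on x_i,
# violating MLR.5. All parameter values here are illustrative.
import numpy as np

rng = np.random.default_rng(42)
N = 500
x = rng.uniform(0, 10, size=N)

# sd of u_i depends on x_i  ->  Var(u_i | x_i) = (0.5 + 0.3 * x_i)^2, not constant
u = rng.normal(0, 0.5 + 0.3 * x)
y = 1.0 + 2.0 * x + u   # true beta_0 = 1, beta_1 = 2
```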
Let’s summarize: We often deal with heteroskedastic errors. In that case, the OLS estimator is no longer efficient, and we can no longer compute the variance of \(\hat{\boldsymbol{\beta}}\) with the usual formula.
Robust Standard Errors
Originally, we assumed that \(\mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \sigma^2\boldsymbol{I}_N.\) Under this assumption, the variance of the OLS estimator was:
\[ \mathrm{Var}\left(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X}\right)=\sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
We now make a less restrictive assumption:
\[ \mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X}) = \mathrm{E}(\boldsymbol{uu}'\mid\boldsymbol{X}) = \mathrm{diag}(\sigma_1^2,\dots,\sigma_N^2) =:\boldsymbol{\Omega} \]
Under this assumption, the variance of the OLS estimator is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
Practice Question
What happens to this formula if \(\boldsymbol{\Omega}=\sigma^2\boldsymbol{I}\)?
We have a problem with this equation:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
We don’t know \(\boldsymbol{\Omega}\). However, we can plug in \(\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\): although this matrix does not estimate \(\boldsymbol{\Omega}\) consistently entry by entry, the resulting variance estimator below is consistent (White, 1980).
With this estimator, we can construct the following estimator for the variance of \(\hat{\boldsymbol{\beta}}\):
\[ \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\,\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\,\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}. \]
This estimator is sometimes called the sandwich estimator: the outer factors \((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\) and \(\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\) are the “bread” around the “meat” \(\mathrm{diag}(\hat{u}_1^2,\dots,\hat{u}_N^2)\). The square roots of its diagonal elements are the (heteroskedasticity-)robust standard errors.
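To see the formula at work, here is a minimal sketch (continuing with the simulated `x` and `y` from above) that computes the sandwich estimator by hand and checks it against statsmodels’ HC0 option:

```python
# Sandwich estimator by hand vs. statsmodels' built-in HC0 covariance.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)                     # N x (K+1) design matrix
fit = sm.OLS(y, X).fit()
u_hat = fit.resid

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ np.diag(u_hat**2) @ X         # X' diag(u_hat^2) X
sandwich = XtX_inv @ meat @ XtX_inv        # bread * meat * bread
se_by_hand = np.sqrt(np.diag(sandwich))

# statsmodels computes the same HC0 covariance matrix directly:
fit_robust = sm.OLS(y, X).fit(cov_type="HC0")
print(se_by_hand, fit_robust.bse)          # the two should agree
```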
Tests for Heteroskedasticity
We can test whether specific forms of heteroskedasticity are present (although we typically don’t know its exact form).
The first approach we discuss is the Breusch-Pagan test from Breusch & Pagan (1979). With this LM test, we check whether \(\sigma^2_i\) depends linearly on the regressors:
\[ \sigma^2_i = \delta_0 + \delta_1x_{i1} + \dots + \delta_Kx_{iK} + \text{error}. \]
The null hypothesis of the test is:
\[ H_0:\delta_1=\dots=\delta_K=0 \]
In large samples, the LM statistic of this test is \(\chi^2\)-distributed with \(K\) degrees of freedom under the null hypothesis.
We conduct the Breusch-Pagan test as follows (a code sketch follows the list):
1. Estimate the original model by OLS and save the residuals \(\hat{u}_i\).
2. Regress \(\hat{u}_i^2\) on a constant and \(x_{i1},\dots,x_{iK}\); save the \(R^2\) of this auxiliary regression.
3. Compute the LM statistic \(LM = N\cdot R^2\).
4. Reject \(H_0\) if \(LM\) exceeds the critical value of the \(\chi^2_K\) distribution (equivalently, if the p-value is below the chosen significance level).
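A sketch of the test in Python, assuming the simulated data from above; statsmodels ships an implementation, and we also compute the \(N\cdot R^2\) statistic by hand:

```python
# Breusch-Pagan test: statsmodels' implementation alongside the
# N * R^2 version computed by hand.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid

# auxiliary regression: u_hat^2 on a constant and the regressors
aux = sm.OLS(u_hat**2, X).fit()
lm_by_hand = len(y) * aux.rsquared          # LM = N * R^2 ~ chi^2(K) under H0

lm, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(u_hat, X)
print(lm_by_hand, lm, lm_pvalue)            # the two LM values should agree
```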
The White test from White (1980) is a variant of the Breusch-Pagan test with a more flexible specification: it also includes all possible squared terms and interactions of the regressors. We conduct it as follows:
\[ \begin{aligned} \hat{u}_i^2=\delta_0+&\delta_1x_{i1}+\dots+\delta_Kx_{iK}+\\ &\delta_{K+1}x_{i1}^2+\dots+\delta_{2K}x_{iK}^2+\\ &\delta_{2K+1}x_{i1}x_{i2}+\dots+\delta_{K(K+3)/2}\,x_{i,K-1}x_{iK}+\text{error}, \end{aligned} \]
This regression has \(K(K+3)/2\) regressors (\(K\) levels, \(K\) squares, and \(K(K-1)/2\) interactions). That’s a lot of regressors. If \(K\) is large and \(N\) is small, it might even be too many.
An alternative version of the White test is:
\[ \hat{u}_i^2 = \delta_0 + \delta_1\hat{y}_i + \delta_2\hat{y}_i^2+\text{error}. \]
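Both variants are easy to run; the sketch below (again on the simulated data) uses statsmodels’ `het_white` for the full test and computes the simplified version by hand. Under \(H_0\), the simplified LM statistic is \(\chi^2\)-distributed with 2 degrees of freedom.

```python
# White test: het_white adds all squares and interactions automatically;
# the simplified version regresses u_hat^2 on y_hat and y_hat^2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)   # full White test

# simplified version: u_hat^2 on fitted values and their squares
Z = sm.add_constant(np.column_stack([fit.fittedvalues, fit.fittedvalues**2]))
aux = sm.OLS(fit.resid**2, Z).fit()
lm_simple = len(y) * aux.rsquared            # ~ chi^2(2) under H0
print(lm, lm_pvalue, lm_simple)
```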
Weighted Least Squares
Assume we have heteroskedasticity, but we know the \(\sigma^2_i\). We want to estimate the following regression:
\[ y_i = \beta_0+\beta_1x_{i1}+\dots+\beta_Kx_{iK}+u_i, \]
but we know that OLS is inefficient.
However, with the error variances \(\sigma^2_i\), we can construct an efficient estimator. To do this, we divide the regression by \(\sigma_i=\sqrt{\sigma^2_i}\):
\[ \frac{y_i}{\sigma_i}=\beta_0\frac{1}{\sigma_i}+\beta_1\frac{x_{i1}}{\sigma_i}+\dots+\beta_K\frac{x_{iK}}{\sigma_i}+\frac{u_i}{\sigma_i} \]
Why do we do this? Because dividing by \(\sigma_i\) scales the error so that its variance is the same for all \(i\):
\[ \mathrm{Var}\!\left(\frac{u_i}{\sigma_i}\,\middle|\,\boldsymbol{X}\right)=\frac{\mathrm{Var}(u_i\mid\boldsymbol{X})}{\sigma_i^2}=\frac{\sigma_i^2}{\sigma_i^2}=1. \]
We weight observations with higher variance less than those with lower variance — hence the name weighted least squares. In matrix notation:
\[ \tilde{\boldsymbol{y}} = \tilde{\boldsymbol{X}}\boldsymbol{\beta}_{\mathrm{WLS}}+\tilde{\boldsymbol{u}}, \]
where \(\tilde{\boldsymbol{y}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{y}\), \(\tilde{\boldsymbol{X}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{X}\), and \(\tilde{\boldsymbol{u}}=\boldsymbol{\Omega}^{-1/2}\boldsymbol{u}\); \(\boldsymbol{\Omega}=\mathrm{diag}(\sigma_1^2,\dots,\sigma_N^2)\).
The WLS estimator in this case is:
\[ \hat{\boldsymbol{\beta}}_{\mathrm{WLS}} = (\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{X}})^{-1}\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{y}}=(\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{y}. \]
This WLS estimator is a special case of the generalized least squares estimator (GLS). GLS can be used with any variance-covariance matrix \(\boldsymbol{\Omega}\), not just the diagonal one above.
The variance of the WLS estimator is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}_{\mathrm{WLS}}\mid\boldsymbol{X})=(\tilde{\boldsymbol{X}}'\tilde{\boldsymbol{X}})^{-1} = (\boldsymbol{X}'\boldsymbol{\Omega}^{-1}\boldsymbol{X})^{-1} \]
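In our simulation the \(\sigma_i^2\) are actually known, so we can try WLS directly; here is a sketch that compares statsmodels’ WLS (which takes weights \(1/\sigma_i^2\)) with the matrix formula above:

```python
# WLS with known error variances (true in our simulation: sd_i = 0.5 + 0.3*x_i).
import numpy as np
import statsmodels.api as sm

sigma2 = (0.5 + 0.3 * x) ** 2               # known error variances
X = sm.add_constant(x)

fit_wls = sm.WLS(y, X, weights=1.0 / sigma2).fit()

# equivalent by hand: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.diag(1.0 / sigma2)
beta_wls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(fit_wls.params, beta_wls)             # should coincide
```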
We can estimate this variance using \(\hat{\boldsymbol{\Omega}}\), which gives us standard errors for tests. The variance of the WLS estimator is lower (in the positive semidefinite sense) than that of the OLS estimator (proof omitted), which under heteroskedasticity is:
\[ \mathrm{Var}(\hat{\boldsymbol{\beta}}_{\mathrm{OLS}}\mid\boldsymbol{X})=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X} (\boldsymbol{X}'\boldsymbol{X})^{-1} \]
The problem is: We cannot compute this estimator in practice. GLS (WLS) requires us to know the \(\sigma_i^2\), but we don’t.
If we want to apply feasible generalized least squares (FGLS), we can proceed as follows (one common specification; a code sketch follows the list):
1. Estimate the original model by OLS and save the residuals \(\hat{u}_i\).
2. Regress \(\log(\hat{u}_i^2)\) on a constant and \(x_{i1},\dots,x_{iK}\); save the fitted values \(\hat{g}_i\). This imposes the variance form \(\sigma_i^2 = \sigma^2\exp(\delta_0+\delta_1x_{i1}+\dots+\delta_Kx_{iK})\).
3. Set \(\hat{h}_i = \exp(\hat{g}_i)\).
4. Estimate the original equation by WLS with weights \(1/\hat{h}_i\).
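A sketch of these steps on the simulated data (the exponential variance specification in step 2 is one common choice, not the only one):

```python
# FGLS following the steps above: estimate the variance function from the
# OLS residuals, then reweight.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid            # step 1

# steps 2-3: model log(u_hat^2) as linear in the regressors, exponentiate
g_hat = sm.OLS(np.log(u_hat**2), X).fit().fittedvalues
h_hat = np.exp(g_hat)                       # estimated variance proxies

# step 4: WLS with the estimated weights
fit_fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print(fit_fgls.params, fit_fgls.bse)
```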
One remaining problem: We don’t know the “true” functional form of the heteroskedasticity; we’ve only assumed one possible form.
Appendix: Derivation of the OLS Variance under Heteroskedasticity
For reference, here is the derivation of the sandwich formula used above:
\[ \begin{aligned} \mathrm{Var}(\hat{\boldsymbol{\beta}}\mid\boldsymbol{X}) &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'(\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{u})\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left(\boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= \mathrm{Var}\left((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{u}\mid\boldsymbol{X}\right) \\ &= (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\mathrm{Var}(\boldsymbol{u}\mid\boldsymbol{X})\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1} \\ &= (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{\Omega}\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1} \end{aligned} \]