All least squares methods are founded on a simple, powerful principle: find the model parameters that make the predicted values as close as possible to the observed values. They achieve this by minimizing the sum of squared residuals (the differences between observed and predicted values). Squaring the residuals ensures both positive and negative errors are penalized and emphasizes larger errors.
General Mathematical Form: \( \hat{\beta} = \arg \min_{\beta} \sum_i (y_i - \hat{y}_i)^2 \), where \( \hat{y}_i \) is the model's prediction for observation \( i \).
Breaking it down: \( y_i \) is the observed value, \( \hat{y}_i \) is the fitted value, and their difference \( y_i - \hat{y}_i \) is the residual that gets squared and summed.
Ordinary Least Squares (OLS):
Model: \( y = X \beta + \epsilon \), where \( \epsilon \) is the error term.
Estimator: \( \hat{\beta} = \arg \min_{\beta} \sum (y_i - x_i' \beta)^2 = (X'X)^{-1} X' y \) (closed-form solution).
Key Assumptions (The "Classical" Assumptions): linearity in the parameters, exogeneity (\( E[\epsilon \mid X] = 0 \)), no perfect multicollinearity, homoskedasticity (constant error variance), and no autocorrelation among the errors.
Properties: Under these assumptions, the Gauss-Markov Theorem holds: OLS is the Best Linear Unbiased Estimator (BLUE). It has the smallest variance among all unbiased linear estimators.
Use Case: The standard starting point for any linear regression analysis.
Example: Predicting house prices (\( y \)) based on square footage and number of bedrooms (\( X \)). We assume the variability in price is roughly the same for small and large houses (homoskedasticity).
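As a minimal sketch of the closed-form estimator above, the snippet below fits the house-price example on synthetic data (the coefficients, sample size, and noise level are illustrative assumptions, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Design matrix: intercept, square footage, number of bedrooms
sqft = rng.uniform(500, 3500, n)
beds = rng.integers(1, 6, n)
X = np.column_stack([np.ones(n), sqft, beds])

# Simulate prices with homoskedastic noise (the "true" betas are made up)
beta_true = np.array([50_000.0, 120.0, 10_000.0])
y = X @ beta_true + rng.normal(0, 25_000, n)

# Closed-form OLS: beta_hat = (X'X)^{-1} X'y
# (np.linalg.solve is used instead of an explicit inverse for numerical stability)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```

In practice one would typically reach for np.linalg.lstsq or a regression library; the explicit normal equations are shown only to mirror the closed-form formula.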
Weighted Least Squares (WLS):
Model: Same as OLS, but the errors are heteroskedastic (non-constant variance).
Estimator: \( \hat{\beta} = \arg \min_{\beta} \sum w_i (y_i - x_i' \beta)^2 \). The weights are typically chosen as \( w_i = 1 / \sigma_i^2 \), where \( \sigma_i^2 \) is the variance of the error for the \( i \)-th observation.
Idea: "Down-weight" observations that are known to be noisier (high variance) and "up-weight" observations that are more precise (low variance). This restores efficiency.
Properties: More efficient than OLS when the weights are correctly specified. If weights are wrong, it can be worse than OLS.
Use Case: Data where the reliability of observations varies.
Example: Regressions on grouped or averaged data, where each observation is a group mean. Means computed from larger groups are more precise, so they receive larger weights (\( w_i \propto n_i \)); a numeric sketch follows below.
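A sketch of WLS with simulated heteroskedasticity, where the per-observation standard deviations are treated as known so the weights \( w_i = 1/\sigma_i^2 \) can be formed directly (the data-generating numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])

# Error standard deviation grows with x (heteroskedasticity); assume it is known
sigma = 0.5 + 0.4 * x
y = X @ np.array([2.0, 3.0]) + rng.normal(0, sigma)

# WLS: minimize sum_i w_i (y_i - x_i'beta)^2 with w_i = 1 / sigma_i^2
w = 1.0 / sigma**2
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent view: rescale each row by sqrt(w_i) and run ordinary least squares
sw = np.sqrt(w)
beta_check, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls, beta_check)
```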
Generalized Least Squares (GLS):
Model: \( y = X \beta + u \), where \( \text{Var}(u) = \sigma^2 \Omega \). \( \Omega \) is a known positive-definite covariance matrix that captures the structure of the heteroskedasticity and autocorrelation.
Estimator: \( \hat{\beta}_{GLS} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y \). This is equivalent to pre-multiplying the model by \( \Omega^{-1/2} \) and running OLS on the transformed data, which then satisfies the classical assumptions.
Idea: The most general case for handling any violation of the spherical errors assumption (homoskedasticity + no correlation). It simultaneously corrects for both heteroskedasticity and autocorrelation.
Special Cases & Intuition: if \( \Omega = I \), GLS reduces to OLS; if \( \Omega \) is diagonal (pure heteroskedasticity), GLS reduces to WLS with \( w_i = 1/\sigma_i^2 \). Intuitively, GLS "whitens" the errors before fitting.
Use Case: Time-series regressions, spatial econometrics, panel data models.
Example: Modeling GDP growth over time. Error terms are likely autocorrelated (a shock this year affects next year), so \( \Omega \) has non-zero off-diagonal entries representing the correlation between errors in nearby periods (for AR(1) errors, these entries decay geometrically with the distance between observations); a sketch follows below.
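A sketch of the GLS formula with an AR(1)-style error covariance, one common way such an \( \Omega \) is specified for time-series errors (the autocorrelation value, trend coefficients, and sample length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
rho = 0.7  # assumed known error autocorrelation

t = np.arange(T, dtype=float)
X = np.column_stack([np.ones(T), t])

# AR(1) covariance structure: Omega[i, j] = rho^|i - j|
Omega = rho ** np.abs(np.subtract.outer(t, t))

# Simulate correlated errors (the Cholesky factor colors white noise) and a response
L = np.linalg.cholesky(Omega)
y = X @ np.array([1.0, 0.05]) + L @ rng.normal(size=T)

# GLS: beta_hat = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Oinv_X = np.linalg.solve(Omega, X)
Oinv_y = np.linalg.solve(Omega, y)
beta_gls = np.linalg.solve(X.T @ Oinv_X, X.T @ Oinv_y)
print(beta_gls)
```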
Nonlinear Least Squares (NLS):
Model: \( y_i = f(x_i, \theta) + \epsilon_i \), where \( f(\cdot) \) is a nonlinear function of the parameters \( \theta \) (e.g., \( \theta_1 e^{\theta_2 x} \)).
Estimator: \( \hat{\theta} = \arg \min_{\theta} \sum (y_i - f(x_i, \theta))^2 \).
Key Difference: There is no closed-form solution like \( (X'X)^{-1} X' y \). Estimation requires iterative numerical optimization algorithms (e.g., Gradient Descent, Gauss-Newton).
Properties: Under standard regularity conditions, the estimator is consistent and asymptotically normal. However, it can be sensitive to starting values and may converge to a local (not global) minimum.
Use Case: Any context where the underlying data-generating process is known to be nonlinear.
Examples: exponential growth or decay curves, logistic (S-shaped) growth models, and dose-response curves, all of which are nonlinear in their parameters; a fitting sketch follows below.
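A sketch fitting the exponential model \( \theta_1 e^{\theta_2 x} \) mentioned above with scipy.optimize.curve_fit, which wraps an iterative least-squares solver; the simulated data and starting values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

def model(x, theta1, theta2):
    # Nonlinear in the parameters: theta1 * exp(theta2 * x)
    return theta1 * np.exp(theta2 * x)

x = np.linspace(0, 4, 80)
y = model(x, 2.5, 0.8) + rng.normal(0, 1.0, x.size)

# Iterative fit; starting values (p0) matter because the objective
# can have local minima, unlike the convex OLS problem
theta_hat, theta_cov = curve_fit(model, x, y, p0=[1.0, 0.5])
print(theta_hat)  # should be close to (2.5, 0.8)
```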
Theoretical Hierarchy: OLS ⊂ WLS ⊂ GLS. GLS is the most general form for linear models with generalized error structures. NLS is a separate branch for nonlinearity.
The Practical Problem: In practice, the true error covariance matrix \( \Omega \) for GLS is almost never known.
The Solution: Feasible GLS (FGLS): first estimate the model by OLS, use the residuals to estimate the error covariance structure \( \hat{\Omega} \) (e.g., a variance function or an autocorrelation coefficient), then plug \( \hat{\Omega} \) into the GLS formula. A two-step sketch follows below.
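A two-step sketch of FGLS for AR(1)-correlated errors: run OLS, estimate the autocorrelation from the residuals, and plug the implied \( \hat{\Omega} \) into the GLS formula. The AR(1) specification and all numbers are illustrative assumptions; other error structures call for other estimators of \( \Omega \).

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
t = np.arange(T, dtype=float)
X = np.column_stack([np.ones(T), t])

# Simulate AR(1) errors; the true rho is used only to generate the data
rho_true = 0.6
e = np.zeros(T)
for i in range(1, T):
    e[i] = rho_true * e[i - 1] + rng.normal()
y = X @ np.array([1.0, 0.05]) + e

# Step 1: OLS to obtain residuals
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols

# Step 2: estimate rho by regressing the residuals on their own lag
rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])

# Step 3: GLS with the estimated Omega (assumes 0 < rho_hat < 1)
Omega_hat = rho_hat ** np.abs(np.subtract.outer(t, t))
Oinv_X = np.linalg.solve(Omega_hat, X)
Oinv_y = np.linalg.solve(Omega_hat, y)
beta_fgls = np.linalg.solve(X.T @ Oinv_X, X.T @ Oinv_y)
print(beta_fgls)
```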
| Method | Primary Use Case | Key Assumption |
|---|---|---|
| OLS | Baseline modeling; standard linear relationships. | Spherical errors (homoskedastic, uncorrelated). |
| WLS | Heteroskedastic data with known or estimable variances. | The chosen weights are (inversely) proportional to the error variance. |
| GLS/FGLS | Correlated errors (time-series, panels) or complex heteroskedasticity. | The structure of the error covariance (\( \Omega \)) can be correctly specified/estimated. |
| NLS | Theoretical model is inherently nonlinear in its parameters. | The functional form \( f(x_i, \theta) \) is correctly specified. |
Practical Workflow: start with OLS and inspect the residuals; test for heteroskedasticity (e.g., Breusch-Pagan) and autocorrelation (e.g., Durbin-Watson); if violations appear, either use robust standard errors or move to WLS/FGLS when the error structure can be modeled; and use NLS from the start whenever theory dictates a form that is nonlinear in the parameters. A sketch of this workflow follows below.
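One way this workflow might look in code, sketched with statsmodels; the specific diagnostics (Breusch-Pagan, Durbin-Watson) and the HC3 robust covariance are choices of this sketch, not steps prescribed by the text:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 150)
X = sm.add_constant(x)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 + 0.3 * x)  # heteroskedastic noise

# 1. Baseline OLS fit
ols_res = sm.OLS(y, X).fit()

# 2. Diagnostics: heteroskedasticity and autocorrelation of the residuals
_, bp_pvalue, _, _ = het_breuschpagan(ols_res.resid, X)
dw = durbin_watson(ols_res.resid)
print(f"Breusch-Pagan p-value: {bp_pvalue:.3f}, Durbin-Watson: {dw:.2f}")

# 3. If heteroskedasticity shows up, use robust standard errors or WLS
robust_res = sm.OLS(y, X).fit(cov_type="HC3")
wls_res = sm.WLS(y, X, weights=1.0 / (0.5 + 0.3 * x) ** 2).fit()
print(robust_res.bse)
print(wls_res.params)
```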