📘 Overview of Least Squares Estimation Methods

1. Core Idea

All least squares methods are founded on a simple, powerful principle: find the model parameters that make the predicted values as close as possible to the observed values. They achieve this by minimizing the sum of squared residuals (the differences between observed and predicted values). Squaring the residuals ensures both positive and negative errors are penalized and emphasizes larger errors.

General Mathematical Form:

\[ \hat{\theta} = \arg \min_{\theta} \sum_{i=1}^n w_i \, (y_i - f(x_i, \theta))^2 \]

Breaking it down:

  • \( y_i \): the observed value of the \( i \)-th of \( n \) observations.
  • \( f(x_i, \theta) \): the model's prediction for inputs \( x_i \) given parameters \( \theta \).
  • \( y_i - f(x_i, \theta) \): the residual for observation \( i \).
  • \( w_i \): an optional per-observation weight (all equal to 1 in the ordinary, unweighted case).
  • \( \hat{\theta} \): the parameter values that minimize the (weighted) sum of squared residuals.
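
As a concrete rendering of this objective, here is a minimal Python sketch; the names (`weighted_ssr`, `f`, `theta`) are placeholders for illustration, not part of any specific library.

```python
import numpy as np

def weighted_ssr(theta, f, x, y, w=None):
    """Weighted sum of squared residuals: sum_i w_i * (y_i - f(x_i, theta))^2."""
    residuals = y - f(x, theta)      # observed minus predicted
    if w is None:
        w = np.ones_like(y)          # unweighted (ordinary) case
    return np.sum(w * residuals**2)
```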

2. Main Variants

1. OLS – Ordinary Least Squares

Model: \( y = X \beta + \epsilon \), where \( \epsilon \) is the error term.

Estimator: \( \hat{\beta} = \arg \min_{\beta} \sum (y_i - x_i' \beta)^2 = (X'X)^{-1} X' y \) (closed-form solution).
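A minimal numpy sketch of this closed-form solution on simulated data (the data-generating values here are made up purely for illustration); solving the normal equations is numerically safer than inverting \( X'X \) explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)      # homoskedastic errors

# Solve the normal equations X'X beta = X'y instead of forming (X'X)^{-1} explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to the true coefficients [2.0, 0.5]
```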

Key Assumptions (The "Classical" Assumptions):

  1. Linearity: The relationship between \( X \) and \( y \) is linear.
  2. Exogeneity: The error term has a mean of zero conditional on the regressors (\( E[\epsilon | X] = 0 \)). This means \( X \) is not correlated with the error.
  3. Homoskedasticity: The error term has constant variance (\( \text{Var}(\epsilon | X) = \sigma^2 I \)).
  4. No Autocorrelation: Errors are uncorrelated with each other.

Properties: Under these assumptions, the Gauss-Markov Theorem holds: OLS is the Best Linear Unbiased Estimator (BLUE). It has the smallest variance among all unbiased linear estimators.

Use Case: The standard starting point for any linear regression analysis.

Example: Predicting house prices (\( y \)) based on square footage and number of bedrooms (\( X \)). We assume the variability in price is roughly the same for small and large houses (homoskedasticity).

2. WLS – Weighted Least Squares

Model: Same as OLS, but errors are heteroskedastic (non-constant variance).

Estimator: \( \hat{\beta} = \arg \min_{\beta} \sum w_i (y_i - x_i' \beta)^2 \). The weights are typically chosen as \( w_i = 1 / \sigma_i^2 \), where \( \sigma_i^2 \) is the variance of the error for the \( i \)-th observation.
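A minimal sketch of the weighted estimator, assuming (purely for illustration) that the error variance grows with \( x_i^2 \) and is known; the weights enter through a diagonal matrix \( W \).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
sigma2 = x**2                                  # assumed variance structure: Var(e_i) grows with x_i^2
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2))

W = np.diag(1.0 / sigma2)                      # w_i = 1 / sigma_i^2
# beta_hat = (X' W X)^{-1} X' W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)  # close to the true coefficients [1.0, 2.0]
```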

Idea: "Down-weight" observations that are known to be noisier (high variance) and "up-weight" observations that are more precise (low variance). This restores efficiency.

Properties: More efficient than OLS when the weights are correctly specified. If weights are wrong, it can be worse than OLS.

Use Case: Data where the reliability of observations varies.

Example: Regressing on group averages, where each observation is a mean over a different number of individuals \( n_i \). Since the variance of an average scales as \( 1/n_i \), a natural choice of weights is \( w_i = n_i \).

3. GLS – Generalized Least Squares

Model: \( y = X \beta + u \), where \( \text{Var}(u) = \sigma^2 \Omega \). \( \Omega \) is a known positive-definite covariance matrix that captures the structure of the heteroskedasticity and autocorrelation.

Estimator: \( \hat{\beta}_{GLS} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y \). This "transforms" the original model to one that satisfies OLS assumptions.
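A minimal sketch of that transformation, assuming \( \Omega \) is known: factor \( \Omega = LL' \) (Cholesky), "whiten" \( X \) and \( y \) by \( L^{-1} \), and run OLS on the transformed model, which is algebraically identical to the GLS formula.

```python
import numpy as np

def gls(X, y, Omega):
    """GLS estimator (X' Omega^{-1} X)^{-1} X' Omega^{-1} y via Cholesky whitening."""
    L = np.linalg.cholesky(Omega)   # Omega = L L'
    Xs = np.linalg.solve(L, X)      # whitened regressors L^{-1} X
    ys = np.linalg.solve(L, y)      # whitened response   L^{-1} y
    # OLS on (Xs, ys): Xs'Xs = X' Omega^{-1} X and Xs'ys = X' Omega^{-1} y
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
```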

Idea: The most general case for handling any violation of the spherical errors assumption (homoskedasticity + no correlation). It simultaneously corrects for both heteroskedasticity and autocorrelation.

Special Cases & Intuition:

  • If \( \Omega = I \) (spherical errors), GLS reduces to OLS.
  • If \( \Omega \) is diagonal (heteroskedasticity but no correlation), GLS reduces to WLS.
  • Intuitively, pre-multiplying the model by \( \Omega^{-1/2} \) "whitens" the errors so they become homoskedastic and uncorrelated, and OLS on the transformed data is then optimal.

Use Case: Time-series regressions, spatial econometrics, panel data models.

Example: Modeling GDP growth over time. Error terms are likely autocorrelated (a shock this year affects next year). \( \Omega \) would have constant variances on its main diagonal and non-zero off-diagonal entries that decay as the time lag grows, representing this correlation structure.
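
For intuition, here is a sketch of that structure under an assumed AR(1) error process with coefficient \( \rho \) (the \( \sigma^2 \) scale factor is omitted, since it cancels in the GLS formula):

```python
import numpy as np

def ar1_omega(n, rho):
    """AR(1) error structure: Omega[i, j] = rho**|i - j| (correlation matrix)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

print(ar1_omega(4, 0.8))
# [[1.    0.8   0.64  0.512]
#  [0.8   1.    0.8   0.64 ]
#  [0.64  0.8   1.    0.8  ]
#  [0.512 0.64  0.8   1.   ]]
```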

4. NLS – Nonlinear Least Squares

Model: \( y_i = f(x_i, \theta) + \epsilon_i \), where \( f(\cdot) \) is a nonlinear function of the parameters \( \theta \) (e.g., \( \theta_1 e^{\theta_2 x} \)).

Estimator: \( \hat{\theta} = \arg \min_{\theta} \sum (y_i - f(x_i, \theta))^2 \).

Key Difference: There is no closed-form solution like \( (X'X)^{-1} X' y \). Estimation requires iterative numerical optimization algorithms (e.g., Gradient Descent, Gauss-Newton).
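A minimal sketch using scipy's `curve_fit` (which wraps such iterative optimizers) to fit the exponential model \( \theta_1 e^{\theta_2 x} \) mentioned above; the data are simulated, and `p0` supplies the starting values, whose choice matters precisely because of the local-minimum risk noted below.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, theta1, theta2):
    return theta1 * np.exp(theta2 * x)   # nonlinear in theta2

rng = np.random.default_rng(2)
x = np.linspace(0, 2, 50)
y = model(x, 3.0, 1.5) + rng.normal(scale=0.5, size=x.size)

# Iterative optimization; p0 gives starting values for (theta1, theta2)
theta_hat, _ = curve_fit(model, x, y, p0=[1.0, 1.0])
print(theta_hat)  # close to the true parameters [3.0, 1.5]
```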

Properties: Under standard regularity conditions, the estimator is consistent and asymptotically normal. However, it can be sensitive to starting values and may converge to a local (not global) minimum.

Use Case: Any context where the underlying data-generating process is known to be nonlinear.

Examples:

  • Exponential growth or decay: \( y = \theta_1 e^{\theta_2 x} \).
  • Logistic growth curves: \( y = \theta_1 / (1 + e^{-\theta_2 (x - \theta_3)}) \).
  • Michaelis-Menten enzyme kinetics: \( y = \theta_1 x / (\theta_2 + x) \).

3. Relationships and a Practical Challenge

Theoretical Hierarchy: OLS ⊂ WLS ⊂ GLS. OLS is the special case of WLS with equal weights, and WLS is the special case of GLS with a diagonal \( \Omega \); GLS is the most general form for linear models with generalized error structures. NLS is a separate branch for models that are nonlinear in their parameters.

The Practical Problem: In practice, the true error covariance matrix \( \Omega \) for GLS is almost never known.

The Solution: Feasible GLS (FGLS):

  1. Run an OLS regression and use the residuals \( \hat{u} \).
  2. Estimate the structure of \( \Omega \) from these residuals (e.g., model the heteroskedasticity or the autocorrelation).
  3. Use this estimated matrix \( \hat{\Omega} \) in the GLS formula, as sketched below. This two-step estimator is called Feasible GLS and is the version used in applied work.
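
A minimal FGLS sketch, assuming an AR(1) error structure and reusing the hypothetical `gls()` and `ar1_omega()` helpers sketched above:

```python
import numpy as np

def fgls_ar1(X, y):
    """Two-step FGLS under an assumed AR(1) error process."""
    # Step 1: OLS residuals
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta_ols
    # Step 2: estimate rho by regressing u_t on u_{t-1}
    rho_hat = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    # Step 3: plug the estimated covariance structure into the GLS formula
    return gls(X, y, ar1_omega(len(y), rho_hat))
```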

4. When to Use: A Decision Guide

| Method | Primary Use Case | Key Assumption |
| --- | --- | --- |
| OLS | Baseline modeling; standard linear relationships. | Spherical errors (homoskedastic, uncorrelated). |
| WLS | Heteroskedastic data with known or estimable variances. | The chosen weights are inversely proportional to the error variances. |
| GLS/FGLS | Correlated errors (time series, panels) or complex heteroskedasticity. | The structure of the error covariance \( \Omega \) can be correctly specified/estimated. |
| NLS | Theoretical model is inherently nonlinear in its parameters. | The functional form \( f(x_i, \theta) \) is correctly specified. |

Practical Workflow:

  1. Start with OLS and test its assumptions (e.g., the Breusch-Pagan test for heteroskedasticity, the Durbin-Watson test for autocorrelation).
  2. If assumptions are violated, use diagnostic tests to identify the nature of the problem:
    • If heteroskedasticity is found, use WLS or FGLS.
    • If autocorrelation is found (in time series), use FGLS or models like Cochrane-Orcutt.
  3. If theory or scatter plots suggest a nonlinear relationship, specify the correct functional form and use NLS.
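
One way to run the diagnostic step of this workflow, sketched with statsmodels (the simulated data are for illustration only):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = rng.normal(size=100)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(size=100)

ols_fit = sm.OLS(y, X).fit()

# Breusch-Pagan: H0 is homoskedasticity (a small p-value suggests WLS/FGLS)
_, bp_pvalue, _, _ = het_breuschpagan(ols_fit.resid, X)
# Durbin-Watson: values near 2 indicate no first-order autocorrelation
dw = durbin_watson(ols_fit.resid)
print(bp_pvalue, dw)
```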