📘 Likelihood-Based Estimation Methods: An Enhanced Guide

Core Idea: The Engine of Likelihood

At its heart, a likelihood method asks a simple question: "Given the data I observed, which parameter values for my model make this data most probable?"

The Likelihood Function (\( L(\theta \mid \text{data}) \)): This is the tool for answering that question. It treats the observed data as fixed and varies the parameters (\( \theta \)). For independent data, it is the product of the probability (density) functions evaluated at each data point.

The Log-Likelihood (\( \ell(\theta) \)): We almost always work with the log-likelihood because it turns pesky products into manageable sums. Since the logarithm is strictly increasing, maximizing the log-likelihood gives the same maximizer as maximizing the likelihood.

\[ \ell(\theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i, \theta) \]
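To see the product-to-sum trick in code, here is a minimal Python sketch (the five data values and the helper name `log_likelihood` are invented for illustration). It evaluates the Normal log-likelihood at two candidate means and confirms that parameters near the data score higher:

```python
import numpy as np
from scipy import stats

# Illustrative data, treated (for this example) as i.i.d. Normal draws.
y = np.array([2.1, 1.9, 3.2, 2.8, 2.4])

def log_likelihood(mu, sigma, y):
    """Sum of per-observation log-densities: the log-likelihood ell(mu, sigma)."""
    return np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

# A mean near the data beats one far from it.
print(log_likelihood(mu=2.5, sigma=0.5, y=y))  # larger (better)
print(log_likelihood(mu=0.0, sigma=0.5, y=y))  # much smaller
```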

1. Full Likelihood (Parametric MLE) - "The Purist"

Philosophy: "I am willing to assume a specific, full probability distribution for my data (e.g., Normal, Binomial, Poisson). I will find the parameters that make this chosen distribution best fit the data."

Key Properties:

- The distribution of the data is fully specified, so every feature (mean, variance, tail behavior) is modeled.
- Under correct specification, the MLE is consistent, asymptotically Normal, and asymptotically efficient (it attains the Cramér-Rao lower bound).
- The price of that efficiency: if the assumed distribution is wrong, both the estimates and their standard errors can be misleading.

Methods:

- Ordinary linear regression with Normal errors (where OLS coincides with the MLE for the coefficients).
- Logistic, Poisson, and other generalized linear models fit by maximum likelihood, typically via Fisher scoring / iteratively reweighted least squares.

Example: You assume your data \( y \) is Normally distributed. You use MLE to find \( \mu \) and \( \sigma^2 \). Your log-likelihood function is:

\[ \ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \]
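Here is a minimal numerical sketch of that maximization, assuming simulated data; `scipy.optimize.minimize` does the work, and the closed-form MLEs (the sample mean and the 1/n variance) serve as a sanity check:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=200)  # simulated "observed" data

def neg_log_lik(params, y):
    # Optimize log(sigma) so that sigma stays positive during the search.
    mu, log_sigma = params
    return -np.sum(stats.norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], args=(y,), method="BFGS")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1]) ** 2

# Should match the closed-form MLEs: the sample mean and the 1/n variance.
print(mu_hat, y.mean())
print(sigma2_hat, y.var(ddof=0))
```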

2. Quasi-Likelihood / Pseudo-Likelihood - "The Pragmatist"

Philosophy: "I don't want to assume the full distribution. I only want to correctly specify the relationship for the mean (and maybe the variance). I'll use a 'likelihood-like' function that gives me good, robust estimates anyway."

Key Properties:

- Only the mean model and a mean-variance relationship need to be specified, not a full distribution.
- Point estimates typically match the corresponding full-likelihood fit; the standard errors are corrected via an estimated dispersion parameter (or a robust "sandwich" estimator).
- Because no full likelihood is ever written down, likelihood-based tools such as AIC do not apply directly.

Methods:

- Quasi-Poisson and quasi-binomial regression for overdispersed counts and proportions.
- Generalized estimating equations (GEE) for correlated or clustered data.
- Pseudo-likelihood approaches that replace an intractable joint likelihood with a product of simpler conditional terms.

Example: You model count data with a Poisson regression (which assumes mean = variance). Your data are counts of insects on leaves, but the counts are more variable than the Poisson assumption allows (overdispersion). Instead of switching to a more complex distributional model (e.g., negative binomial), you use quasi-Poisson regression. You still model \( \log(\text{mean}) = \beta_0 + \beta_1 x \), but the model estimates a dispersion parameter to inflate the standard errors, making your inference reliable.
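A sketch of that workflow in Python with `statsmodels` (the simulated counts and coefficient values are invented for illustration): passing `scale="X2"` to `fit()` estimates the dispersion from the Pearson chi-square, which is the usual quasi-Poisson adjustment.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 2, size=n)
mu = np.exp(0.5 + 0.8 * x)
# Overdispersed counts: mean mu but variance larger than mu.
y = rng.negative_binomial(2, 2 / (2 + mu))

X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
quasi_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")

# Same coefficients, but quasi-Poisson honestly inflates the standard errors.
print(poisson_fit.bse)
print(quasi_fit.bse)
```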

3. Semiparametric Likelihood - "The Balanced Approach"

Philosophy: "I will carefully model the part of the system I care about (usually the effect of covariates), but I will leave the annoying nuisance parts (like the baseline hazard or error distribution) completely unspecified to avoid making bad assumptions."

Key Properties:

- The model splits into a finite-dimensional parameter of interest (e.g., covariate effects) and an infinite-dimensional nuisance component (e.g., the baseline hazard) that is left unspecified.
- Estimates of the parametric part remain consistent and asymptotically Normal, usually at some cost in efficiency relative to a correctly specified full model.
- Robust to misspecification of the nuisance component, because that component is never specified.

Methods:

- Cox proportional hazards regression, estimated by maximizing the partial likelihood (the baseline hazard drops out entirely).
- Empirical likelihood methods, which place likelihood weights directly on the observed data points.

Example: Studying the effect of a new drug on patient survival time. You use a Cox model. The model tells you that the drug reduces the hazard of death by 50% (a precise, interpretable effect size), without you ever having to model the complex pattern of survival times for all patients.
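For a concrete sketch, here is a Cox fit using the `lifelines` package (one reasonable choice, not the only one) on its bundled Rossi recidivism dataset; the column names `week` (duration) and `arrest` (event) come from that dataset:

```python
# Requires: pip install lifelines
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # bundled survival data: 'week' = duration, 'arrest' = event

cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

# exp(coef) is each covariate's hazard ratio; the baseline hazard itself
# is never modeled parametrically -- that's the semiparametric part.
print(cph.hazard_ratios_)
```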

4. Likelihood-Based Inference - "Making Decisions"

Once you have estimates from maximizing a (log-)likelihood, you need to perform inference. Three classical tests all flow from the shape of the log-likelihood: the Wald test (how many standard errors is the estimate from the null value?), the likelihood ratio (LR) test (how much does the maximized log-likelihood drop when the null is imposed?), and the score test (how steep is the log-likelihood at the null?). The three agree asymptotically; in moderate samples the LR test is often the most dependable.
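As a sketch of the LR test in Python (simulated Poisson data; all names and values are illustrative): fit the full and restricted models, take twice the gap in maximized log-likelihood, and compare it to a chi-squared distribution with degrees of freedom equal to the number of restrictions.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.3 + 0.5 * x))

full = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
null = sm.GLM(y, np.ones((n, 1)), family=sm.families.Poisson()).fit()

# LR statistic: twice the log-likelihood gain from freeing one parameter.
lr_stat = 2 * (full.llf - null.llf)
p_value = stats.chi2.sf(lr_stat, df=1)  # df = number of restrictions
print(lr_stat, p_value)
```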

📌 Hierarchy Mnemonic & Practical Guide

| Method | When to Use It | Key Question to Ask | Real-World Analogy |
|---|---|---|---|
| Full MLE | You are confident in the data's distribution. Efficiency is key. | "Am I willing to bet that the errors are exactly Normal?" | Using a precise recipe from a renowned chef. Best results if followed exactly. |
| Quasi-MLE | Your focus is on the mean trend. The full distribution is messy. | "Is my data overdispersed? Or do I just care about getting the trend right?" | Following the main steps of a recipe but tweaking spices to your taste. Still makes a great dish. |
| Semiparametric | You care about specific effects (e.g., treatment) but not underlying shapes. | "Do I want to avoid assuming a shape for the baseline hazard or error distribution?" | Buying a perfectly tailored jacket (the covariate effect) without worrying about how the base fabric was woven (the nuisance parameter). |