Probability as Logic

ABSTRACT This overview argues that probability is not a physical property of objects but the unique consistent extension of classical logic to degrees of plausibility. Beginning with the Mind Projection Fallacy and the Robot normative construct, it proceeds through the conditionality of all probabilities, the Maximum Entropy principle for prior assignment, and concludes that the Frequentist–Bayesian debate dissolves when viewed from this higher vantage — frequentist methods emerging as well-posed special cases of the broader logical framework. Where the original synthesis risks overreach, honest qualifications are noted.

§ 01 The Core Definition: Probability as Extended Logic

Classical binary logic assigns to every proposition a value drawn from the set {0, 1} — False or True. This is an idealization suited to deductive certainty. The actual world, however, almost never provides the information required to achieve such certainty. Probability is the mathematical instrument for reasoning honestly in its absence.

In this framework, probability is not a counting operation performed on a frequency table, nor a quirk of quantum mechanics, nor a psychological disposition. It is a numerical measure of plausibility — the degree to which a body of available information rationally licenses a proposition. This makes probability a property of the epistemic situation, not of the object under study.

CORE PRINCIPLE — THE MIND PROJECTION FALLACY

The error of treating an epistemic state as an ontological one. A shuffled deck of cards is not "physically random": its order is fixed and deterministic. "Randomness" is the name we give to our own ignorance of that order. To say the deck is random is to make a claim about ourselves, not the deck. Conflating the two produces systematic confusion in probabilistic reasoning.

This insight generalizes broadly. When we say a stock has a "30% chance of rising," we are not measuring a physical propensity in the stock. We are summarizing everything we know — earnings, macro regime, order flow, positioning — into a single numerical credential for the proposition "the stock rises." Change the information; the probability must change. This is not subjectivity. It is logical necessity.

NORMATIVE CONSTRUCT — THE ROBOT

To reason rigorously, we adopt the standard of a hypothetical robot governed by exactly two rules: (1) Consistency — if a conclusion can be reached by multiple paths, every path must arrive at the same answer; and (2) Honesty — the robot uses all available information and never invokes information it does not possess. This is the normative standard of rationality: not a description of how humans reason, but a precise specification of how they ought to reason under uncertainty.

The robot construct is not emotionally neutral by accident — it is constructed to be so by design. Its purpose is to provide a fixed logical reference against which human reasoning can be evaluated and corrected. The Cox–Jaynes derivation shows that any system satisfying these two axioms must obey the standard rules of probability calculus. The rules are not postulated; they are derived.

§ 02 The Golden Rule: All Probabilities are Conditional

Perhaps the single most important and most frequently violated principle in applied reasoning: there is no such thing as an unconditional probability. Every probability is written, in full, as \(P(A \mid I)\), where \(A\) is the proposition and \(I\) is the totality of background information conditioning the assignment.

When we write \(P(A)\), we are merely suppressing the conditioning for notational convenience. The suppression is always a potential source of error. When two analysts disagree about a probability, the disagreement is almost always traceable to a difference in \(I\) — they are reasoning from different information sets, not making a logical error per se.

\[ P(H \mid D, I) \;=\; \frac{P(D \mid H, I)\; P(H \mid I)}{P(D \mid I)} \] Bayes' Theorem in full — the logical law governing rational belief revision upon receiving new data \(D\), given background \(I\).

Bayes' Theorem, so expressed, is not a statistical technique. It is a theorem of probability calculus with the same logical status as the rules of deduction. It tells us exactly how our credence in a hypothesis \(H\) must change when data \(D\) arrives:

The prior \(P(H \mid I)\) encodes everything we knew before the data — it must be stated, not hidden.
The likelihood \(P(D \mid H, I)\) asks: how probable is this data, if the hypothesis is true?
The evidence \(P(D \mid I)\) is a normalizing constant ensuring the posterior is a proper distribution.
The posterior \(P(H \mid D, I)\) is our new rational credence — the mandatory output of applying the rule.

HONEST QUALIFICATION

Bayes' Theorem is universally valid as a logical identity. However, the practical difficulty — and a legitimate critique of naive Bayesianism — is that specifying the prior \(P(H \mid I)\) and the likelihood function is often non-trivial. The framework tells us we must have a prior; it does not always tell us which one. Section 3 addresses this directly.

One under-appreciated consequence of this principle is the symmetry of inference across logical time. Bayes' Theorem operates with equal validity in the forward direction (predicting future data from a hypothesis) and the backward direction (inferring the cause of observed data). The asymmetry we experience in time is a feature of the physical world, not of the logic of inference.

§ 03 The Maximum Entropy Principle: Choosing the Least Assuming Prior

Every application of Bayes' Theorem requires a prior. The question of how to assign one when information is sparse is not a minor technical detail — it is the deepest problem in the foundations of inference. Jaynes' answer is the Principle of Maximum Entropy (MaxEnt).

The principle states: given a set of known constraints on a probability distribution (moments, bounds, symmetries), assign the distribution with the highest Shannon entropy consistent with those constraints. Formally, maximize:

\[ H[p] \;=\; -\sum_i p_i \log p_i \quad \text{subject to constraints} \quad \sum_i p_i f_k(x_i) = \langle f_k \rangle \] The MaxEnt variational problem. Maximize informational entropy subject to whatever you actually know.

The philosophical justification is precise: maximum entropy is the unique distribution that encodes the given constraints and nothing more. Any other choice smuggles in hidden information — it asserts structure the reasoner does not, in fact, possess. MaxEnt is therefore the honest prior.

Known Constraint	MaxEnt Distribution	Interpretation
Finite support, no other info	Uniform	All outcomes equally plausible — the classical "Principle of Indifference"
Known mean \(\mu\) on \([0,\infty)\)	Exponential(\(\mu\))	Most spread-out distribution consistent with a specified average
Known mean \(\mu\) and variance \(\sigma^2\)	Gaussian\((\mu, \sigma^2)\)	The bell curve is not assumed — it is derived from two constraints
Known mean on \([0,1]\)	Beta distribution	Natural prior for probability-of-probability problems

DEEP CONSEQUENCE — THE GAUSSIAN AS DERIVED, NOT ASSUMED

The ubiquity of the normal distribution in nature is no mystery under MaxEnt: whenever the only constraints on a continuous distribution are a finite mean and variance, the Gaussian is the unique maximally honest representation of that ignorance. The Central Limit Theorem is a frequentist path to the same object. Both roads converge.

MaxEnt generalizes naturally to the Principle of Maximum Relative Entropy (also called MinXEnt or the Kullback–Leibler framework), which handles updating when a prior already exists: minimize the information gain — the KL divergence from the prior — subject to the new constraints. This is Bayesian updating derived from an information-theoretic starting point.

§ 04 Resolving the Frequentist–Bayesian Divide

For most of the twentieth century, statistics was divided into two hostile camps. The schism was, in retrospect, largely a consequence of both sides operating at insufficient generality. Viewed from the Jaynesian framework, the conflict dissolves rather than gets resolved — because one position subsumes the other.

FREQUENTISM

Probability is defined as the limiting relative frequency of an outcome in an infinite sequence of independent, identical trials. Probability is therefore only meaningful for repeatable experiments. Parameters are fixed but unknown; only data are random. Inference proceeds via sampling distributions, p-values, and confidence intervals.

LOGICAL / BAYESIAN

Probability is a measure of rational credence, applicable to any well-defined proposition — including one-time events, model parameters, and causal hypotheses. Both data and parameters are treated as random variables relative to the state of information. Inference proceeds via Bayes' Theorem; uncertainty is expressed as posterior distributions.

The key mathematical result is that frequentist methods are recovered as limiting cases of Bayesian inference under specific, well-defined conditions — typically exchangeability of observations, large-sample limits, and particular choices of prior (often uninformative or reference priors). The frequentist sampling distribution of a maximum-likelihood estimator, for example, is what a Bayesian posterior looks like in the limit of a flat prior and infinite data.

This is not merely a claim that the numbers agree. It is a deeper claim: frequentist procedures are well-calibrated exactly when and because they can be derived from a coherent Bayesian foundation. When they cannot be so derived, they tend to produce pathologies — confidence intervals that contain the true parameter with the right long-run frequency but have zero probability of containing it in a specific realized case, for instance.

WHERE JAYNES OVERSTATES — AN HONEST ASSESSMENT

The claim that frequentism is merely "proscribed" Bayesianism somewhat understates a legitimate frequentist virtue: making weaker assumptions is sometimes exactly right. Design-based inference in survey sampling, permutation tests in clinical trials, and fiducial methods each carry genuine robustness properties that do not reduce cleanly to Jaynes' picture. The Bayesian framework is more general; it is not always more appropriate.

§ 05 Information Theory: The Deeper Foundation

Shannon's information theory and Jaynes' probability framework are not merely compatible — they are expressions of the same underlying structure, arrived at from different directions. Shannon asked: what is the minimum number of bits required to transmit a message from a source with known probability distribution \(p\)? The answer is the entropy \(H[p] = -\sum p_i \log p_i\). Jaynes asked: what distribution over an unknown source should a rational agent assign? The answer, given only a known entropy constraint, is found by maximizing — the same function.

This convergence is not coincidental. It reflects the fact that probability, information, and logic are three aspects of the same underlying theory of consistent reasoning under uncertainty.

Entropy measures the irreducible uncertainty in a probability assignment — the expected information gain from observing an outcome.
KL Divergence \(D_{KL}(P \| Q)\) measures the information cost of using distribution \(Q\) when \(P\) is the truth — a natural loss function for inference.
Mutual Information \(I(X;Y)\) measures how much knowing \(Y\) reduces uncertainty about \(X\) — the formal version of "relevance."
The Data Processing Inequality formalizes that you cannot extract more information from processed data than was present in the raw data — a theorem with deep implications for inference chains.

PRACTICAL CONSEQUENCE FOR ANALYSTS

Every model compression, every feature selection, every regularization technique in machine learning can be understood as a structured application of information-theoretic principles. The analyst who understands this foundation is not merely choosing tools by convention — they are reasoning about what their choices imply about their epistemic commitments.

§ 06 Five Operational Principles

The preceding framework, properly internalized, is not merely philosophical. It changes how one acts as an analyst, investor, or scientist. The following five principles translate the logical framework into operational discipline:

State your priors explicitly. Every probability claim rests on background assumptions. Surface them. An analyst who cannot articulate their prior has not eliminated it — they have merely hidden it, and hidden priors cannot be examined, challenged, or updated.
Update on all evidence, not just confirming evidence. Bayesian reasoning is symmetric: data that confirms a hypothesis raises its posterior; data that disconfirms it must lower it. Selective updating is not caution — it is a logical error that compounds over time.
Use MaxEnt to construct priors when information is sparse. Do not reach for a distribution out of habit or computational convenience. Ask what you actually know — bounds, moments, symmetries — and derive the most honest distribution from those constraints.
Never project epistemic states onto objects. When you say a trade is "risky" or a merger is "unlikely," you are making a statement about your information state, not the trade or the merger. Distinguish these categories. The confusion generates overconfidence, narrative bias, and anchoring.
Demand interpretability of uncertainty. A probability interval that cannot be interpreted as a direct credence in the proposition — that only makes claims about hypothetical long-run behavior — is operationally inferior for single-instance decisions. Prefer posterior credible intervals over classical confidence intervals wherever the decision horizon is finite.

"Probability is not a physical property of objects — it is the rigorous language of honest ignorance. To use it correctly is to be precisely right about what you do not know."

— Synthesis of Jaynes · Probability Theory: The Logic of Science

§ 07 Summary: The Architecture of the Framework

Thesis	Core Claim	Verdict
Probability as Logic	Probability is extended logic — a measure of plausible reasoning, not a physical property	Robustly correct; contested only at quantum foundations
Mind Projection Fallacy	Randomness is epistemic, not ontic; the "robot" is the normative standard	Correct as a diagnostic principle; robot axioms are chosen, not uniquely derivable
Conditionality	All probabilities are conditional; Bayes' Theorem is a logical law, not a technique	Most robustly defensible thesis in the framework
MaxEnt Prior	Assign the distribution of maximum entropy consistent with known constraints	Correct as a uniqueness result; choice of constraints remains the analyst's responsibility
Frequentism as Special Case	Frequentist methods are limiting cases of Bayesian inference under symmetry conditions	Substantially correct mathematically; somewhat polemical as a philosophical claim

This overview synthesizes E.T. Jaynes, Probability Theory: The Logic of Science (2003) with Shannon information theory and subsequent developments in Bayesian epistemology. Qualifications reflect ongoing debates in quantum foundations, non-parametric statistics, and the philosophy of science. The framework is presented as the most coherent available foundation — not as a closed system.