Axiomatic Probability Theory
Probability theory is the mathematical framework for quantifying uncertainty. While intuitive and historical approaches (like frequency or subjective belief) exist, modern probability is built upon the rigorous foundations of measure theory, established by Andrey Kolmogorov in 1933.
1. The Kolmogorov Axioms
A probability model is defined by a probability space $(\Omega, \mathcal{F}, P)$, where:
- Sample Space ($\Omega$): The set of all possible outcomes of a random experiment.
- $\sigma$-algebra ($\mathcal{F}$): A collection of subsets of $\Omega$ (called events) that is closed under complements and countable unions. Formally:
- $\Omega \in \mathcal{F}$.
- If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
- If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
- Probability Measure ($P$): A function $P: \mathcal{F} \to [0, 1]$ satisfying:
- Non-negativity: $P(A) \ge 0$ for all $A \in \mathcal{F}$.
- Normalization: $P(\Omega) = 1$.
- Countable Additivity: For any sequence of pairwise disjoint events $A_1, A_2, \ldots$, $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
From these axioms, we derive the fundamental properties of probability, such as $P(A^c) = 1 - P(A)$ and the inclusion-exclusion principle.
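The axioms can be checked concretely on a finite sample space. The sketch below builds a fair-die probability space (the names `sample_space` and `P` are illustrative choices, not from the text) and verifies normalization, additivity for disjoint events, and the derived complement rule:

```python
from fractions import Fraction

# A minimal finite probability space: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
prob_of_outcome = {w: Fraction(1, 6) for w in sample_space}

def P(event):
    """Probability measure: sum the masses of the outcomes in the event."""
    return sum(prob_of_outcome[w] for w in event)

# Normalization: P(Omega) = 1
assert P(sample_space) == 1

# Additivity for disjoint events: P(A ∪ B) = P(A) + P(B)
A, B = {1, 2}, {5, 6}
assert P(A | B) == P(A) + P(B)

# Complement rule derived from the axioms: P(A^c) = 1 - P(A)
assert P(sample_space - A) == 1 - P(A)
print("axioms verified on the finite space")
```

Using exact `Fraction` arithmetic keeps the checks free of floating-point error; here the $\sigma$-algebra is implicitly the full power set of $\Omega$, which is always valid for a finite space.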
2. Conditional Probability and Bayes’ Theorem
For two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
The Law of Total Probability
If $B_1, B_2, \ldots, B_n$ is a partition of $\Omega$, then for any event $A$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$
Bayes’ Theorem
Bayes’ theorem allows us to invert conditional probabilities, forming the basis of Bayesian inference:

$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\, P(B_j)}$$
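A standard numerical illustration (the prevalence and accuracy figures below are invented example values, not from the text) shows how Bayes' theorem combines with the law of total probability for a binary diagnostic test:

```python
# Hypothetical test parameters -- chosen for illustration only.
prior_disease = 0.01          # P(D): prevalence
sensitivity = 0.95            # P(+ | D)
false_positive_rate = 0.05    # P(+ | not D)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_positive = (sensitivity * prior_disease
              + false_positive_rate * (1 - prior_disease))

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
posterior = sensitivity * prior_disease / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")
```

Even with a 95%-sensitive test, the posterior probability of disease given a positive result is only about 16% here, because the low prior (1%) means most positives come from the large healthy population.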
3. Random Variables
A random variable is not a variable in the algebraic sense, but a measurable function $X: \Omega \to \mathbb{R}$. This means that for every Borel set $B \subseteq \mathbb{R}$, the pre-image $X^{-1}(B)$ is an element of $\mathcal{F}$.
Discrete vs. Continuous Random Variables
- Discrete: $X$ takes values in a countable set. Its distribution is described by a Probability Mass Function (PMF) $p_X(x) = P(X = x)$.
- Continuous: $X$ takes values in an uncountable set (usually $\mathbb{R}$). Its distribution is described by a Probability Density Function (PDF) $f_X$ such that $P(a \le X \le b) = \int_a^b f_X(x)\, dx$.

The Cumulative Distribution Function (CDF) is defined for all random variables as $F_X(x) = P(X \le x)$.
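As a quick sketch, assuming a fair die for the discrete case and a Uniform(0, 1) variable for the continuous case, the CDF can be recovered from the PMF by summation and from the PDF by integration:

```python
# Discrete: PMF of a fair die; the CDF is the running sum of the PMF.
pmf = {k: 1 / 6 for k in range(1, 7)}

def cdf_discrete(x):
    """F(x) = P(X <= x) = sum of PMF values at points <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

assert abs(cdf_discrete(3) - 0.5) < 1e-12   # P(X <= 3) = 3/6

# Continuous: for Uniform(0, 1), f(x) = 1 on [0, 1], so F(x) = x there.
def cdf_uniform(x):
    return min(max(x, 0.0), 1.0)

# P(0.2 <= X <= 0.7) = integral of the density = F(0.7) - F(0.2)
assert abs((cdf_uniform(0.7) - cdf_uniform(0.2)) - 0.5) < 1e-12
print("CDF checks passed")
```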
4. Expectation and Moments
The expected value is the “center of mass” of the distribution. In measure-theoretic terms, it is the Lebesgue integral $E[X] = \int_\Omega X \, dP$.
- For discrete $X$: $E[X] = \sum_x x \, p_X(x)$
- For continuous $X$: $E[X] = \int_{-\infty}^{\infty} x \, f_X(x)\, dx$
Properties of Expectation
- Linearity: $E[aX + bY] = a\,E[X] + b\,E[Y]$, regardless of independence.
- Variance: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
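Both properties are easy to check empirically. The sketch below uses two deliberately *dependent* variables to stress the "regardless of independence" claim; the sample size and coefficients are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200_000)
Y = X**2               # deliberately dependent on X

# Linearity: E[3X + 2Y] = 3 E[X] + 2 E[Y], even though X and Y are dependent.
lhs = np.mean(3 * X + 2 * Y)
rhs = 3 * np.mean(X) + 2 * np.mean(Y)
assert abs(lhs - rhs) < 1e-6   # equal up to floating-point error

# Variance identity: Var(X) = E[X^2] - (E[X])^2, which is 1/12 for Uniform(0, 1).
var_identity = np.mean(X**2) - np.mean(X)**2
assert abs(var_identity - np.var(X)) < 1e-9
print(f"Var(X) ≈ {var_identity:.4f}  (theory: 1/12 ≈ 0.0833)")
```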
Moment Generating Functions (MGF)
The MGF of a random variable $X$ is defined as $M_X(t) = E[e^{tX}]$. If $M_X(t)$ exists in a neighborhood around $t = 0$, it uniquely determines the distribution. Moments can be found by differentiating: $E[X^n] = M_X^{(n)}(0)$.
The Characteristic Function $\varphi_X(t) = E[e^{itX}]$ always exists for any distribution; when $X$ has a density, $\varphi_X$ is (up to sign convention) the Fourier transform of that density.
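As a sketch, the moment-recovery property can be verified numerically for the Exponential distribution, whose MGF is $\lambda / (\lambda - t)$ for $t < \lambda$ (the rate value below is an arbitrary example):

```python
rate = 2.0   # lambda; arbitrary example value

def mgf(t):
    """MGF of Exponential(rate); exists only for t < rate."""
    assert t < rate
    return rate / (rate - t)

h = 1e-5   # step for finite-difference derivatives

# First moment via central difference: M'(0) = E[X] = 1/rate
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
assert abs(first_moment - 1 / rate) < 1e-6

# Second moment: M''(0) = E[X^2] = 2/rate^2
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2
assert abs(second_moment - 2 / rate**2) < 1e-4
print(f"E[X] ≈ {first_moment:.6f}, E[X^2] ≈ {second_moment:.6f}")
```

Differentiating the closed-form MGF symbolically gives the same values exactly; the finite differences here just make the check self-contained.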
5. Probability Inequalities
Inequalities provide upper bounds on the probability of “tail events.”
- Markov’s Inequality: For a non-negative random variable $X$ and $a > 0$: $P(X \ge a) \le \dfrac{E[X]}{a}$
- Chebyshev’s Inequality: For any $X$ with mean $\mu$ and variance $\sigma^2$: $P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$
- Jensen’s Inequality: For a convex function $\varphi$: $\varphi(E[X]) \le E[\varphi(X)]$
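All three bounds can be sanity-checked by simulation. The sketch below uses Exponential(1) samples with arbitrarily chosen thresholds $a = 3$ and $k = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=100_000)   # mean 1, variance 1

# Markov: P(X >= a) <= E[X] / a
a = 3.0
assert np.mean(X >= a) <= np.mean(X) / a

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
mu, sigma = X.mean(), X.std()
k = 2.0
assert np.mean(np.abs(X - mu) >= k * sigma) <= 1 / k**2

# Jensen with the convex function phi(x) = x^2: phi(E[X]) <= E[phi(X)]
assert X.mean()**2 <= np.mean(X**2)
print("all three inequality checks passed")
```

For this distribution the true tail $P(X \ge 3) = e^{-3} \approx 0.05$ sits well below both the Markov bound ($1/3$) and the Chebyshev bound ($1/4$), illustrating that these are often loose worst-case guarantees.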
6. Limit Theorems
Limit theorems describe the behavior of the sum of independent and identically distributed (i.i.d.) random variables.
Law of Large Numbers (LLN)
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu < \infty$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.
- Weak Law (WLLN): $\bar{X}_n \xrightarrow{P} \mu$ (convergence in probability).
- Strong Law (SLLN): $\bar{X}_n \xrightarrow{a.s.} \mu$ (almost sure convergence).
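A minimal simulation of the LLN (with an arbitrary Bernoulli parameter, chosen for illustration) shows the running mean settling near the true mean:

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 0.3, 100_000
draws = rng.binomial(1, p, size=n)   # i.i.d. Bernoulli(p)

# Running mean after 1, 2, ..., n draws
running_mean = np.cumsum(draws) / np.arange(1, n + 1)

print(f"mean after {n} draws: {running_mean[-1]:.4f}  (true p = {p})")
assert abs(running_mean[-1] - p) < 0.01
```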
Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and finite variance $\sigma^2$. As $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$$

This explains why the Normal distribution appears everywhere in nature: it represents the aggregate effect of many independent small fluctuations.
Python Simulation: The Central Limit Theorem
The following script illustrates the CLT by repeatedly averaging samples from a non-normal (Uniform) distribution and comparing the resulting distribution of the sample mean against the Normal density predicted by the theorem.
```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_clt(sample_size, num_simulations):
    # Generate samples from a Uniform(0, 1) distribution
    # Mean = 0.5, Variance = 1/12
    means = []
    for _ in range(num_simulations):
        data = np.random.uniform(0, 1, sample_size)
        means.append(np.mean(data))

    plt.figure(figsize=(10, 6))
    plt.hist(means, bins=50, density=True, alpha=0.7, color='steelblue')

    # Overlay the theoretical Normal density predicted by the CLT
    mu = 0.5
    sigma = np.sqrt(1 / 12) / np.sqrt(sample_size)
    x = np.linspace(min(means), max(means), 100)
    p = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma)**2)
    plt.plot(x, p, 'r', linewidth=2)

    plt.title(f"CLT Simulation: n={sample_size}")
    plt.xlabel("Sample Mean")
    plt.ylabel("Density")
    plt.show()

simulate_clt(sample_size=30, num_simulations=10000)
```