Axiomatic Probability Theory
Probability theory is the mathematical framework for quantifying uncertainty. While intuitive and historical approaches (like frequency or subjective belief) exist, modern probability is built upon the rigorous foundations of measure theory, established by Andrey Kolmogorov in 1933.
1. The Kolmogorov Axioms
A probability model is defined by a probability space $(\Omega, \mathcal{F}, P)$, where:
- Sample Space ($\Omega$): The set of all possible outcomes of a random experiment.
- $\sigma$-algebra ($\mathcal{F}$): A collection of subsets of $\Omega$ (called events) that is closed under complements and countable unions. Formally:
- $\Omega \in \mathcal{F}$.
- If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
- If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
- Probability Measure ($P$): A function $P: \mathcal{F} \to [0, 1]$ satisfying:
- Non-negativity: $P(A) \ge 0$ for all $A \in \mathcal{F}$.
- Normalization: $P(\Omega) = 1$.
- Countable Additivity: For any sequence of pairwise disjoint events $A_1, A_2, \ldots$, $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
From these axioms, we derive the fundamental properties of probability, such as $P(A^c) = 1 - P(A)$ and the inclusion-exclusion principle.
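The axioms can be checked concretely on a finite sample space. The sketch below builds a fair-die probability space (the names `sample_space` and `P` are illustrative choices, not from the text) and verifies normalization, additivity for disjoint events, and the derived complement rule:

```python
from fractions import Fraction

# A minimal finite probability space: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
prob_of_outcome = {w: Fraction(1, 6) for w in sample_space}

def P(event):
    """Probability measure: sum the masses of the outcomes in the event."""
    return sum(prob_of_outcome[w] for w in event)

# Normalization: P(Omega) = 1
assert P(sample_space) == 1

# Additivity for disjoint events: P(A ∪ B) = P(A) + P(B)
A, B = {1, 2}, {5, 6}
assert P(A | B) == P(A) + P(B)

# Complement rule derived from the axioms: P(A^c) = 1 - P(A)
assert P(sample_space - A) == 1 - P(A)
print("axioms verified on the finite space")
```

Using exact `Fraction` arithmetic keeps the checks free of floating-point error; here the $\sigma$-algebra is implicitly the full power set of $\Omega$, which is always valid for a finite space.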
2. Conditional Probability and Bayes’ Theorem
For two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
The Law of Total Probability
If $B_1, B_2, \ldots, B_n$ is a partition of $\Omega$, then for any event $A$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$
Bayes’ Theorem
Bayes’ theorem allows us to invert conditional probabilities, forming the basis of Bayesian inference:

$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\, P(B_j)}$$
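A standard numerical illustration (the prevalence and accuracy figures below are invented example values, not from the text) shows how Bayes' theorem combines with the law of total probability for a binary diagnostic test:

```python
# Hypothetical test parameters -- chosen for illustration only.
prior_disease = 0.01          # P(D): prevalence
sensitivity = 0.95            # P(+ | D)
false_positive_rate = 0.05    # P(+ | not D)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_positive = (sensitivity * prior_disease
              + false_positive_rate * (1 - prior_disease))

# Bayes' theorem: P(D | +) = P(+|D) P(D) / P(+)
posterior = sensitivity * prior_disease / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")
```

Even with a 95%-sensitive test, the posterior probability of disease given a positive result is only about 16% here, because the low prior (1%) means most positives come from the large healthy population.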
3. Random Variables
A random variable is not a variable in the algebraic sense, but a measurable function $X: \Omega \to \mathbb{R}$. This means that for every Borel set $B \subseteq \mathbb{R}$, the pre-image $X^{-1}(B)$ is an element of $\mathcal{F}$.
Discrete vs. Continuous Random Variables
- Discrete: $X$ takes values in a countable set. Its distribution is described by a Probability Mass Function (PMF) $p_X(x) = P(X = x)$.
- Continuous: $X$ takes values in an uncountable set (usually $\mathbb{R}$). Its distribution is described by a Probability Density Function (PDF) $f_X$ such that $P(a \le X \le b) = \int_a^b f_X(x)\, dx$.

The Cumulative Distribution Function (CDF) is defined for all random variables as $F_X(x) = P(X \le x)$.
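As a quick sketch, assuming a fair die for the discrete case and a Uniform(0, 1) variable for the continuous case, the CDF can be recovered from the PMF by summation and from the PDF by integration:

```python
# Discrete: PMF of a fair die; the CDF is the running sum of the PMF.
pmf = {k: 1 / 6 for k in range(1, 7)}

def cdf_discrete(x):
    """F(x) = P(X <= x) = sum of PMF values at points <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

assert abs(cdf_discrete(3) - 0.5) < 1e-12   # P(X <= 3) = 3/6

# Continuous: for Uniform(0, 1), f(x) = 1 on [0, 1], so F(x) = x there.
def cdf_uniform(x):
    return min(max(x, 0.0), 1.0)

# P(0.2 <= X <= 0.7) = integral of the density = F(0.7) - F(0.2)
assert abs((cdf_uniform(0.7) - cdf_uniform(0.2)) - 0.5) < 1e-12
print("CDF checks passed")
```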
4. Expectation and Moments
The expected value is the “center of mass” of the distribution. In measure-theoretic terms, it is the Lebesgue integral $E[X] = \int_\Omega X \, dP$.
- For discrete $X$: $E[X] = \sum_x x \, p_X(x)$
- For continuous $X$: $E[X] = \int_{-\infty}^{\infty} x \, f_X(x)\, dx$
Properties of Expectation
- Linearity: $E[aX + bY] = a\,E[X] + b\,E[Y]$, regardless of independence.
- Variance: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
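Both properties are easy to check empirically. The sketch below uses two deliberately *dependent* variables to stress the "regardless of independence" claim; the sample size and coefficients are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200_000)
Y = X**2               # deliberately dependent on X

# Linearity: E[3X + 2Y] = 3 E[X] + 2 E[Y], even though X and Y are dependent.
lhs = np.mean(3 * X + 2 * Y)
rhs = 3 * np.mean(X) + 2 * np.mean(Y)
assert abs(lhs - rhs) < 1e-6   # equal up to floating-point error

# Variance identity: Var(X) = E[X^2] - (E[X])^2, which is 1/12 for Uniform(0, 1).
var_identity = np.mean(X**2) - np.mean(X)**2
assert abs(var_identity - np.var(X)) < 1e-9
print(f"Var(X) ≈ {var_identity:.4f}  (theory: 1/12 ≈ 0.0833)")
```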
Moment Generating Functions (MGF)
The MGF of a random variable $X$ is defined as $M_X(t) = E[e^{tX}]$. If $M_X(t)$ exists in a neighborhood around $t = 0$, it uniquely determines the distribution. Moments can be found by differentiating: $E[X^n] = M_X^{(n)}(0)$.
The Characteristic Function $\varphi_X(t) = E[e^{itX}]$ always exists for any distribution; when $X$ has a density, $\varphi_X$ is (up to sign convention) the Fourier transform of that density.
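As a sketch, the moment-recovery property can be verified numerically for the Exponential distribution, whose MGF is $\lambda / (\lambda - t)$ for $t < \lambda$ (the rate value below is an arbitrary example):

```python
rate = 2.0   # lambda; arbitrary example value

def mgf(t):
    """MGF of Exponential(rate); exists only for t < rate."""
    assert t < rate
    return rate / (rate - t)

h = 1e-5   # step for finite-difference derivatives

# First moment via central difference: M'(0) = E[X] = 1/rate
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
assert abs(first_moment - 1 / rate) < 1e-6

# Second moment: M''(0) = E[X^2] = 2/rate^2
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2
assert abs(second_moment - 2 / rate**2) < 1e-4
print(f"E[X] ≈ {first_moment:.6f}, E[X^2] ≈ {second_moment:.6f}")
```

Differentiating the closed-form MGF symbolically gives the same values exactly; the finite differences here just make the check self-contained.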
5. Probability Inequalities
Inequalities provide upper bounds on the probability of “tail events.”
- Markov’s Inequality: For a non-negative random variable $X$ and $a > 0$: $P(X \ge a) \le \dfrac{E[X]}{a}$
- Chebyshev’s Inequality: For any $X$ with mean $\mu$ and variance $\sigma^2$: $P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$
- Jensen’s Inequality: For a convex function $\varphi$: $\varphi(E[X]) \le E[\varphi(X)]$
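All three bounds can be sanity-checked by simulation. The sketch below uses Exponential(1) samples with arbitrarily chosen thresholds $a = 3$ and $k = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=100_000)   # mean 1, variance 1

# Markov: P(X >= a) <= E[X] / a
a = 3.0
assert np.mean(X >= a) <= np.mean(X) / a

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
mu, sigma = X.mean(), X.std()
k = 2.0
assert np.mean(np.abs(X - mu) >= k * sigma) <= 1 / k**2

# Jensen with the convex function phi(x) = x^2: phi(E[X]) <= E[phi(X)]
assert X.mean()**2 <= np.mean(X**2)
print("all three inequality checks passed")
```

For this distribution the true tail $P(X \ge 3) = e^{-3} \approx 0.05$ sits well below both the Markov bound ($1/3$) and the Chebyshev bound ($1/4$), illustrating that these are often loose worst-case guarantees.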
6. Limit Theorems
Limit theorems describe the behavior of the sum of independent and identically distributed (i.i.d.) random variables.
Law of Large Numbers (LLN)
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu < \infty$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.
- Weak Law (WLLN): $\bar{X}_n \xrightarrow{P} \mu$ (convergence in probability).
- Strong Law (SLLN): $\bar{X}_n \xrightarrow{a.s.} \mu$ (almost sure convergence).
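A minimal simulation of the LLN (with an arbitrary Bernoulli parameter, chosen for illustration) shows the running mean settling near the true mean:

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 0.3, 100_000
draws = rng.binomial(1, p, size=n)   # i.i.d. Bernoulli(p)

# Running mean after 1, 2, ..., n draws
running_mean = np.cumsum(draws) / np.arange(1, n + 1)

print(f"mean after {n} draws: {running_mean[-1]:.4f}  (true p = {p})")
assert abs(running_mean[-1] - p) < 0.01
```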
Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and finite variance $\sigma^2$. As $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$$

This explains why the Normal distribution appears everywhere in nature: it represents the aggregate effect of many independent small fluctuations.
Python Simulation: The Central Limit Theorem
The following script illustrates the CLT by repeatedly averaging samples from a non-normal (Uniform) distribution and comparing the resulting distribution of the sample mean against the Normal density predicted by the theorem.
```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_clt(sample_size, num_simulations):
    # Generate samples from a Uniform(0, 1) distribution
    # Mean = 0.5, Variance = 1/12
    means = []
    for _ in range(num_simulations):
        data = np.random.uniform(0, 1, sample_size)
        means.append(np.mean(data))

    plt.figure(figsize=(10, 6))
    plt.hist(means, bins=50, density=True, alpha=0.7, color='steelblue')

    # Overlay the theoretical Normal density predicted by the CLT
    mu = 0.5
    sigma = np.sqrt(1 / 12) / np.sqrt(sample_size)
    x = np.linspace(min(means), max(means), 100)
    p = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma)**2)
    plt.plot(x, p, 'r', linewidth=2)

    plt.title(f"CLT Simulation: n={sample_size}")
    plt.xlabel("Sample Mean")
    plt.ylabel("Density")
    plt.show()

simulate_clt(sample_size=30, num_simulations=10000)
```