Markov Chains & Processes
In the study of stochastic processes, the Markov Chain represents the most fundamental model for systems evolving over time where the future is conditionally independent of the past, given the present state. This property, known as the Markov Property, allows for the rigorous analysis of complex systems ranging from thermodynamics to financial modeling and web search algorithms.
1. The Markov Property
Let $\{X_n\}_{n \geq 0}$ be a stochastic process taking values in a countable state space $S$. The process is a Markov Chain if for all $n \geq 0$ and all states $i_0, \dots, i_{n-1}, i, j \in S$:

$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$$
If this probability is independent of $n$, the chain is said to be time-homogeneous. We define the transition probability from state $i$ to state $j$ as:

$$p_{ij} = P(X_{n+1} = j \mid X_n = i)$$
The law of the process is thus entirely determined by the initial distribution $\pi^{(0)}$ and the transition probabilities $p_{ij}$.
2. The Transition Probability Matrix
For a finite state space $S = \{1, \dots, N\}$, we can collect these probabilities into an $N \times N$ matrix $P = (p_{ij})$:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$
Properties of Stochastic Matrices
- Non-negativity: $p_{ij} \geq 0$ for all $i, j \in S$.
- Row Stochasticity: $\sum_{j \in S} p_{ij} = 1$ for all $i \in S$.
- This implies that the vector of ones $\mathbf{1}$ is a right eigenvector of $P$ with eigenvalue $1$: $P\mathbf{1} = \mathbf{1}$.
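As a quick illustration, the following sketch builds a small transition matrix in NumPy and checks all three properties numerically (the matrix values here are invented for illustration):

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

# Non-negativity: every entry is a valid probability.
assert np.all(P >= 0)

# Row stochasticity: each row sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# The all-ones vector is a right eigenvector with eigenvalue 1.
ones = np.ones(P.shape[0])
assert np.allclose(P @ ones, ones)
```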
3. The Chapman-Kolmogorov Equations
To understand the evolution over multiple steps, we define the $m$-step transition probability $p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i)$.

The Chapman-Kolmogorov equations state:

$$p_{ij}^{(m+n)} = \sum_{k \in S} p_{ik}^{(m)} \, p_{kj}^{(n)}$$

In matrix notation, this is elegantly expressed as $P^{(m+n)} = P^{(m)} P^{(n)}$, so the $m$-step matrix is simply the $m$-th matrix power: $P^{(m)} = P^m$.

Thus, the distribution over states after $n$ steps, given an initial (row) distribution vector $\pi^{(0)}$, is:

$$\pi^{(n)} = \pi^{(0)} P^n$$
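A minimal sketch of these identities, reusing the hypothetical matrix from the snippet above: `np.linalg.matrix_power` computes $P^m$, and we can verify that $P^5 = P^2 P^3$ before propagating an initial distribution:

```python
import numpy as np

P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

# Chapman-Kolmogorov in matrix form: P^(m+n) = P^m P^n.
P5 = np.linalg.matrix_power(P, 5)
assert np.allclose(P5, np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))

# Distribution after n steps: pi^(n) = pi^(0) P^n.
pi0 = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
print(pi0 @ P5)                   # distribution after 5 steps
```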
4. State Classification
The long-term behavior of a Markov Chain depends on the structural relationships between its states.
Communication and Irreducibility
- Accessibility: State $j$ is accessible from state $i$ (written $i \to j$) if $p_{ij}^{(n)} > 0$ for some $n \geq 0$.
- Communication: If $i \to j$ and $j \to i$, we say $i$ and $j$ communicate ($i \leftrightarrow j$). Communication is an equivalence relation that partitions the state space into communicating classes.
- Irreducibility: A Markov Chain is irreducible if there is only one communicating class (i.e., every state is accessible from every other state).
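For a finite chain, irreducibility is a pure graph property of the positive entries of $P$, so it can be checked by boolean reachability; a sketch (the helper `is_irreducible` and the example matrices are invented for illustration):

```python
import numpy as np

def is_irreducible(P):
    """Check irreducibility of a finite chain via graph reachability."""
    n = P.shape[0]
    # reach[i, j] is True if j is reachable from i in zero or more steps.
    reach = (P > 0) | np.eye(n, dtype=bool)
    # Repeated squaring covers all path lengths up to n - 1.
    for _ in range(n):
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach.all())

P_irred = np.array([[0.5, 0.5], [0.5, 0.5]])
P_red = np.array([[1.0, 0.0], [0.5, 0.5]])  # state 0 is absorbing
print(is_irreducible(P_irred))  # True
print(is_irreducible(P_red))    # False: state 1 is not accessible from 0
```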
Periodicity
The period of state $i$ is defined as:

$$d(i) = \gcd\{n \geq 1 : p_{ii}^{(n)} > 0\}$$

If $d(i) = 1$, the state is aperiodic. In an irreducible chain, all states have the same period.
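The gcd in this definition can be approximated by brute force over a finite horizon; a sketch (the `period` helper and its cutoff are assumptions for illustration, not an exact algorithm):

```python
import math
import numpy as np

def period(P, i, max_steps=100):
    """Approximate the period of state i as the gcd of all n <= max_steps
    with P^n[i, i] > 0. The finite cutoff is a practical compromise."""
    g = 0
    Pn = np.eye(P.shape[0])
    for n in range(1, max_steps + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            g = math.gcd(g, n)   # gcd(0, n) == n handles the first hit
    return g

# A deterministic 2-cycle 0 -> 1 -> 0 -> ... has period 2.
P_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(P_cycle, 0))  # 2
```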
Recurrence vs. Transience
Let $f_i = P(X_n = i \text{ for some } n \geq 1 \mid X_0 = i)$ be the probability that, starting in state $i$, the process ever returns to $i$.

- Recurrent: $f_i = 1$. The process will return to $i$ infinitely many times.
- Transient: $f_i < 1$. There is a non-zero probability the process never returns.

A recurrent state is Positive Recurrent if the expected return time is finite, and Null Recurrent otherwise (null recurrence is only possible in infinite state spaces).
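To make the distinction concrete, here is a Monte Carlo sketch on an invented two-state chain: state 1 is absorbing, so state 0 is transient with return probability $f_0 = 0.5$ (the chain returns to 0 exactly when the first step stays there), which the simulation should approximate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain: from state 0, stay w.p. 0.5 or move to the
# absorbing state 1 w.p. 0.5. Hence f_0 = 0.5 and state 0 is transient.
P = np.array([[0.5, 0.5], [0.0, 1.0]])

def ever_returns(P, i, horizon=100):
    """Simulate one trajectory from state i; report whether it revisits i."""
    state = rng.choice(P.shape[0], p=P[i])
    for _ in range(horizon):
        if state == i:
            return True
        state = rng.choice(P.shape[0], p=P[state])
    return False

trials = 10_000
f0_hat = sum(ever_returns(P, 0) for _ in range(trials)) / trials
print(f"Estimated f_0 = {f0_hat}")  # close to 0.5
```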
5. Stationary Distribution
A probability distribution $\pi$ (represented as a row vector) is called a stationary distribution if:

$$\pi P = \pi, \qquad \sum_{i \in S} \pi_i = 1$$

This corresponds to a left eigenvector of $P$ associated with the eigenvalue $1$.
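Since a left eigenvector of $P$ is a right eigenvector of $P^T$, one way to extract $\pi$ is via `np.linalg.eig`; a sketch, complementary to the fuller implementation in Section 8:

```python
import numpy as np

def stationary_by_eig(P):
    """Find pi as the left eigenvector of P for eigenvalue 1,
    i.e. the right eigenvector of P.T, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    # Pick the column whose eigenvalue is (numerically) 1.
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()
```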
Fundamental Theorem
For any irreducible and aperiodic (ergodic) Markov Chain on a finite state space:
- A unique stationary distribution $\pi$ exists.
- $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ for all $i, j \in S$.
- The distribution of $X_n$ converges to $\pi$ regardless of the initial distribution.
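This convergence is easy to observe numerically: every row of $P^n$ approaches $\pi$, erasing the dependence on the start state. A quick sketch with the hypothetical 3-state matrix used earlier:

```python
import numpy as np

P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

P100 = np.linalg.matrix_power(P, 100)
print(P100)                       # all rows are (numerically) identical
assert np.allclose(P100, P100[0]) # each row has converged to pi
```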
6. Absorbing Markov Chains
A state $i$ is absorbing if $p_{ii} = 1$. A chain is absorbing if it has at least one absorbing state and from every state it is possible to reach an absorbing state.

The transition matrix of an absorbing chain with $t$ transient and $r$ absorbing states can be arranged in canonical form:

$$P = \begin{pmatrix} Q & R \\ 0 & I_r \end{pmatrix}$$

where $Q$ (a $t \times t$ matrix) represents transitions between transient states and $R$ (a $t \times r$ matrix) represents transitions from transient into absorbing states. The Fundamental Matrix is:

$$N = (I - Q)^{-1} = \sum_{k=0}^{\infty} Q^k$$

The entry $n_{ij}$ represents the expected number of times the process visits transient state $j$ given it started in transient state $i$. The expected time to absorption starting from transient state $i$ is the $i$-th entry of the vector $\tau = N\mathbf{1}$.
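A sketch for a hypothetical absorbing chain, a fair gambler's ruin on $\{0, 1, 2, 3\}$ invented for illustration: states 0 and 3 are absorbing, states 1 and 2 are transient, and the expected time to absorption from either transient state works out to 2 steps:

```python
import numpy as np

# Gambler's ruin on {0, 1, 2, 3} with a fair coin:
# 0 and 3 are absorbing; the Q block covers the transient states (1, 2).
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])

# Fundamental matrix N = (I - Q)^{-1}.
N = np.linalg.inv(np.eye(2) - Q)
print(N)            # [[4/3, 2/3], [2/3, 4/3]]

# Expected number of steps to absorption from each transient state.
tau = N @ np.ones(2)
print(tau)          # [2., 2.]
```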
7. Applications: Google’s PageRank
The PageRank algorithm models a "random surfer" on the web. Let the web be a directed hyperlink graph on $N$ pages. The transition matrix $H$ is defined by $h_{ij} = 1/d_i$ if a link from page $i$ to page $j$ exists (where $d_i$ is the number of outgoing links of page $i$), and $h_{ij} = 0$ otherwise. To ensure irreducibility and aperiodicity, a damping factor $\alpha$ (typically $\alpha = 0.85$) is introduced:

$$G = \alpha H + \frac{1 - \alpha}{N} J$$

where $J$ is the $N \times N$ all-ones matrix. The PageRank vector is the stationary distribution of $G$.
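A sketch of this construction on a tiny invented web of four pages (the link structure and constants are illustrative assumptions; real PageRank also handles dangling pages with no outlinks, which this toy graph avoids):

```python
import numpy as np

# Hypothetical 4-page web: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N = 4

# Row-stochastic hyperlink matrix H.
H = np.zeros((N, N))
for i, outs in links.items():
    H[i, outs] = 1.0 / len(outs)

# Google matrix with damping factor alpha = 0.85.
alpha = 0.85
G = alpha * H + (1 - alpha) / N * np.ones((N, N))

# PageRank = stationary distribution of G, found by power iteration.
pi = np.ones(N) / N
for _ in range(100):
    pi = pi @ G
print(pi)  # approximate PageRank scores for the four pages
```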
8. Computational Implementation
Below is a Python implementation to compute the stationary distribution of a Markov Chain using two methods: solving the augmented linear system $\pi P = \pi$, $\sum_i \pi_i = 1$ by least squares, and power iteration (long-run evolution of the distribution).
```python
import numpy as np
from scipy import linalg

def compute_stationary(P):
    """
    Computes the stationary distribution of a transition matrix P.

    Method 1: Solve the augmented linear system (eigenvector condition).
    Method 2: Power iteration (long-run evolution).
    """
    n = P.shape[0]

    # Method 1: Algebraic solution.
    # We solve pi (P - I) = 0, i.e. (P^T - I) pi^T = 0,
    # together with the normalization constraint sum(pi) = 1.
    A = np.append(P.T - np.eye(n), [np.ones(n)], axis=0)
    b = np.append(np.zeros(n), [1])
    # Use least squares to solve the overdetermined (n+1) x n system.
    pi_algebraic, _, _, _ = linalg.lstsq(A, b)

    # Method 2: Power iteration.
    # Repeatedly apply the transition matrix to a uniform start vector.
    pi_sim = np.ones(n) / n
    for _ in range(1000):
        prev_pi = pi_sim.copy()
        pi_sim = pi_sim @ P
        if np.allclose(pi_sim, prev_pi, atol=1e-10):
            break

    return pi_algebraic, pi_sim

# Example: 3-state system
# 0: Sunny, 1: Cloudy, 2: Rainy
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

algebraic, simulated = compute_stationary(P)
print(f"Algebraic Pi: {algebraic}")
print(f"Simulated Pi: {simulated}")
```