Markov Chains & Processes
In the study of stochastic processes, the Markov Chain represents the most fundamental model for systems evolving over time where the future is conditionally independent of the past, given the present state. This property, known as the Markov Property, allows for the rigorous analysis of complex systems ranging from thermodynamics to financial modeling and web search algorithms.
1. The Markov Property
Let $\{X_n\}_{n \geq 0}$ be a stochastic process taking values in a countable state space $S$. The process is a Markov Chain if for all $n \geq 0$ and all states $i_0, \dots, i_{n-1}, i, j \in S$:

$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$$
If this probability is independent of $n$, the chain is said to be time-homogeneous. We define the transition probability from state $i$ to state $j$ as:

$$p_{ij} = P(X_{n+1} = j \mid X_n = i)$$
The law of the process is thus entirely determined by the initial distribution $\pi^{(0)}$ and the transition probabilities $p_{ij}$.
2. The Transition Probability Matrix
For a finite state space $S = \{1, \dots, N\}$, we can collect these probabilities into an $N \times N$ matrix $P = (p_{ij})$:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$
Properties of Stochastic Matrices
- Non-negativity: $p_{ij} \geq 0$ for all $i, j \in S$.
- Row Stochasticity: $\sum_{j \in S} p_{ij} = 1$ for all $i \in S$.
- This implies that the vector of ones $\mathbf{1}$ is a right eigenvector of $P$ with eigenvalue $1$: $P\mathbf{1} = \mathbf{1}$.
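As a quick illustration, the following sketch builds a small transition matrix in NumPy and checks all three properties numerically (the matrix values here are invented for illustration):

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

# Non-negativity: every entry is a valid probability.
assert np.all(P >= 0)

# Row stochasticity: each row sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# The all-ones vector is a right eigenvector with eigenvalue 1.
ones = np.ones(P.shape[0])
assert np.allclose(P @ ones, ones)
```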
3. The Chapman-Kolmogorov Equations
To understand the evolution over multiple steps, we define the $m$-step transition probability $p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i)$.

The Chapman-Kolmogorov equations state:

$$p_{ij}^{(m+n)} = \sum_{k \in S} p_{ik}^{(m)} \, p_{kj}^{(n)}$$

In matrix notation, this is elegantly expressed as $P^{(m+n)} = P^{(m)} P^{(n)}$, so the $m$-step matrix is simply the $m$-th matrix power: $P^{(m)} = P^m$.

Thus, the distribution over states after $n$ steps, given an initial (row) distribution vector $\pi^{(0)}$, is:

$$\pi^{(n)} = \pi^{(0)} P^n$$
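A minimal sketch of these identities, reusing the hypothetical matrix from the snippet above: `np.linalg.matrix_power` computes $P^m$, and we can verify that $P^5 = P^2 P^3$ before propagating an initial distribution:

```python
import numpy as np

P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

# Chapman-Kolmogorov in matrix form: P^(m+n) = P^m P^n.
P5 = np.linalg.matrix_power(P, 5)
assert np.allclose(P5, np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))

# Distribution after n steps: pi^(n) = pi^(0) P^n.
pi0 = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
print(pi0 @ P5)                   # distribution after 5 steps
```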
4. State Classification
The long-term behavior of a Markov Chain depends on the structural relationships between its states.
Communication and Irreducibility
- Accessibility: State $j$ is accessible from state $i$ (written $i \to j$) if $p_{ij}^{(n)} > 0$ for some $n \geq 0$.
- Communication: If $i \to j$ and $j \to i$, we say $i$ and $j$ communicate ($i \leftrightarrow j$). Communication is an equivalence relation that partitions the state space into communicating classes.
- Irreducibility: A Markov Chain is irreducible if there is only one communicating class (i.e., every state is accessible from every other state).
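For a finite chain, irreducibility is a pure graph property of the positive entries of $P$, so it can be checked by boolean reachability; a sketch (the helper `is_irreducible` and the example matrices are invented for illustration):

```python
import numpy as np

def is_irreducible(P):
    """Check irreducibility of a finite chain via graph reachability."""
    n = P.shape[0]
    # reach[i, j] is True if j is reachable from i in zero or more steps.
    reach = (P > 0) | np.eye(n, dtype=bool)
    # Repeated squaring covers all path lengths up to n - 1.
    for _ in range(n):
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach.all())

P_irred = np.array([[0.5, 0.5], [0.5, 0.5]])
P_red = np.array([[1.0, 0.0], [0.5, 0.5]])  # state 0 is absorbing
print(is_irreducible(P_irred))  # True
print(is_irreducible(P_red))    # False: state 1 is not accessible from 0
```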
Periodicity
The period of state $i$ is defined as:

$$d(i) = \gcd\{n \geq 1 : p_{ii}^{(n)} > 0\}$$

If $d(i) = 1$, the state is aperiodic. In an irreducible chain, all states have the same period.
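The gcd in this definition can be approximated by brute force over a finite horizon; a sketch (the `period` helper and its cutoff are assumptions for illustration, not an exact algorithm):

```python
import math
import numpy as np

def period(P, i, max_steps=100):
    """Approximate the period of state i as the gcd of all n <= max_steps
    with P^n[i, i] > 0. The finite cutoff is a practical compromise."""
    g = 0
    Pn = np.eye(P.shape[0])
    for n in range(1, max_steps + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            g = math.gcd(g, n)   # gcd(0, n) == n handles the first hit
    return g

# A deterministic 2-cycle 0 -> 1 -> 0 -> ... has period 2.
P_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(P_cycle, 0))  # 2
```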
Recurrence vs. Transience
Let $f_i = P(X_n = i \text{ for some } n \geq 1 \mid X_0 = i)$ be the probability that, starting in state $i$, the process ever returns to $i$.

- Recurrent: $f_i = 1$. The process will return to $i$ infinitely many times.
- Transient: $f_i < 1$. There is a non-zero probability the process never returns.

A recurrent state is Positive Recurrent if the expected return time is finite, and Null Recurrent otherwise (null recurrence is only possible in infinite state spaces).
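To make the distinction concrete, here is a Monte Carlo sketch on an invented two-state chain: state 1 is absorbing, so state 0 is transient with return probability $f_0 = 0.5$ (the chain returns to 0 exactly when the first step stays there), which the simulation should approximate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain: from state 0, stay w.p. 0.5 or move to the
# absorbing state 1 w.p. 0.5. Hence f_0 = 0.5 and state 0 is transient.
P = np.array([[0.5, 0.5], [0.0, 1.0]])

def ever_returns(P, i, horizon=100):
    """Simulate one trajectory from state i; report whether it revisits i."""
    state = rng.choice(P.shape[0], p=P[i])
    for _ in range(horizon):
        if state == i:
            return True
        state = rng.choice(P.shape[0], p=P[state])
    return False

trials = 10_000
f0_hat = sum(ever_returns(P, 0) for _ in range(trials)) / trials
print(f"Estimated f_0 = {f0_hat}")  # close to 0.5
```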
5. Stationary Distribution
A probability distribution $\pi$ (represented as a row vector) is called a stationary distribution if:

$$\pi P = \pi, \qquad \sum_{i \in S} \pi_i = 1$$

This corresponds to a left eigenvector of $P$ associated with the eigenvalue $1$.
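Since a left eigenvector of $P$ is a right eigenvector of $P^T$, one way to extract $\pi$ is via `np.linalg.eig`; a sketch, complementary to the fuller implementation in Section 8:

```python
import numpy as np

def stationary_by_eig(P):
    """Find pi as the left eigenvector of P for eigenvalue 1,
    i.e. the right eigenvector of P.T, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    # Pick the column whose eigenvalue is (numerically) 1.
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()
```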
Fundamental Theorem
For any irreducible and aperiodic (ergodic) Markov Chain on a finite state space:
- A unique stationary distribution $\pi$ exists.
- $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ for all $i, j \in S$.
- The distribution of $X_n$ converges to $\pi$ regardless of the initial distribution.
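This convergence is easy to observe numerically: every row of $P^n$ approaches $\pi$, erasing the dependence on the start state. A quick sketch with the hypothetical 3-state matrix used earlier:

```python
import numpy as np

P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.5, 0.5],
])

P100 = np.linalg.matrix_power(P, 100)
print(P100)                       # all rows are (numerically) identical
assert np.allclose(P100, P100[0]) # each row has converged to pi
```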
6. Absorbing Markov Chains
A state $i$ is absorbing if $p_{ii} = 1$. A chain is absorbing if it has at least one absorbing state and from every state it is possible to reach an absorbing state.

The transition matrix of an absorbing chain with $t$ transient and $r$ absorbing states can be arranged in canonical form:

$$P = \begin{pmatrix} Q & R \\ 0 & I_r \end{pmatrix}$$

where $Q$ (a $t \times t$ matrix) represents transitions between transient states and $R$ (a $t \times r$ matrix) represents transitions from transient into absorbing states. The Fundamental Matrix is:

$$N = (I - Q)^{-1} = \sum_{k=0}^{\infty} Q^k$$

The entry $n_{ij}$ represents the expected number of times the process visits transient state $j$ given it started in transient state $i$. The expected time to absorption starting from transient state $i$ is the $i$-th entry of the vector $\tau = N\mathbf{1}$.
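A sketch for a hypothetical absorbing chain, a fair gambler's ruin on $\{0, 1, 2, 3\}$ invented for illustration: states 0 and 3 are absorbing, states 1 and 2 are transient, and the expected time to absorption from either transient state works out to 2 steps:

```python
import numpy as np

# Gambler's ruin on {0, 1, 2, 3} with a fair coin:
# 0 and 3 are absorbing; the Q block covers the transient states (1, 2).
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])

# Fundamental matrix N = (I - Q)^{-1}.
N = np.linalg.inv(np.eye(2) - Q)
print(N)            # [[4/3, 2/3], [2/3, 4/3]]

# Expected number of steps to absorption from each transient state.
tau = N @ np.ones(2)
print(tau)          # [2., 2.]
```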
7. Applications: Google’s PageRank
The PageRank algorithm models a "random surfer" on the web. Let the web be a directed hyperlink graph on $N$ pages. The transition matrix $H$ is defined by $h_{ij} = 1/d_i$ if a link from page $i$ to page $j$ exists (where $d_i$ is the number of outgoing links of page $i$), and $h_{ij} = 0$ otherwise. To ensure irreducibility and aperiodicity, a damping factor $\alpha$ (typically $\alpha = 0.85$) is introduced:

$$G = \alpha H + \frac{1 - \alpha}{N} J$$

where $J$ is the $N \times N$ all-ones matrix. The PageRank vector is the stationary distribution of $G$.
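A sketch of this construction on a tiny invented web of four pages (the link structure and constants are illustrative assumptions; real PageRank also handles dangling pages with no outlinks, which this toy graph avoids):

```python
import numpy as np

# Hypothetical 4-page web: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N = 4

# Row-stochastic hyperlink matrix H.
H = np.zeros((N, N))
for i, outs in links.items():
    H[i, outs] = 1.0 / len(outs)

# Google matrix with damping factor alpha = 0.85.
alpha = 0.85
G = alpha * H + (1 - alpha) / N * np.ones((N, N))

# PageRank = stationary distribution of G, found by power iteration.
pi = np.ones(N) / N
for _ in range(100):
    pi = pi @ G
print(pi)  # approximate PageRank scores for the four pages
```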
8. Computational Implementation
Below is a Python implementation to compute the stationary distribution of a Markov Chain using two methods: solving the augmented linear system $\pi P = \pi$, $\sum_i \pi_i = 1$ by least squares, and power iteration (long-run evolution of the distribution).
```python
import numpy as np
from scipy import linalg

def compute_stationary(P):
    """
    Computes the stationary distribution of a transition matrix P.

    Method 1: Solve the augmented linear system (eigenvector condition).
    Method 2: Power iteration (long-run evolution).
    """
    n = P.shape[0]

    # Method 1: Algebraic solution.
    # We solve pi (P - I) = 0, i.e. (P^T - I) pi^T = 0,
    # together with the normalization constraint sum(pi) = 1.
    A = np.append(P.T - np.eye(n), [np.ones(n)], axis=0)
    b = np.append(np.zeros(n), [1])
    # Use least squares to solve the overdetermined (n+1) x n system.
    pi_algebraic, _, _, _ = linalg.lstsq(A, b)

    # Method 2: Power iteration.
    # Repeatedly apply the transition matrix to a uniform start vector.
    pi_sim = np.ones(n) / n
    for _ in range(1000):
        prev_pi = pi_sim.copy()
        pi_sim = pi_sim @ P
        if np.allclose(pi_sim, prev_pi, atol=1e-10):
            break

    return pi_algebraic, pi_sim

# Example: 3-state system
# 0: Sunny, 1: Cloudy, 2: Rainy
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

algebraic, simulated = compute_stationary(P)
print(f"Algebraic Pi: {algebraic}")
print(f"Simulated Pi: {simulated}")
```