
Statistical Inference & Estimation

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution. We assume that the observed data $x_1, \dots, x_n$ are realizations of random variables $X_1, \dots, X_n$ distributed according to some member of a parametric family $\{ f(x \mid \theta) : \theta \in \Theta \}$.

1. Point Estimation

A point estimator $\hat{\theta} = T(X_1, \dots, X_n)$ is a statistic (a function of the data) used to approximate the unknown parameter $\theta$.

Bias and Mean Squared Error (MSE)

The Bias of an estimator $\hat{\theta}$ is defined as:

$$\operatorname{Bias}(\hat{\theta}) = \mathbb{E}_\theta[\hat{\theta}] - \theta$$

An estimator is unbiased if $\operatorname{Bias}(\hat{\theta}) = 0$, i.e. $\mathbb{E}_\theta[\hat{\theta}] = \theta$.

The Mean Squared Error (MSE) measures the average squared difference between the estimator and the parameter:

$$\operatorname{MSE}(\hat{\theta}) = \mathbb{E}_\theta\big[(\hat{\theta} - \theta)^2\big]$$

A fundamental decomposition of MSE is:

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2$$

This highlights the bias-variance tradeoff: as we reduce bias, variance often increases, and vice versa.
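
As a quick illustration, here is a minimal Monte Carlo sketch of the decomposition (the normal population, sample size, and seed are arbitrary choices, not part of the original text). It compares the unbiased variance estimator (divisor $n-1$) with the divisor-$n$ version and checks that $\operatorname{Var} + \operatorname{Bias}^2$ matches the empirical MSE.

import numpy as np

# Monte Carlo check of MSE = Var + Bias^2 for two variance estimators
# (illustrative setup: N(0, 2^2) population, n = 20).
rng = np.random.default_rng(0)
true_var = 4.0
n, n_sim = 20, 50_000

samples = rng.normal(0, 2, size=(n_sim, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divisor n-1 (unbiased)
s2_divn = samples.var(axis=1, ddof=0)       # divisor n (biased, MLE-style)

for name, est in [("unbiased (n-1)", s2_unbiased), ("divisor n", s2_divn)]:
    bias = est.mean() - true_var
    var = est.var()
    mse = np.mean((est - true_var) ** 2)
    print(f"{name:15s} bias={bias:+.4f}  var={var:.4f}  "
          f"var+bias^2={var + bias**2:.4f}  mse={mse:.4f}")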

Consistency

An estimator $\hat{\theta}_n$ is consistent if it converges in probability to the true parameter:

$$\lim_{n \to \infty} P\big(|\hat{\theta}_n - \theta| > \epsilon\big) = 0 \quad \text{for every } \epsilon > 0$$

This is often denoted as $\hat{\theta}_n \xrightarrow{p} \theta$. By the Law of Large Numbers, the sample mean $\bar{X}_n$ is a consistent estimator of the population mean $\mu$.
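
A minimal sketch of consistency in action (the exponential distribution and sample sizes are assumed choices for illustration): running sample means settle around the population mean as $n$ grows.

import numpy as np

# Sample means for increasing n concentrate around the population mean (LLN).
rng = np.random.default_rng(1)
mu = 3.0                                   # population mean of the Exponential draws
draws = rng.exponential(scale=mu, size=100_000)

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:>7d}  sample mean={draws[:n].mean():.4f}  (true mu={mu})")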

2. Maximum Likelihood Estimation (MLE)

MLE is the most widely used method for point estimation. Given i.i.d. observations $x_1, \dots, x_n$, the Likelihood Function is:

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

We seek the value $\hat{\theta}_{MLE}$ that maximizes $L(\theta)$. In practice, it is easier to maximize the log-likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$

The Score Function

The Score Function is the gradient of the log-likelihood:

$$S(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}$$

The MLE is found by solving the likelihood equation $S(\hat{\theta}) = 0$.
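
As a worked example (the Poisson model, anticipating the code further below): for $X_1, \dots, X_n \sim \text{Poisson}(\lambda)$ the log-likelihood is $\ell(\lambda) = \sum_{i=1}^{n} \big(x_i \log \lambda - \lambda - \log x_i!\big)$, so the score is

$$S(\lambda) = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n$$

and solving $S(\hat{\lambda}) = 0$ gives $\hat{\lambda}_{MLE} = \bar{x}$, the sample mean.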

Asymptotic Properties of MLE

Under “regularity conditions,” MLEs possess desirable large-sample properties:

  1. Consistency: $\hat{\theta}_{MLE} \xrightarrow{p} \theta_0$.
  2. Asymptotic Normality: $\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{d} \mathcal{N}\big(0,\, I(\theta_0)^{-1}\big)$, where $I(\theta_0)$ is the Fisher Information (a simulation sketch of this property follows the list).
  3. Efficiency: For large $n$, the MLE achieves the Cramér-Rao Lower Bound.
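
The sketch below (simulation settings are assumed for illustration) checks asymptotic normality for the Poisson MLE $\hat{\lambda} = \bar{X}$: across repeated samples, $\sqrt{n}(\hat{\lambda} - \lambda_0)$ should have standard deviation close to $\sqrt{I(\lambda_0)^{-1}} = \sqrt{\lambda_0}$.

import numpy as np

# Asymptotic normality of the Poisson MLE (lambda_hat = sample mean):
# sqrt(n) * (lambda_hat - lambda_0) is approximately N(0, lambda_0),
# since the Fisher information of Poisson(lambda) is 1/lambda.
rng = np.random.default_rng(2)
lam0, n, n_sim = 4.5, 200, 20_000

samples = rng.poisson(lam0, size=(n_sim, n))
mle = samples.mean(axis=1)
z = np.sqrt(n) * (mle - lam0)

print(f"empirical sd of sqrt(n)(mle - lam0): {z.std():.3f}")
print(f"theoretical sd sqrt(lam0):           {np.sqrt(lam0):.3f}")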

3. Method of Moments (MoM)

MoM estimates parameters by equating population moments to sample moments. If $\theta = (\theta_1, \dots, \theta_k)$, we solve the system:

$$\mathbb{E}_\theta[X^j] = \frac{1}{n}\sum_{i=1}^{n} X_i^j, \qquad j = 1, \dots, k$$

MoM is often easier to compute than MLE but is usually less efficient (higher variance).
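
For a concrete sketch (the Gamma model and its true parameter values are assumptions chosen for illustration): for $X \sim \text{Gamma}(k, \theta)$ we have $\mathbb{E}[X] = k\theta$ and $\operatorname{Var}(X) = k\theta^2$, so matching the first two sample moments gives $\hat{\theta} = (\widehat{m}_2 - \bar{X}^2)/\bar{X}$ and $\hat{k} = \bar{X}/\hat{\theta}$.

import numpy as np

# Method of Moments for Gamma(shape=k, scale=theta):
#   E[X] = k*theta,  Var(X) = k*theta^2
# => theta_hat = (m2 - m1^2) / m1,  k_hat = m1 / theta_hat
rng = np.random.default_rng(3)
k_true, theta_true = 2.0, 1.5
data = rng.gamma(shape=k_true, scale=theta_true, size=5_000)

m1 = data.mean()
m2 = np.mean(data ** 2)
theta_hat = (m2 - m1 ** 2) / m1
k_hat = m1 / theta_hat

print(f"MoM shape k_hat     = {k_hat:.3f}  (true {k_true})")
print(f"MoM scale theta_hat = {theta_hat:.3f}  (true {theta_true})")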

4. Sufficient Statistics

A statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of the sample $X = (X_1, \dots, X_n)$ given $T(X)$ does not depend on $\theta$. This means $T(X)$ captures all the information in the sample about $\theta$.

Factorization Theorem (Fisher-Neyman)

$T(X)$ is sufficient for $\theta$ if and only if the joint density can be factored as:

$$f(x \mid \theta) = h(x)\, g\big(T(x), \theta\big)$$

where $h(x)$ does not depend on $\theta$ and $g$ depends on $x$ only through $T(x)$.
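
For example, for i.i.d. Bernoulli($p$) observations,

$$f(x \mid p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = \underbrace{1}_{h(x)} \cdot \underbrace{p^{T(x)}(1-p)^{\,n - T(x)}}_{g(T(x),\, p)}, \qquad T(x) = \sum_{i=1}^{n} x_i,$$

so the number of successes $T(x)$ is sufficient for $p$.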

5. Information & Efficiency

Fisher Information

The Fisher Information $I(\theta)$ represents the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$:

$$I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^{\!2}\right] = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta)\right]$$

(the second equality holds under the usual regularity conditions).

Cramér-Rao Lower Bound (CRLB)

For any unbiased estimator $\hat{\theta}$ based on $n$ i.i.d. observations, its variance is bounded from below:

$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n\, I(\theta)}$$

An unbiased estimator that achieves this bound is called efficient; an efficient estimator is automatically the UMVUE (Uniformly Minimum Variance Unbiased Estimator), although a UMVUE need not attain the bound.
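
As a numerical check (the Poisson model and simulation settings are assumptions for illustration): for Poisson($\lambda$), $I(\lambda) = 1/\lambda$, so the CRLB for an unbiased estimator from $n$ observations is $\lambda/n$; the sample mean attains it.

import numpy as np

# The sample mean of Poisson(lambda) data is unbiased with variance lambda/n,
# which equals the CRLB 1 / (n * I(lambda)) because I(lambda) = 1/lambda.
rng = np.random.default_rng(4)
lam, n, n_sim = 4.5, 50, 40_000

samples = rng.poisson(lam, size=(n_sim, n))
xbar = samples.mean(axis=1)

print(f"empirical Var(xbar): {xbar.var():.4f}")
print(f"CRLB lambda/n:       {lam / n:.4f}")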

Rao-Blackwell Theorem

If $\hat{\theta}$ is an unbiased estimator and $T$ is a sufficient statistic, then the conditional expectation $\tilde{\theta} = \mathbb{E}[\hat{\theta} \mid T]$ is also unbiased and $\operatorname{Var}(\tilde{\theta}) \le \operatorname{Var}(\hat{\theta})$. This implies that we need only search for optimal estimators among functions of sufficient statistics.
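
A classic illustration, sketched under assumed simulation settings: to estimate $e^{-\lambda} = P(X = 0)$ from Poisson data, start with the crude unbiased estimator $\mathbf{1}\{X_1 = 0\}$ and condition on the sufficient statistic $T = \sum_i X_i$, which gives $\mathbb{E}[\mathbf{1}\{X_1 = 0\} \mid T] = (1 - 1/n)^T$. Both are unbiased, but the Rao-Blackwellized version has much smaller variance.

import numpy as np

# Rao-Blackwellization: estimating P(X=0) = exp(-lambda) for Poisson data.
# Crude unbiased estimator: 1{X_1 = 0}.  Conditioning on T = sum(X_i)
# gives E[1{X_1=0} | T] = (1 - 1/n)^T, also unbiased but with lower variance.
rng = np.random.default_rng(5)
lam, n, n_sim = 2.0, 30, 40_000

samples = rng.poisson(lam, size=(n_sim, n))
crude = (samples[:, 0] == 0).astype(float)
rb = (1 - 1 / n) ** samples.sum(axis=1)

print(f"target exp(-lambda) = {np.exp(-lam):.4f}")
print(f"crude: mean={crude.mean():.4f}  var={crude.var():.5f}")
print(f"RB:    mean={rb.mean():.4f}  var={rb.var():.5f}")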

6. Interval Estimation

Instead of a single point, we construct a Confidence Interval (CI) $\big(L(X),\, U(X)\big)$ such that:

$$P\big(L(X) \le \theta \le U(X)\big) = 1 - \alpha$$

This is often done using a Pivotal Quantity $Q(X, \theta)$, a function of the data and the parameter whose distribution does not depend on $\theta$. Example: For $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ with known $\sigma$, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a pivot since $Z \sim \mathcal{N}(0, 1)$.
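
A minimal sketch of this pivot in code (normal data with known $\sigma$ and the specific settings are assumptions of the example): build the 95% interval $\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n}$ and check its coverage by simulation.

import numpy as np
from scipy.stats import norm

# 95% CI for mu from the pivot Z = (xbar - mu) / (sigma / sqrt(n)) ~ N(0, 1),
# with sigma assumed known; coverage is checked over repeated samples.
rng = np.random.default_rng(6)
mu, sigma, n, n_sim = 10.0, 2.0, 25, 20_000
z = norm.ppf(0.975)

samples = rng.normal(mu, sigma, size=(n_sim, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)
covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)

print(f"half-width = {half_width:.3f}")
print(f"empirical coverage = {covered.mean():.3f}  (nominal 0.95)")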

Python Implementation: MLE for Poisson Distribution

The following code computes the MLE for the parameter $\lambda$ of a Poisson distribution and visualizes the log-likelihood as a function of $\lambda$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Generate synthetic Poisson data with true lambda = 4.5
np.random.seed(42)
true_lambda = 4.5
data = np.random.poisson(true_lambda, size=100)

def log_likelihood(lam, data):
    if lam <= 0: return -np.inf
    # Poisson PMF: (lam^k * e^-lam) / k!
    return np.sum(poisson.logpmf(data, lam))

# We want to maximize the log-likelihood, which is equivalent to minimizing the negative log-likelihood
def neg_log_likelihood(lam, data):
    return -log_likelihood(lam, data)

# Find MLE using scipy
res = minimize_scalar(neg_log_likelihood, args=(data,), bounds=(0.1, 10), method='bounded')
mle_lambda = res.x

print(f"Sample Mean: {np.mean(data):.4f}")
print(f"MLE Lambda: {mle_lambda:.4f}")

# Visualization
lam_range = np.linspace(2, 7, 100)
ll_values = [log_likelihood(l, data) for l in lam_range]

plt.figure(figsize=(10, 5))
plt.plot(lam_range, ll_values, label='Log-Likelihood', color='#2563eb', lw=2)
plt.axvline(mle_lambda, color='red', linestyle='--', label=rf'MLE $\hat{{\lambda}}$ = {mle_lambda:.2f}')
plt.title(r'Log-Likelihood Surface for Poisson Parameter $\lambda$')
plt.xlabel(r'$\lambda$')
plt.ylabel('Log-Likelihood')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Advanced Concepts: Efficiency

An estimator’s performance is often compared via Relative Efficiency:

$$\operatorname{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{\operatorname{Var}(\hat{\theta}_2)}{\operatorname{Var}(\hat{\theta}_1)}$$

If the efficiency is greater than 1, $\hat{\theta}_1$ is superior. As $n \to \infty$, the asymptotic efficiency of the MLE is 1, meaning it is asymptotically the best one can do.
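
For a concrete comparison (the normal model and simulation settings are assumed for illustration): for $\mathcal{N}(\mu, 1)$ data the sample median is also consistent for $\mu$, but its asymptotic variance is $\pi/2$ times that of the sample mean, so its relative efficiency against the mean is about $2/\pi \approx 0.64$.

import numpy as np

# Relative efficiency of the sample median vs. the sample mean for N(mu, 1) data.
# Asymptotically Var(median) ~ (pi/2) * Var(mean), so eff(median, mean) ~ 2/pi.
rng = np.random.default_rng(7)
mu, n, n_sim = 0.0, 500, 20_000

samples = rng.normal(mu, 1.0, size=(n_sim, n))
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

print(f"Var(mean)   = {var_mean:.6f}")
print(f"Var(median) = {var_median:.6f}")
print(f"eff(median, mean) = {var_mean / var_median:.3f}  (~2/pi = {2 / np.pi:.3f})")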

Conceptual Check

According to the Factorization Theorem, what defines a sufficient statistic T(x)?

Conceptual Check

What is the significance of the Cramér-Rao Lower Bound?

Conceptual Check

Which property of MLE ensures that as sample size increases, the estimator converges to the true parameter?