Hypothesis Testing & Power

In statistical inference, hypothesis testing is the formal process of using data to evaluate the validity of a claim about a population parameter. This lesson moves beyond introductory “plug-and-chug” methods to explore the mathematical foundations of decision theory, likelihood ratios, and the optimization of test power.

1. The Decision Framework: $H_0$ and $H_1$

We define two competing hypotheses:

  • Null Hypothesis ($H_0$): The status quo or a specific “no effect” state. Mathematically, it typically specifies a subset $\Theta_0$ of the parameter space $\Theta$.
  • Alternative Hypothesis ($H_1$): The statement we seek to find evidence for, $\theta \in \Theta_1 = \Theta \setminus \Theta_0$.

A test is a decision rule $\delta$ that maps the sample space $\mathcal{X}$ to the set $\{\text{reject } H_0,\ \text{do not reject } H_0\}$. This is often defined via a rejection region $R$:

$$\delta(x) = \begin{cases} \text{reject } H_0 & \text{if } x \in R, \\ \text{do not reject } H_0 & \text{if } x \notin R. \end{cases}$$
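
As a concrete (hypothetical) illustration, a rejection region can be encoded directly as a predicate on the data. This minimal sketch assumes a one-sided z-test with known $\sigma$ and arbitrary example values $\mu_0 = 50$, $\sigma = 10$:

import numpy as np
from scipy import stats

# One-sided z-test of H0: mu = mu0 vs H1: mu > mu0 with known sigma.
def in_rejection_region(sample, mu0, sigma, alpha=0.05):
    """Return True when the sample falls in the rejection region R."""
    n = len(sample)
    z = (np.mean(sample) - mu0) / (sigma / np.sqrt(n))
    z_crit = stats.norm.ppf(1 - alpha)   # boundary of R under H0
    return z > z_crit                    # x in R  <=>  reject H0

rng = np.random.default_rng(0)
sample = rng.normal(51, 10, size=40)     # data with true mean 51 (hypothetical)
print("Reject H0:", in_rejection_region(sample, mu0=50, sigma=10))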

2. Errors in Decision Making

Errors are unavoidable in frequentist inference. We quantify them as probabilities:

  • Type I Error ($\alpha$): Rejecting $H_0$ when it is true. $\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$.

  • Type II Error ($\beta$): Failing to reject $H_0$ when it is false. $\beta = P(\text{fail to reject } H_0 \mid H_1 \text{ true})$.

  • Power of the Test ($1 - \beta$): The probability of correctly rejecting a false null hypothesis.

Ideally, we minimize both $\alpha$ and $\beta$. However, for a fixed sample size $n$, there is an inverse relationship: decreasing the “size” ($\alpha$) of the test generally increases $\beta$.
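
The trade-off can be made concrete with a short sketch. The values $\mu_0 = 50$, $\mu_1 = 52$, $\sigma = 10$, and $n = 30$ below are hypothetical, and the test is assumed to be a one-sided z-test with known variance:

import numpy as np
from scipy import stats

mu0, mu1, sigma, n = 50, 52, 10, 30                  # hypothetical values
se = sigma / np.sqrt(n)

for alpha in [0.10, 0.05, 0.01]:
    z_crit = stats.norm.ppf(1 - alpha)               # rejection boundary under H0
    x_crit = mu0 + z_crit * se                       # same boundary on the x-bar scale
    beta = stats.norm.cdf(x_crit, loc=mu1, scale=se) # P(fail to reject | H1 true)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")

Running this shows $\beta$ rising (and power falling) as $\alpha$ is tightened, exactly the inverse relationship described above.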

3. Test Statistics and Rejection Regions

A test statistic reduces the dimensionality of the data to a single value used for the decision. Common forms include:

The Z-Test (Known Variance)

Under $H_0: \mu = \mu_0$ with known variance $\sigma^2$:

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1) \quad \text{under } H_0.$$

The T-Test (Unknown Variance)

If $\sigma^2$ is unknown and estimated by the sample variance $S^2$:

$$T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1} \quad \text{under } H_0.$$

The Chi-Squared Test

For variance testing or goodness-of-fit:

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1} \qquad \text{or} \qquad \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}.$$
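
A minimal sketch computing all three statistics on a single simulated sample (the true mean of 52 and standard deviation of 10 are hypothetical, chosen to match the plotting code at the end of this lesson; upper-tail p-values are reported):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu0, sigma0, n = 50, 10, 30
data = rng.normal(52, 10, size=n)
xbar, s2 = data.mean(), data.var(ddof=1)

z = (xbar - mu0) / (sigma0 / np.sqrt(n))          # Z statistic, sigma known
t = (xbar - mu0) / (np.sqrt(s2) / np.sqrt(n))     # T statistic, sigma estimated by S
chi2 = (n - 1) * s2 / sigma0**2                   # variance test against sigma0^2

print(f"Z = {z:.3f},  p = {1 - stats.norm.cdf(z):.4f}")
print(f"T = {t:.3f},  p = {1 - stats.t.cdf(t, df=n - 1):.4f}")
print(f"chi2 = {chi2:.2f},  p = {1 - stats.chi2.cdf(chi2, df=n - 1):.4f}")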

4. The P-Value: A Measure of Evidence

The p-value is not the probability that $H_0$ is true. Rather, it is the probability of observing a test statistic at least as extreme as the one computed, assuming $H_0$ is true.

Formally, for a test statistic $T$ where large values provide evidence against $H_0$:

$$p = P_{H_0}\!\left(T \geq t_{\text{obs}}\right).$$

A p-value is itself a random variable. If $H_0$ is true, the p-value is uniformly distributed on $[0, 1]$ for continuous test statistics: $P_{H_0}(p \leq u) = u$ for all $u \in [0, 1]$.
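
The uniformity claim is easy to check by simulation. The sketch below repeatedly generates data under $H_0$ and collects two-sided t-test p-values (the sample size and replication count are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu0, n_reps = 50, 10_000
pvals = np.empty(n_reps)
for i in range(n_reps):
    data = rng.normal(mu0, 10, size=30)             # data generated under H0
    pvals[i] = stats.ttest_1samp(data, mu0).pvalue

# Under H0 the p-values should be approximately Uniform(0, 1):
print("P(p <= 0.05) ~", np.mean(pvals <= 0.05))     # close to 0.05
print("P(p <= 0.50) ~", np.mean(pvals <= 0.50))     # close to 0.50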

5. The Neyman-Pearson Lemma

How do we choose the best rejection region $R$? For a simple null $H_0: \theta = \theta_0$ versus a simple alternative $H_1: \theta = \theta_1$, the Neyman-Pearson Lemma provides the Most Powerful (MP) test.

The lemma states that the region that maximizes power for a fixed $\alpha$ is defined by the Likelihood Ratio:

$$R = \left\{ x : \frac{L(\theta_1; x)}{L(\theta_0; x)} \geq k \right\},$$

where $k$ is chosen such that $P_{\theta_0}(X \in R) = \alpha$. This ratio ensures that we reject $H_0$ when the data are significantly “more likely” under $H_1$ than under $H_0$.
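
For two simple normal-mean hypotheses the likelihood ratio is a monotone function of $\bar{X}$, so the MP test reduces to rejecting for large sample means. The sketch below uses hypothetical values $\mu_0 = 0$, $\mu_1 = 1$, $\sigma = 1$, $n = 10$ and calibrates the cutoff on the $\bar{X}$ scale:

import numpy as np
from scipy import stats

mu0, mu1, sigma, n, alpha = 0.0, 1.0, 1.0, 10, 0.05   # hypothetical simple hypotheses

def likelihood_ratio(x):
    """L(theta1; x) / L(theta0; x) for i.i.d. normal data."""
    return np.prod(stats.norm.pdf(x, mu1, sigma)) / np.prod(stats.norm.pdf(x, mu0, sigma))

# Calibrate the cutoff: reject when xbar > c with P_H0(Xbar > c) = alpha.
c = mu0 + stats.norm.ppf(1 - alpha) * sigma / np.sqrt(n)

rng = np.random.default_rng(3)
x = rng.normal(mu1, sigma, size=n)                     # data drawn under H1
print("LR =", likelihood_ratio(x), " reject H0:", x.mean() > c)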

6. Uniformly Most Powerful (UMP) Tests

When $H_1$ is composite (e.g., $H_1: \theta > \theta_0$), we seek a test that is most powerful for every $\theta$ in the alternative. Such a test is called Uniformly Most Powerful (UMP).

The existence of a UMP test is guaranteed if the family of distributions possesses the Monotone Likelihood Ratio (MLR) property. A family has MLR in a statistic $T(x)$ if for any $\theta_2 > \theta_1$, the ratio $f(x; \theta_2) / f(x; \theta_1)$ is a non-decreasing function of $T(x)$.
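
As a quick numerical check of the MLR property (assuming the $N(\theta, 1)$ family with $T(x) = x$ and arbitrary values $\theta_1 = 0$, $\theta_2 = 1$), the density ratio should be non-decreasing in $x$:

import numpy as np
from scipy import stats

theta1, theta2 = 0.0, 1.0                        # any theta2 > theta1
x = np.linspace(-3, 3, 7)
ratio = stats.norm.pdf(x, theta2) / stats.norm.pdf(x, theta1)
print(np.round(ratio, 4))                        # increasing in x
print("non-decreasing:", np.all(np.diff(ratio) >= 0))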

7. Likelihood Ratio Tests (LRT) and Wilks’ Theorem

For complex, multi-parameter composite hypotheses, we use the generalized Likelihood Ratio Test:

$$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta; x)}{\sup_{\theta \in \Theta} L(\theta; x)},$$

where $0 \leq \lambda(x) \leq 1$. Small values of $\lambda$ lead to rejection.

Wilks’ Theorem: Under certain regularity conditions, as $n \to \infty$, the statistic $-2 \ln \lambda$ converges in distribution to a $\chi^2$ distribution with degrees of freedom equal to the difference in dimensionality between $\Theta$ and $\Theta_0$:

$$-2 \ln \lambda \xrightarrow{d} \chi^2_{\dim(\Theta) - \dim(\Theta_0)}.$$
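
Wilks’ Theorem can be checked by simulation. For a normal mean with known variance, the LRT statistic simplifies to $-2 \ln \lambda = n(\bar{X} - \mu_0)^2 / \sigma^2$ and the dimension difference is 1; the sample size and replication count below are arbitrary:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu0, sigma, n, n_reps = 0.0, 1.0, 50, 10_000
xbar = rng.normal(mu0, sigma / np.sqrt(n), size=n_reps)  # sampling distribution of X-bar under H0
lrt = n * (xbar - mu0) ** 2 / sigma**2                   # -2 ln(lambda) for this model

# Compare upper-tail probabilities with the chi-squared(1) reference:
crit = stats.chi2.ppf(0.95, df=1)
print("P(-2 ln lambda > crit) ~", np.mean(lrt > crit))   # close to 0.05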

8. Multiple Testing and Bonferroni Correction

When conducting $m$ independent tests, each at significance level $\alpha$, the probability of committing at least one Type I error (the Family-Wise Error Rate, FWER) is $1 - (1 - \alpha)^m$. As $m$ grows, this approaches 1.

The Bonferroni correction guards against this by using a stricter threshold for each individual test:

$$\alpha^{*} = \frac{\alpha}{m}, \qquad \text{which guarantees } \mathrm{FWER} \leq \alpha.$$
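
A minimal simulation sketch comparing the FWER with and without the Bonferroni correction when all $m$ null hypotheses are true ($m = 20$ and the replication count are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
m, alpha, n_reps = 20, 0.05, 5_000
fwer_naive = fwer_bonf = 0

for _ in range(n_reps):
    # m independent one-sample t-tests, all with H0 true
    pvals = np.array([stats.ttest_1samp(rng.normal(0, 1, 30), 0).pvalue for _ in range(m)])
    fwer_naive += np.any(pvals <= alpha)
    fwer_bonf += np.any(pvals <= alpha / m)           # Bonferroni threshold

print("Theoretical naive FWER:", 1 - (1 - alpha) ** m)
print("Simulated naive FWER:  ", fwer_naive / n_reps)
print("Simulated Bonferroni:  ", fwer_bonf / n_reps)  # at or below alpha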


Python Implementation: T-Test and Visualization

The following code performs a one-sided, one-sample t-test and visualizes the rejection region alongside the p-value area.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Parameters
mu_null = 50
sample_size = 30
np.random.seed(0)  # fix the seed so the figure is reproducible
data = np.random.normal(52, 10, sample_size) # Mean 52, StdDev 10

# Perform a one-sided t-test (H1: mu > 50) so the p-value matches the
# one-sided rejection region drawn below (the 'alternative' argument requires SciPy >= 1.6)
t_stat, p_val = stats.ttest_1samp(data, mu_null, alternative='greater')
df = sample_size - 1

# Plotting
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, df)
critical_value = stats.t.ppf(0.95, df)

plt.figure(figsize=(10, 6))
plt.plot(x, y, label=f't-distribution (df={df})')

# Rejection Region (Alpha = 0.05)
plt.fill_between(x, 0, y, where=(x > critical_value), color='red', alpha=0.3, label='Rejection Region')

# P-value area
plt.fill_between(x, 0, y, where=(x > t_stat), color='blue', alpha=0.5, label=f'p-value area (p={p_val:.4f})')

plt.axvline(t_stat, color='black', linestyle='--', label=f'Observed t={t_stat:.2f}')
plt.title('One-Sample T-Test: Rejection Region vs P-value')
plt.legend()
plt.show()
Conceptual Check

According to the Neyman-Pearson Lemma, what defines the rejection region for the Most Powerful test between two simple hypotheses?

Conceptual Check

What is the distribution of the p-value when the Null Hypothesis is true for a continuous test statistic?

Conceptual Check

Which theorem provides the asymptotic distribution for the Likelihood Ratio Test statistic $-2 \ln \lambda$?