Continuous Optimization: Theory, Algorithms, and Duality
Optimization sits at the heart of applied mathematical analysis, providing the framework for decision-making in economics, engineering, and artificial intelligence. This lesson gives a rigorous treatment of extrema in $\mathbb{R}^n$, moving from unconstrained local analysis to constrained global optimization.
1. Extremum Problems in $\mathbb{R}^n$
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a scalar field. We are interested in finding points $x^* \in \mathbb{R}^n$ such that $f(x^*)$ is as small (or large) as possible.
Definition 1.1 (Local Extremum). A point $x^*$ is a local minimum of $f$ if there exists an open neighborhood $U$ of $x^*$ such that $f(x^*) \le f(x)$ for all $x \in U$. If the inequality is strict for $x \ne x^*$, it is a strict local minimum.
Definition 1.2 (Global Extremum). $x^*$ is a global minimum if $f(x^*) \le f(x)$ for all $x$ in the domain of $f$.
The Extreme Value Theorem guarantees the existence of global extrema if $f$ is continuous and its domain $K$ is compact (closed and bounded in $\mathbb{R}^n$). In non-compact domains, we must analyze the behavior of $f(x)$ as $\|x\| \to \infty$; for example, $f(x) = e^{-x}$ on $[0, \infty)$ has infimum $0$ but attains no minimum.
2. First-Order Necessary Conditions (FONC)
For a differentiable function $f$ to have a local extremum at an interior point $x^*$, it must be “flat” in all directions.
Theorem 2.1 (Vanishing Gradient). If $f$ is differentiable at $x^*$ and $x^*$ is a local extremum, then $\nabla f(x^*) = 0$, where $\nabla f(x^*) = \left( \frac{\partial f}{\partial x_1}(x^*), \dots, \frac{\partial f}{\partial x_n}(x^*) \right)$.
Proof Sketch: Consider the one-dimensional functions $g_i(t) = f(x^* + t e_i)$, where $e_i$ is the $i$-th standard basis vector. Since $x^*$ is a local extremum of $f$, $t = 0$ must be a local extremum of each $g_i$. By single-variable calculus, $g_i'(0) = 0$. Since $g_i'(0) = \frac{\partial f}{\partial x_i}(x^*)$, all partial derivatives must vanish.
Points where $\nabla f(x) = 0$ are called stationary or critical points. Note that this condition is necessary but not sufficient (e.g., $f(x) = x^3$ at $x = 0$).
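As a concrete illustration (the function below is our own example, not part of the lesson's running problem), take $f(x, y) = x^3 - 3xy + y^3$. Solving $\nabla f = (3x^2 - 3y,\; 3y^2 - 3x) = 0$ by hand gives the critical points $(0, 0)$ and $(1, 1)$; the short sketch below simply confirms that the analytic gradient vanishes there.

import numpy as np

# Illustrative example (not from the lesson): f(x, y) = x^3 - 3*x*y + y^3
def grad_f(p):
    x, y = p
    # Analytic gradient (df/dx, df/dy)
    return np.array([3 * x**2 - 3 * y, 3 * y**2 - 3 * x])

# Critical points found by solving grad f = 0 by hand
for point in [(0.0, 0.0), (1.0, 1.0)]:
    print(point, grad_f(point))  # both should print the zero vector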
3. Second-Order Conditions and Stability
To determine whether a critical point is a minimum, maximum, or saddle point, we analyze the curvature using the Hessian matrix $H_f(x)$.
Definition 3.1 (The Hessian). $H_f(x) = \left[ \dfrac{\partial^2 f}{\partial x_i \, \partial x_j}(x) \right]_{i, j = 1}^{n}$, which is symmetric when $f$ is twice continuously differentiable.
Theorem 3.2 (Second-Order Sufficiency). Let $\nabla f(x^*) = 0$ for a twice continuously differentiable function $f$, and write $H = H_f(x^*)$.
- If $H$ is positive definite ($z^T H z > 0$ for all $z \ne 0$), $x^*$ is a strict local minimum.
- If $H$ is negative definite ($z^T H z < 0$ for all $z \ne 0$), $x^*$ is a strict local maximum.
- If $H$ is indefinite (has both positive and negative eigenvalues), $x^*$ is a saddle point. (These three cases are checked numerically in the sketch after this list.)
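Continuing the illustrative function $f(x, y) = x^3 - 3xy + y^3$ from Section 2 (again our own example), its Hessian is $H_f(x, y) = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}$, and the eigenvalue test classifies the two critical points:

import numpy as np

# Hessian of the illustrative function f(x, y) = x^3 - 3*x*y + y^3
def hessian_f(p):
    x, y = p
    return np.array([[6 * x, -3.0],
                     [-3.0, 6 * y]])

for point in [(0.0, 0.0), (1.0, 1.0)]:
    eigenvalues = np.linalg.eigvalsh(hessian_f(point))  # real eigenvalues of a symmetric matrix
    print(point, eigenvalues)

# Expected classification:
#   (0, 0): eigenvalues -3 and 3  -> indefinite        -> saddle point
#   (1, 1): eigenvalues  3 and 9  -> positive definite -> strict local minimum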
4. Convexity: The Bridge to Global Optimality
Convexity is arguably more important than differentiability in modern optimization.
Definition 4.1 (Convex Set). A set $C \subseteq \mathbb{R}^n$ is convex if for any $x, y \in C$, the line segment $\{\theta x + (1 - \theta) y : \theta \in [0, 1]\}$ is contained in $C$.
Definition 4.2 (Convex Function). A function $f : C \to \mathbb{R}$ on a convex set $C$ is convex if $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ for all $x, y \in C$ and $\theta \in [0, 1]$.
Fundamental Property: If $f$ is a convex function on a convex set $C$, then any local minimum is a global minimum. This property makes convex optimization problems (like Least Squares or Support Vector Machines) computationally tractable and amenable to efficient numerical solvers.
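As a quick numerical check (a minimal sketch; the random problem data below are our own assumption, not from the lesson), consider the least-squares objective $f(x) = \|Ax - b\|^2$. Its Hessian is the constant matrix $2 A^T A$, whose eigenvalues are always nonnegative, so the objective is convex and any stationary point is a global minimum.

import numpy as np

# Arbitrary least-squares data, chosen only for illustration
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

# Hessian of ||Ax - b||^2 is the constant matrix 2 A^T A
H = 2 * A.T @ A
print(np.linalg.eigvalsh(H))  # all eigenvalues >= 0, so H is positive semidefinite and f is convex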
5. Equality Constrained Optimization: Lagrange Multipliers
Consider the problem:
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad h_i(x) = 0, \quad i = 1, \dots, m,$$
where $f$ and the $h_i$ are continuously differentiable.
Theorem 5.1 (Lagrange’s Theorem). Let $x^*$ be a local extremum and a regular point of the constraints (meaning $\nabla h_1(x^*), \dots, \nabla h_m(x^*)$ are linearly independent). Then there exists a vector of Lagrange Multipliers $\lambda^* = (\lambda_1^*, \dots, \lambda_m^*)$ such that:
$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0.$$
We define the Lagrangian function: $\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x)$. The necessary conditions for optimality are given by the system $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$, $\nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0$.
Geometric Intuition: At the optimum, the gradient of the objective must be orthogonal to the tangent space of the constraint manifold. Since the gradients of the constraints span the normal space at $x^*$, $\nabla f(x^*)$ must be expressible as a linear combination of these normal vectors.
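As a small worked example (chosen here for illustration; the same problem is solved numerically in Section 8), minimize $f(x, y) = x^2 + y^2$ subject to $h(x, y) = x + y - 1 = 0$. The Lagrangian is $\mathcal{L}(x, y, \lambda) = x^2 + y^2 + \lambda(x + y - 1)$, and setting its gradient to zero gives
$$
\begin{cases}
2x + \lambda = 0 \\
2y + \lambda = 0 \\
x + y - 1 = 0
\end{cases}
\quad\Longrightarrow\quad
x^* = y^* = \tfrac{1}{2}, \qquad \lambda^* = -1, \qquad f(x^*, y^*) = \tfrac{1}{2}.
$$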
6. Inequality Constraints and KKT Conditions
When constraints include inequalities ($g_j(x) \le 0$), we extend Lagrange’s method to the Karush-Kuhn-Tucker (KKT) conditions.
Consider minimizing $f(x)$ subject to $h_i(x) = 0$ ($i = 1, \dots, m$) and $g_j(x) \le 0$ ($j = 1, \dots, p$). For $x^*$ to be a local minimum, under a constraint qualification, there must exist multipliers $\lambda \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^p$ such that:
- Stationarity: $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla g_j(x^*) = 0$
- Primal Feasibility: $h_i(x^*) = 0$ for all $i$, and $g_j(x^*) \le 0$ for all $j$
- Dual Feasibility: $\mu_j \ge 0$ for all $j$
- Complementary Slackness: $\mu_j \, g_j(x^*) = 0$ for all $j$
Complementary slackness is the most critical KKT condition: it implies that if a constraint is “slack” ($g_j(x^*) < 0$), its multiplier must be zero ($\mu_j = 0$). Conversely, if $\mu_j > 0$, the constraint must be “active” ($g_j(x^*) = 0$).
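To see the conditions at work, rewrite the worked example of Section 5 with an inequality: minimize $x^2 + y^2$ subject to $g(x, y) = 1 - x - y \le 0$. At the candidate point $x^* = (\tfrac{1}{2}, \tfrac{1}{2})$ with $\mu = 1$:
- Stationarity: $\nabla f(x^*) + \mu \nabla g(x^*) = (1, 1) + 1 \cdot (-1, -1) = (0, 0)$
- Primal Feasibility: $g(x^*) = 1 - \tfrac{1}{2} - \tfrac{1}{2} = 0 \le 0$
- Dual Feasibility: $\mu = 1 \ge 0$
- Complementary Slackness: $\mu \, g(x^*) = 1 \cdot 0 = 0$
All four conditions hold; the constraint is active and carries a strictly positive multiplier.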
7. Sensitivity and Shadow Prices
The multipliers provide a measure of the sensitivity of the optimal value to perturbations in the constraints. Suppose we relax a constraint $h_i(x) = 0$ to $h_i(x) = c_i$. Let $p^*(c)$ be the optimal objective value as a function of the perturbation vector $c = (c_1, \dots, c_m)$.
Theorem 7.1. Under regularity:
$$\frac{\partial p^*}{\partial c_i}\bigg|_{c = 0} = -\lambda_i^*$$
(with the sign convention $\mathcal{L} = f + \sum_i \lambda_i h_i$ of Section 5). In economics, this quantity $-\lambda_i^*$ is interpreted as the shadow price. It represents the marginal change in the optimal objective value if the $i$-th constraint is relaxed by one unit. This is vital for resource allocation problems where one must decide whether the cost of increasing a resource is justified by the resulting improvement in the objective.
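Returning to the worked example, relax $h(x, y) = x + y - 1 = 0$ to $x + y - 1 = c$. The optimum becomes $x = y = (1 + c)/2$, so
$$
p^*(c) = \frac{(1 + c)^2}{2},
\qquad
\frac{d p^*}{d c}\bigg|_{c = 0} = 1 = -\lambda^*,
$$
in agreement with the multiplier $\lambda^* = -1$ found in Section 5.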
8. Computational Example: Constrained Optimization in Python
The following code uses scipy.optimize to solve a quadratic objective with a linear inequality constraint.
import numpy as np
from scipy.optimize import minimize

# Objective: Minimize f(x, y) = x^2 + y^2
def objective(x):
    return x[0]**2 + x[1]**2

# Constraint: x + y >= 1 => x + y - 1 >= 0
def constraint1(x):
    return x[0] + x[1] - 1

# Initial guess
x0 = [2, 2]

# Define the constraint dictionary
con1 = {'type': 'ineq', 'fun': constraint1}
cons = [con1]

# Perform optimization
sol = minimize(objective, x0, method='SLSQP', constraints=cons)

print(f"Optimal solution: x = {sol.x[0]:.4f}, y = {sol.x[1]:.4f}")
print(f"Minimum value: f(x,y) = {sol.fun:.4f}")