Singular Value Decomposition (SVD)
The Singular Value Decomposition (SVD) represents the pinnacle of matrix factorizations. While the eigendecomposition is restricted to square matrices (and further limited by diagonalizability), the SVD is universal. Every linear operator $A \in \mathbb{F}^{m \times n}$ (where $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$) possesses an SVD, providing profound insights into the geometry and numerical stability of the transformation.
1. The SVD Theorem
Let $A \in \mathbb{C}^{m \times n}$ be a matrix of rank $r$. There exists a factorization of the form:
$$A = U \Sigma V^*$$
where:
- $U \in \mathbb{C}^{m \times m}$ is a unitary matrix ($U^*U = UU^* = I_m$). Its columns $u_i$ are called the left singular vectors.
- $V \in \mathbb{C}^{n \times n}$ is a unitary matrix ($V^*V = VV^* = I_n$). Its columns $v_i$ are the right singular vectors.
- $\Sigma \in \mathbb{R}^{m \times n}$ is a rectangular diagonal matrix with non-negative real entries $\sigma_i$ on the diagonal. These are the singular values.
In the real case ($A \in \mathbb{R}^{m \times n}$), $U$ and $V$ are orthogonal matrices, and the decomposition is $A = U \Sigma V^T$.
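As a quick numerical illustration of the theorem, the factorization can be computed with NumPy's np.linalg.svd. This is a minimal sketch: the 3×2 matrix below is an arbitrary example, and note that NumPy returns the factor Vh, i.e. $V^*$, rather than $V$ itself.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])  # arbitrary 3x2 example
U, S, Vh = np.linalg.svd(A)               # full SVD: U is 3x3, Vh is 2x2, S has length 2
Sigma = np.zeros(A.shape)                 # rectangular diagonal matrix (3x2)
np.fill_diagonal(Sigma, S)
print(np.allclose(A, U @ Sigma @ Vh))     # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))    # U is orthogonal
print(np.allclose(Vh @ Vh.T, np.eye(2)))  # V is orthogonal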
2. Derivation and Link to Eigendecomposition
The SVD is intimately connected to the spectral properties of the Hermitian matrices $A^*A$ and $AA^*$.
- Right Singular Vectors ($V$): Consider the $n \times n$ matrix $A^*A$. Since $A^*A$ is positive semi-definite and Hermitian, its eigenvalues are real and non-negative. Let these be $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$. The right singular vectors $v_i$ are the orthonormal eigenvectors of $A^*A$.
- Singular Values ($\Sigma$): The singular values are the square roots of these eigenvalues: $\sigma_i = \sqrt{\lambda_i}$.
- Left Singular Vectors ($U$): These are the orthonormal eigenvectors of $AA^*$. For $\sigma_i > 0$, they are uniquely determined by $u_i = \frac{1}{\sigma_i} A v_i$.
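The following sketch checks these relations numerically. It assumes a real matrix, so that $A^* = A^T$; the 5×3 random matrix is an arbitrary example.
import numpy as np

A = np.random.randn(5, 3)                             # real case, so A^* = A^T
U, S, Vh = np.linalg.svd(A, full_matrices=False)

eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, descending
print(np.allclose(S, np.sqrt(eigvals)))               # sigma_i = sqrt(lambda_i)
print(np.allclose(U, (A @ Vh.T) / S))                 # u_i = (1/sigma_i) * A v_i, column by column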
3. Geometric Intuition
Geometrically, the SVD asserts that any linear transformation can be decomposed into a rotation in the domain, a scaling along principal axes, and a rotation in the codomain.
Imagine a unit sphere in $\mathbb{R}^n$. Under the transformation $x \mapsto Ax$, this sphere is mapped to a hyper-ellipsoid in $\mathbb{R}^m$.
- The singular values $\sigma_i$ are the lengths of the semi-axes of the ellipsoid.
- The left singular vectors $u_i$ define the directions of these semi-axes in the codomain.
- The right singular vectors $v_i$ define the orthonormal basis in the domain that maps to these semi-axes.
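A minimal two-dimensional sketch of this picture: sample points on the unit circle, push them through an arbitrary matrix $A$ (chosen here only for illustration), and compare the longest and shortest image vectors with the singular values.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])               # arbitrary 2x2 transformation
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle, shape (2, 2000)
ellipse = A @ circle                                 # image of the circle: an ellipse
lengths = np.linalg.norm(ellipse, axis=0)

sigma = np.linalg.svd(A, compute_uv=False)
print(lengths.max(), sigma[0])   # longest semi-axis  ~= sigma_1
print(lengths.min(), sigma[1])   # shortest semi-axis ~= sigma_2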
4. Properties of Singular Values
- Non-negativity: $\sigma_i \geq 0$ for all $i$.
- Ordering: Conventionally, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0$, where $p = \min(m, n)$.
- Rank: The number of non-zero singular values is exactly the rank $r$ of the matrix $A$.
- Norms:
  - Spectral Norm: $\|A\|_2 = \sigma_1$.
  - Frobenius Norm: $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}$.
- Condition Number: For an invertible square matrix, $\kappa_2(A) = \sigma_1 / \sigma_n$.
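These identities are easy to confirm numerically. The sketch below uses an arbitrary random square matrix and NumPy's norm, cond, and matrix_rank routines.
import numpy as np

A = np.random.randn(6, 6)
sigma = np.linalg.svd(A, compute_uv=False)    # singular values, descending

print(np.isclose(np.linalg.norm(A, 2), sigma[0]))                       # spectral norm = sigma_1
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2))))  # Frobenius norm
print(np.isclose(np.linalg.cond(A, 2), sigma[0] / sigma[-1]))           # condition number
print(np.linalg.matrix_rank(A) == np.sum(sigma > 1e-12))                # rank = number of nonzero sigmas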
5. The Moore-Penrose Pseudoinverse
For any matrix $A \in \mathbb{C}^{m \times n}$, we define the pseudoinverse $A^+ \in \mathbb{C}^{n \times m}$ via the SVD:
$$A^+ = V \Sigma^+ U^*$$
where $\Sigma^+$ is obtained by taking the reciprocal of each non-zero element on the diagonal of $\Sigma$, leaving the zeros in place, and transposing the result.
Applications in Linear Systems: For an overdetermined system $Ax = b$, the least-squares solution that minimizes $\|Ax - b\|_2$ is given by $x = A^+ b$. If multiple solutions exist (underdetermined case), $A^+ b$ provides the solution with the minimum Euclidean norm.
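A minimal sketch of this use case: the pseudoinverse is built explicitly from the thin SVD (the example matrix has full column rank, so all $\sigma_i > 0$), then compared against np.linalg.pinv and NumPy's least-squares solver.
import numpy as np

m, n = 20, 5
A = np.random.randn(m, n)                        # tall matrix: overdetermined system
b = np.random.randn(m)

U, S, Vh = np.linalg.svd(A, full_matrices=False)
A_plus = Vh.T @ np.diag(1.0 / S) @ U.T           # A^+ = V Sigma^+ U^T (real case)
print(np.allclose(A_plus, np.linalg.pinv(A)))

x_pinv = A_plus @ b                              # least-squares solution x = A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution
print(np.allclose(x_pinv, x_lstsq))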
6. Low-Rank Approximation
The Eckart–Young–Mirsky Theorem provides the theoretical foundation for data compression. It states that the best rank-$k$ approximation of $A$ (where $k < r$) in the Frobenius norm is given by:
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$$
The approximation error is $\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}$. This property is utilized in Principal Component Analysis (PCA) to reduce dimensionality while preserving maximum variance.
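A short numerical check of the error formula (the truncation itself is implemented in full in Section 8; the matrix size and rank $k$ below are arbitrary):
import numpy as np

A = np.random.randn(40, 30)
U, S, Vh = np.linalg.svd(A, full_matrices=False)
k = 5
Ak = U[:, :k] @ np.diag(S[:k]) @ Vh[:k, :]       # best rank-k approximation

# ||A - A_k||_F equals the root-sum-square of the discarded singular values
print(np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(S[k:]**2))))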
7. Polar Decomposition
The SVD enables the Polar Decomposition, analogous to the polar form of complex numbers ($z = re^{i\theta}$). Any square matrix $A$ can be written as:
$$A = QP$$
where $Q$ is unitary (representing rotation/reflection) and $P$ is a positive semi-definite Hermitian matrix (representing scaling/stretching).
- Using the SVD: $A = U\Sigma V^* = (UV^*)(V\Sigma V^*)$.
- Here $Q = UV^*$ and $P = V\Sigma V^*$.
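A minimal sketch of this construction in the real case, using an arbitrary random matrix (SciPy's scipy.linalg.polar provides the same factorization directly):
import numpy as np

A = np.random.randn(4, 4)
U, S, Vh = np.linalg.svd(A)

Q = U @ Vh                    # orthogonal factor Q = U V^T
P = Vh.T @ np.diag(S) @ Vh    # symmetric positive semi-definite factor P = V Sigma V^T

print(np.allclose(A, Q @ P))                    # A = Q P
print(np.allclose(Q.T @ Q, np.eye(4)))          # Q is orthogonal
print(np.all(np.linalg.eigvalsh(P) >= -1e-12))  # P is positive semi-definite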
8. Python Implementation: Rank-k Approximation
The following script performs a rank-$k$ approximation on a randomly generated matrix and quantifies the relative error.
import numpy as np
def rank_k_approximation(A, k):
    # Perform the (thin) SVD: A = U @ diag(S) @ Vh
    U, S, Vh = np.linalg.svd(A, full_matrices=False)
    # Construct Sigma_k from the k largest singular values
    Sk = np.diag(S[:k])
    # Reconstruct the rank-k matrix
    Ak = U[:, :k] @ Sk @ Vh[:k, :]
    # Calculate the relative Frobenius-norm error
    full_norm = np.linalg.norm(A, 'fro')
    error_norm = np.linalg.norm(A - Ak, 'fro')
    return Ak, error_norm / full_norm
# Example usage
m, n = 100, 50
A = np.random.randn(m, n)
k = 10
Ak, rel_error = rank_k_approximation(A, k)
print(f"Original Rank: {np.linalg.matrix_rank(A)}")
print(f"Target Rank: {k}")
print(f"Relative Frobenius Error: {rel_error:.4f}")
9. Applications in Physics and Engineering
- Image Compression: By treating an image as an $m \times n$ matrix and keeping only the top $k$ singular values, we store $k(m + n + 1)$ values instead of $mn$.
- PCA (Principal Component Analysis): In statistics, PCA can be computed by applying the SVD to the centered data matrix (equivalently, by eigendecomposition of the covariance matrix) to find the directions of maximum variance.
- Signal Processing: SVD is used to separate signal from noise, as noise typically corresponds to small singular values.
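As an illustration of the last point, the following sketch builds a low-rank "signal" matrix, adds noise, and recovers an approximation by discarding the small singular values. The chosen rank, matrix size, and noise level are arbitrary assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 80, 5
signal = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r "signal"
noisy = signal + 0.05 * rng.standard_normal((m, n))                 # signal plus additive noise

U, S, Vh = np.linalg.svd(noisy, full_matrices=False)
denoised = U[:, :r] @ np.diag(S[:r]) @ Vh[:r, :]   # keep only the r dominant components

print(np.linalg.norm(noisy - signal, 'fro'))       # error before truncation
print(np.linalg.norm(denoised - signal, 'fro'))    # smaller error after truncation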