The Master Factorization

Linear algebra at a deep level is mostly the study of one factorization and what breaks when you leave it. The singular value decomposition says every matrix, with no exceptions, is a rotation (possibly with a reflection), a nonnegative scaling along orthogonal axes, and another rotation: $A = U\Sigma V^{\top}$. Eigenvalues need square matrices and can be complex, and the matrix can be defective, lacking a full set of eigenvectors; singular values always exist, are real and nonnegative, and are ordered. They measure how much the map stretches space and how many directions actually matter. Rank, spectral norm, condition number, and best low-rank approximation all read straight off $\Sigma$. Tensors are where this clean story ends.
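As a small illustration of reading these quantities off $\Sigma$, here is a sketch in NumPy. The matrix, the seed, and the rank tolerance are arbitrary choices made for this example, not anything prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary rectangular example matrix

# Thin SVD: U is 5x3, s holds the singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back real, nonnegative, and in descending order.
is_sorted = bool(np.all(np.diff(s) <= 0))

# The factorization reproduces A up to floating-point error.
reconstruction_error = np.linalg.norm(A - U @ np.diag(s) @ Vt)

# Quantities read off the singular values alone.
# The tolerance below is one common convention, assumed here for illustration.
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))
spectral_norm = s[0]
condition_number = s[0] / s[-1]

print(rank, spectral_norm, condition_number)
```

The same code runs unchanged on a square or wide matrix; only the shapes of `U` and `Vt` change.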

The best rank-$k$ approximation of a matrix is found by keeping its $k$ largest singular values and discarding the rest. Write $A = \sum_i \sigma_i u_i v_i^{\top}$ as a sum of rank-one layers, ordered by decreasing $\sigma_i$. Truncating after $k$ terms gives $A_k$, and Eckart-Young says no matrix of rank at most $k$ gets closer. The error is exactly what was thrown away: $\sigma_{k+1}$ in spectral norm, the root-sum-of-squares of the tail in Frobenius norm. Because the layers are orthogonal, removing the smallest ones costs the least possible. Compression, PCA, and denoising are all this one move.
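The two error formulas can be checked numerically. The sketch below truncates a random matrix (size, seed, and `k` are arbitrary assumptions) and compares the measured errors with the discarded singular values. The random competitor `B` is only a sanity check, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))  # arbitrary example matrix
k = 2                            # arbitrary target rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest rank-one layers sigma_i * u_i * v_i^T.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

spectral_error = np.linalg.norm(A - A_k, 2)
frobenius_error = np.linalg.norm(A - A_k, "fro")

# Errors predicted from the discarded singular values.
# s is zero-indexed, so s[k] is sigma_{k+1}.
predicted_spectral = s[k]
predicted_frobenius = np.sqrt(np.sum(s[k:] ** 2))

# One random rank-k competitor, for comparison only.
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
competitor_error = np.linalg.norm(A - B, "fro")

print(spectral_error, predicted_spectral)
print(frobenius_error, predicted_frobenius)
```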

Truncation is optimal in both norms because both depend only on the singular values, not on orientation. The spectral and Frobenius norms are unitarily invariant: multiplying by a unitary matrix on either side leaves them unchanged. That collapses the approximation problem to one about $\Sigma$ alone, a diagonal matrix, where keeping the largest entries is plainly best. The spectral norm reads off the top singular value; the Frobenius norm is the square root of the sum of their squares. Mirsky proved the result holds for every unitarily invariant norm at once, so Eckart-Young is not two coincidences but one theorem wearing different gauges.
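A quick numerical check of the invariance, as a sketch: the orthogonal factors are drawn by taking the QR decomposition of Gaussian matrices, which is one convenient way to get them and an assumption of this example. The largest absolute entry is included as a contrast, since it is a norm that generally does change under such multiplications.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))  # arbitrary example matrix

# The Q factor of a QR decomposition is orthogonal.
Q_left, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q_right, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = Q_left @ A @ Q_right

# Both norms agree before and after, up to rounding.
spectral_before, spectral_after = np.linalg.norm(A, 2), np.linalg.norm(B, 2)
frobenius_before, frobenius_after = np.linalg.norm(A, "fro"), np.linalg.norm(B, "fro")

# The singular values themselves are unchanged, which is why the norms agree.
s_before = np.linalg.svd(A, compute_uv=False)
s_after = np.linalg.svd(B, compute_uv=False)

# Contrast: the largest absolute entry is not unitarily invariant in general.
max_abs_before, max_abs_after = np.abs(A).max(), np.abs(B).max()

print(spectral_before, spectral_after)
print(frobenius_before, frobenius_after)
print(max_abs_before, max_abs_after)
```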

A norm is unitarily invariant when rotating the input leaves its value fixed: $\lVert UAV \rVert = \lVert A \rVert$ for every pair of unitary $U$ and $V$. The SVD makes the consequence immediate. Any matrix rotates into its own $\Sigma$, so the norm sees only the singular values; the directions in $U$ and $V$ are invisible. Von Neumann pinned down the rest: every unitarily invariant norm is a symmetric gauge function of the singular value vector, meaning an ordinary vector norm blind to permutation and sign. Spectral, Frobenius, and nuclear norms are just three choices of that gauge.
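The three named norms can be recovered by applying ordinary vector norms to the singular values. In the sketch below, `gauge_norm` is a hypothetical helper written for this illustration, not a NumPy function. It uses the vector $p$-norms with $p = \infty, 2, 1$ as the gauge.

```python
import numpy as np

def gauge_norm(A, p):
    """Apply the vector p-norm to the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(s, ord=p)

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))  # arbitrary example matrix

spectral = gauge_norm(A, np.inf)  # largest singular value
frobenius = gauge_norm(A, 2)      # Euclidean length of the singular value vector
nuclear = gauge_norm(A, 1)        # sum of the singular values

# A gauge is blind to permutation and sign: reversing and negating
# the singular value vector leaves the vector norm unchanged.
s = np.linalg.svd(A, compute_uv=False)
scrambled = -s[::-1]
same_value = bool(np.isclose(np.linalg.norm(s, 1), np.linalg.norm(scrambled, 1)))

print(spectral, frobenius, nuclear, same_value)
```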

Tensors break the clean story because the SVD’s guarantees were never about multi-index arrays in general, only about the two-index case. With three or more indices, no single factorization orders its components, diagonalizes, and truncates optimally at once. Tensor rank is NP-hard to compute, and De Silva and Lim showed that for $k \ge 2$ a best rank-$k$ approximation can fail to exist at all: the set of tensors of rank at most $k$ is not closed, so a sequence of rank-$k$ tensors can converge to a tensor of higher rank while its factors blow up. CP buys uniqueness under mild conditions, Tucker buys orthogonality, neither buys Eckart-Young. The clean story was an accident of having exactly two indices.
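The failure of closedness can be seen in a standard small example, sketched here in NumPy. The target $T = a \otimes a \otimes b + a \otimes b \otimes a + b \otimes a \otimes a$ is known to have rank 3 when $a$ and $b$ are linearly independent (the code takes this as given and does not verify it). The tensors $n\,(a + b/n)^{\otimes 3} - n\,a^{\otimes 3}$ have rank at most 2 and converge to $T$. The helper `outer3`, the vectors, and the values of `n` are choices made for this illustration.

```python
import numpy as np

def outer3(x, y, z):
    """Rank-one order-3 tensor with entries x[i] * y[j] * z[k]."""
    return np.einsum("i,j,k->ijk", x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Target tensor (rank 3, assumed from the literature, not checked here).
T = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

errors, factor_sizes = [], []
for n in [1, 10, 100, 1000]:
    c = a + b / n
    term_1 = n * outer3(c, c, c)
    term_2 = -n * outer3(a, a, a)
    approx = term_1 + term_2  # sum of two rank-one terms
    errors.append(np.linalg.norm(approx - T))    # distance to the target
    factor_sizes.append(np.linalg.norm(term_1))  # size of one rank-one term

print(errors)
print(factor_sizes)
```

The error shrinks roughly like $1/n$ while each rank-one term grows roughly like $n$: the two terms nearly cancel, and no tensor of rank at most 2 attains the limit.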