Matrix Calculus
by Heinrich Hartmann / 2025-06-23 / Stemwede
Introduction
Good notation has vast influence on how well we are able to perform computations, and how much we can trust the results. A good "calculus" allows us to comprehend formulas and spot inconsistencies easily.
In this text we are following the general "function space" approach laid out in Natural Operators, but focus on matrix algebra suited for computation.
Vectors \(a\) are defined as functions on a (finite) set \(X\), with entries \(a[x], x\in X\). Matrices are defined as functions on products \(X \times Y\). Matrix & Vector operations are defined as "tensor" contractions of adjacent indices.
In this way we arrive at a calculus that makes the following properties clear:
- Explicit index computations \(i=1,\dots,N\) are naturally discouraged, since we are dealing with sets and not numbers as indices.
- Results and formulas specialize to numerical formulas by taking \(X = [N] = \{ 0, \dots, N-1 \}\). Here the notation \(a[x]\) is consistent with Python's indexing of arrays/lists.
- The concepts of rows and column vectors are avoided. Vectors are one-dimensional structures, and can be multiplied from the left and right (by contracting adjacent indices). Embeddings of vectors into matrix spaces are realized as outer products: \(1 \tensor a \in \IR[1,X]\).
Function Spaces
Let \(X\) be a set. Typical examples are \(X=[N]=\{0,\dots,N-1\}\) if we are doing computations, but also infinite sets like \(X=\IR\). We introduce the following notation:
- \(\IR[X]= (\stext{real valued functions on X}) = Map(X, \IR)\).
- For \(a\in \IR[X], x\in X\), we denote by \(a[x]\in \IR\) the value of \(a\) at \(x\).
- For \(x \in X\) we denote by \(\hat{x} \in \IR[X]\) the "delta" function with \(\hat{x}[x]=1\), and \(\hat{x}[y] = 0\) for all \(y\neq x\).
- \(\IR[X]\) is a commutative, associative, unital algebra with pointwise operations:
- \((a + b)[x] = a[x]+b[x] \in \IR\)
- \((a \cdot b)[x] = a[x]b[x] \in \IR\)
- The scalar multiplication by \(r \in \IR\) is denoted by \(r a\). We sometimes write \(r.a\) if we want to stress the scalar nature of \(r\).
Notes on Notation
- We choose the variables \(x \in X\) to denote points. \(x\) is a non-linear discrete entity without further structure. It may also be an index, \(x \in [N]\).
In other texts, we would see: \(i = 1,\dots,N\).
- For functions \(a \in \IR[X]\) we choose the variable \(a\). Here \(a\) is an algebraic entity,
with a rich set of operations, inherited from the real numbers. In general, functions are
best denoted by letters which are similar to the target space. We parametrize values in the target \(a[x]\) by \(x \in X\).
In other texts, we would see \(v\) with \(v_i, i = 1,\dots, N\).
- The notation \(a[x]\) is inspired by Python array indexing. If we choose \(X = \{ 0,\dots,N-1\}\) we exactly recover python-style array indexing \(a[i]\).
Slicing
For \(A \in \IR[X,Y]\), \(x\in X\), \(y\in Y\) we define:
- \(A[x,:] \in \IR[Y]\) by \(A[x,:][y] = A[x,y]\).
- \(A[:,y] \in \IR[X]\) by \(A[:,y][x] = A[x,y]\).
Inner Products
Assume \(X\) is finite, and \(a,b\) in \(\IR[X]\).
We define the inner product as $$ a * b = \sum_{x \in X} a[x]\, b[x] \quad \in \IR. $$
- Commutativity We have \(a * b = b * a\).
- Reconstruction We have \(a = \sum_x a[x] \hat{x} = \sum_x (a * \hat{x})\hat{x}\)
Side-note The inner product is also defined if \(X\) is infinite but either \(a\) or \(b\) has finite support, i.e. \(a[x]=0\) for all but finitely many \(x\).
Matrix Products
Let \(X,Y,Z\) be finite sets, \(x \in X,y \in Y\), \(a \in \IR[X], b \in \IR[Y], c \in \IR[Z]\).
- We define \(\IR[X,Y]=\IR[ X\times Y]\) as real valued functions on \(X\times Y\).
- For \(A \in \IR[X,Y]\) we denote the entries by \(A[x,y] \in \IR\).
- We define the vector-matrix product \(a * A \in \IR[Y]\) by \((a * A)[y] = \sum_x a[x]A[x,y].\)
- We define the matrix-vector product \(A * b \in \IR[X]\) by \((A * b)[x] = \sum_y A[x,y]b[y].\)
- We define the matrix-matrix product \(A * B\in \IR[X,Z]\) by \((A*B)[x,z]=\sum_y A[x,y]B[y,z]\), where \(B \in \IR[Y,Z]\).
Note
- There are no row and column vectors here. Matrix-vector products are defined in both directions: \(a * A\) and \(A * b\). This is perfectly natural from an indexing standpoint, as "*" is defined as a "contraction operator" that sums over adjacent indices. This extends naturally to n-dimensional arrays.
- Traditionally \(v * A\) is denoted as \(v^T \cdot A\) and \(A * v\) is \(A \cdot v\), where \(v^T\) is a 1xN matrix "row matrix" and \(v\) an Nx1 "column matrix".
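The contraction convention maps directly onto NumPy, where 1-d arrays are neither row nor column vectors and `@` contracts adjacent indices in both directions. A minimal sketch (all sizes and values here are illustrative):

```python
import numpy as np

X, Y = 3, 4                                       # index sets X = {0,...,2}, Y = {0,...,3}
a = np.arange(X, dtype=float)                     # a in R[X]
b = np.ones(Y)                                    # b in R[Y]
A = np.arange(X * Y, dtype=float).reshape(X, Y)   # A in R[X,Y]

aA = a @ A            # vector-matrix product: (a * A)[y] = sum_x a[x] A[x,y]
Ab = A @ b            # matrix-vector product: (A * b)[x] = sum_y A[x,y] b[y]

# the same contractions with the index sets spelled out
assert np.allclose(aA, np.einsum("x,xy->y", a, A))
assert np.allclose(Ab, np.einsum("xy,y->x", A, b))

# a * A * b is a scalar, and both bracketings agree
assert np.isclose((a @ A) @ b, a @ (A @ b))
```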
Associativity
- The matrix product is associative: \(A * (B * C) = (A * B) * C\), hence we write \(A * B * C\) without ambiguity.
- The matrix product is associative with vector-matrix product: \(a * (A * B) = (a * A) * B\), hence we write \(a * A * B\) without ambiguity.
- The vector-matrix product is associative with the scalar product: \((a * A) * b = a * (A * b)\), hence we write \(a * A * b\) without ambiguity.
Note that, since the scalar product is also commutative, we have: $$ a * A * b = (a * A) * b = b * (a * A). $$
It's important to note that in this case we can't remove the brackets, since "\((b * a) . A\)" is something entirely different: it's the scalar \(b * a \in \IR\) multiplied with the matrix \(A\).
Unit
- The identity matrix \(I_X \in \IR[X,X]\) is given by: \(I_X[x,y] = \delta_{x,y}\), where \(\delta_{x,x} = 1\) and \(\delta_{x,y} = 0\) for \(x \neq y \in X\).
- For any vector \(a \in \IR[X]\) we have \(a * I_X = I_X * a = a\).
- For any matrix \(A \in \IR[X,Y]\), \(B \in \IR[Y,X]\) we have: \(I_X * A = A\) and \(B * I_X = B\).
Transposition
- We define \(A^* \in \IR[Y,X]\) as \(A^*[y,x] = A[x,y]\).
- We have \(A^* * a = a * A\)
In this text we will not use transposition of vectors / 1d arrays.
Outer Products
For \(a \in \IR[X]\) and \(b \in \IR[Y]\), define \(a \tensor b \in \IR[X,Y]\) as \((a \tensor b)[x,y] = a[x]b[y]\).
Properties
- For \(A \in \IR[W,X]\) and \(B \in \IR[Y,Z]\) we have: \(A * (a \tensor b) * B = (A * a) \tensor (b * B)\).
- The matrix \(a \tensor b\) acts on vectors \(a' \in \IR[X], b' \in \IR[Y]\) as \(a' * (a \tensor b) * b' = (a' * a) . (b * b')\).
- (Row- and Column-vectors) We can convert vectors into matrices via \(1 \tensor a \in \IR[1,X]\) and \(a \tensor 1 \in \IR[X,1]\).
- For \(x \in X\) the matrix \(\hat{x} \tensor b = (\hat{x} \tensor 1) * (1 \tensor b)\) is the "row vector" \(b\) centered at \(x\).
- For \(y \in Y\) the matrix \(a \tensor \hat{y} = (a \tensor 1) * (1 \tensor \hat{y})\) is the "column vector" \(a\) centered at \(y\).
- The identity matrix can be expressed as \(I_X = \sum_{x\in X} \hat{x} \tensor \hat{x}\), for \(X\) finite.
- The identity \(I_X * A = A\) recovers \(A\) as sum of its "rows" $$ A = I_X * A = \sum_{x\in X} \hat{x} \tensor (\hat{x} * A) = \sum_{x\in X} \hat{x} \tensor A[x,:] $$
- The identity \(A * I_Y\) recovers \(A\) as sum of its "columns" $$ A = A * I_Y = \sum_{y\in Y} (A * \hat{y}) \tensor \hat{y} = \sum_{y\in Y} A[:,y] \tensor \hat{y}. $$
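These reconstruction identities are easy to check numerically; a small NumPy sketch (the matrix is random test data):

```python
import numpy as np

X, Y = 3, 4
A = np.random.default_rng(0).normal(size=(X, Y))   # A in R[X,Y]

def delta(x, n):
    """The delta vector hat{x} in R[{0,...,n-1}]."""
    e = np.zeros(n)
    e[x] = 1.0
    return e

# A as the sum of its "rows":    A = sum_x hat{x} (tensor) A[x,:]
rows = sum(np.outer(delta(x, X), A[x, :]) for x in range(X))
# A as the sum of its "columns": A = sum_y A[:,y] (tensor) hat{y}
cols = sum(np.outer(A[:, y], delta(y, Y)) for y in range(Y))

assert np.allclose(rows, A)
assert np.allclose(cols, A)
```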
Proposition
- The matrix \(a \tensor b\) has rank one (for \(a, b \neq 0\)).
- Every rank one matrix \(R \in \IR[X,Y]\) is of the form \(a \tensor b\).
Remark
- In classical treatments, vectors \(a \in \IR[X]\) are identified with "column vectors" in \(\IR[X,1]\), which we denote by \(a \tensor 1\).
- The classical scalar product \(v^T \cdot w\) is expressed as \((1 \tensor v) * (w \tensor 1) = v * w \in \IR\).
- The classical rank-1 matrix \(v . w^T\) is expressed as \((v \tensor 1) * (1 \tensor w) = v \tensor w \in \IR[X,Y]\).
- The product \(\tensor\) generalizes naturally to the Kronecker Product or Tensor Product for higher order-tensors \(\IR[X_1, \dots, X_n]\).
Orthonormal Systems
- An orthonormal system \(u_z \in \IR[X], z \in Z\) is a set of vectors for which \(u_z * u_w = \delta_{z,w}\) for \(z,w \in Z\).
- The vectors \(u_z \in \IR[X], z\in Z\) form an orthonormal system if and only if the matrix \(U = \sum_z u_z \tensor \hat{z} \in \IR[X,Z]\) fulfills \(U^* * U = I_Z\).
- An orthonormal basis (ONB) \(u_x \in \IR[X], x \in X\) is an orthonormal system in \(\IR[X]\) indexed by \(X\).
Proposition
- Orthonormal systems are linearly independent in \(\IR[X]\).
- Any orthonormal system \(u_z \in \IR[X]\), with \(Z \subset X\) can be extended to an orthonormal basis.
- The reverse composition \(P = U * U^* \in \IR[X,X]\) and \(Q = I_X - P\) are orthogonal projection matrices:
- Projection property: \(P * P = P, \quad Q*Q = Q\).
- Orthogonality: \(P * Q = 0, \quad Q * P = 0\).
- \(Im(P) = Im(U) = Span(u_z, z\in Z) = Ker(Q)\)
- \(Im(Q) = Im(P)^\perp\)
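A quick numerical illustration: we obtain an orthonormal system from a QR factorization (an arbitrary choice for this sketch, not part of the text) and check the projector identities.

```python
import numpy as np

X, Z = 5, 3
rng = np.random.default_rng(1)

# columns of U form an orthonormal system u_z in R[X], z in Z
U, _ = np.linalg.qr(rng.normal(size=(X, Z)))

P = U @ U.T                 # P = U * U^*, projector onto Span(u_z)
Q = np.eye(X) - P           # complementary projector

assert np.allclose(U.T @ U, np.eye(Z))        # U^* * U = I_Z
assert np.allclose(P @ P, P)                  # projection property
assert np.allclose(Q @ Q, Q)
assert np.allclose(P @ Q, np.zeros((X, X)))   # orthogonality
```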
Linear Analysis
We repeat some of the results from the Analysis of Linear Functions post.
- For a real-valued differentiable function \(f: \IR[N] \lra \IR\) define the gradient \(d f(p) \in \IR[N]\) at a point \(p \in \IR[N]\) as $$ d f(p)[n] = \frac{\del}{\del t}|_{t=0} f(p + t\hat{n}). $$
- For a vector-valued differentiable function \(f: \IR[N] \lra \IR[M]\), with components \(f^m: \IR[N] \lra \IR\), and a point \(p \in \IR[N]\) we define the Jacobi Matrix \((Df)(p) \in \IR[M,N]\) as \(Df(p) = \sum_m \hat{m} \tensor df^m(p)\) so that: $$ Df(p)[m,n] = \frac{\del}{\del t}|_{t=0} f^m(p + t \hat{n}). $$
- For a real-valued differentiable function \(f: \IR[N] \lra \IR\) define the Hessian \(Hf(p) \in \IR[N,N]\) at a point \(p \in \IR[N]\) as \(Hf(p) = Ddf(p)\), so that: $$ Hf(p)[n,m] = \frac{\del}{\del s}\frac{\del}{\del t}|_{t=0,s=0} f(p + t\hat{n} + s \hat{m}). $$
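These definitions translate directly into finite-difference approximations, which are handy for checking the formulas below. A rough sketch (step size and test function are arbitrary choices):

```python
import numpy as np

def grad(f, p, h=1e-6):
    """Central-difference approximation of df(p)[n] = d/dt|_0 f(p + t e_n)."""
    N = len(p)
    return np.array([(f(p + h * np.eye(N)[n]) - f(p - h * np.eye(N)[n])) / (2 * h)
                     for n in range(N)])

def jacobian(f, p, h=1e-6):
    """Df(p)[m,n]: column n is the derivative of f at p in direction e_n."""
    N = len(p)
    return np.column_stack([(f(p + h * np.eye(N)[n]) - f(p - h * np.eye(N)[n])) / (2 * h)
                            for n in range(N)])

# example: f(p) = p * p has gradient df(p) = 2p
p = np.array([1.0, 2.0, 3.0])
assert np.allclose(grad(lambda q: q @ q, p), 2 * p, atol=1e-4)
```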
Calculus
Let \(a, b\) be two real-valued functions on \(\IR[M]\), and \(\lambda \in \IR\). Let \(c\) be a real-valued function on \(\IR[M]\). Let \(f: \IR[M] \lra \IR[N]\), \(g: \IR[L] \lra \IR[M]\), and pick \(p \in \IR[M]\), \(q\in \IR[L]\) with \(g(q) = p\).
We omit the points from the notation and write \(Df = Df(p)\), \(da = da(p)\), etc.
- Linearity $$ d (a + \lambda b) = da + \lambda db \quad \in \IR[M]. $$
- Chain Rule $$ D_q (f \circ g) = (D_p f) * (D_q g) \quad \in \IR[N,L]. $$
- Chain Rule (1-dim) $$ d(a \circ g)(q) = d a (p) * D_q g \quad \in \IR[L]. $$
- Product Rule $$ d(a \cdot b) = (da).b + a.(db) \quad \in \IR[M]. $$
- Product Rule (N-dim) $$ D (f \cdot c) = D f . c + f \tensor d c \quad \in \IR[N,M]. $$
- Quotient Rule $$ d(a/b) = (da \cdot b - a \cdot db)/b^2 \quad \in \IR[M]. $$
- Quotient Rule (N-dim) $$ D(f/c) = (Df . c - f \tensor dc)/c^2 \quad \in \IR[N,M]. $$
- Power rule \(\quad d(a^n) = n a^{n-1} da\).
- Inverse Rule \(\quad d(1/a) = - da / a^2\).
Examples I
- Let \(f(p) = a * p\), for some fixed \(a \in \IR[M]\) we have \(df (p) = a\) and \(Hf(p) = 0\).
- Let \(f(p) = A * p\), for some \(A \in \IR[N,M]\) we have \(Df (p) = A\).
- Let \(f(p) = p*p\), then \(df (p) = 2 p\) and \(Hf(p) = 2I.\)
- Let \(f(p) = p * G * p\), then \(df (p) = G*p + p*G\) and \(Hf(p) = G + G^*\)
- Let \(f(p) = p * A^* * A * p\), then \(df (p) = 2 A^* * A * p\) and \(Hf(p) = 2 A^* * A\)
- Let \(f(p) = \| A*p - t \|^2\), then \(d f(p) = 2A^* * A * p - 2A^* * t\) and \(Hf(p) = 2 A^* * A\)
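The last example can be verified against finite differences; a self-contained sketch with random test data:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 6
A = rng.normal(size=(M, N))
t = rng.normal(size=M)
p = rng.normal(size=N)

f = lambda q: np.sum((A @ q - t) ** 2)           # f(p) = ||A*p - t||^2
closed_form = 2 * A.T @ A @ p - 2 * A.T @ t      # df(p) = 2 A^* A p - 2 A^* t

h = 1e-6
finite_diff = np.array([(f(p + h * e) - f(p - h * e)) / (2 * h) for e in np.eye(N)])

assert np.allclose(closed_form, finite_diff, atol=1e-4)
```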
Examples II
- Let \(n(p) = \| p \|\), \(p \neq 0\), then $$ dn (p) = p / \| p \|. $$ Note that \(dn(p)\) is the unit vector in direction \(p\).
- Let \(u(p) = p/\|p\|\), \(p \neq 0\) be the unit vector in direction \(p\).
We have \(u(p) = p / n(p)\), so \(Du(p) = (I_N . n(p) - p \tensor dn(p)) / n(p)^2\), which simplifies to: $$ Du(p) = \frac{1}{\| p \|} (I_N - u(p) \tensor u(p) ) \in \IR[N,N]. $$ Note that \(u(p) \tensor u(p)\) is the orthogonal projector onto the 1-d space spanned by \(u(p)\). Therefore, \(I_N - u(p) \tensor u(p)\) projects onto \(u(p)^\perp\).
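Again a quick numerical check of the closed form for \(Du(p)\), at an arbitrary test point:

```python
import numpy as np

p = np.array([1.0, -2.0, 0.5])
N = len(p)
u = p / np.linalg.norm(p)

closed_form = (np.eye(N) - np.outer(u, u)) / np.linalg.norm(p)    # Du(p)

unit = lambda q: q / np.linalg.norm(q)
h = 1e-6
finite_diff = np.column_stack([(unit(p + h * e) - unit(p - h * e)) / (2 * h)
                               for e in np.eye(N)])

assert np.allclose(closed_form, finite_diff, atol=1e-5)
```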
Proposition (Rayleigh Quotient)
Let \(r(p) = p * A * p / (p * p) \in \IR\), \(p\neq 0\) for a symmetric matrix \(A \in \IR[N,N], A^* = A\).
- The derivative is given by: $$ dr(p) = \frac{2}{p*p} (A - r(p) I_N) * p \in \IR[N]. $$
- At a critical point \(p\) (\(dr(p) = 0\)), the Hessian is given by: $$ Hr(p) = \frac{2}{ p * p} (A - r(p) I_N) \in \IR[N,N]. $$
- If \(p\) is an eigenvector for \(A\) with eigenvalue \(\lambda\) then \(r(p) = \lambda\) and \(dr(p) = 0\).
- If \(p\) is a critical point for \(r\) (\(dr(p) = 0\)) then \(p\) is an eigenvector for \(A\) for eigenvalue \(r(p)\).
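A numerical illustration of the last two statements, using a random symmetric matrix as test data:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
B = rng.normal(size=(N, N))
A = B + B.T                               # symmetric test matrix

r  = lambda p: (p @ A @ p) / (p @ p)      # Rayleigh quotient
dr = lambda p: 2.0 / (p @ p) * (A @ p - r(p) * p)

lam, U = np.linalg.eigh(A)
p = U[:, 0]                               # eigenvector for eigenvalue lam[0]

assert np.isclose(r(p), lam[0])                        # r(p) is the eigenvalue
assert np.allclose(dr(p), np.zeros(N), atol=1e-10)     # p is a critical point
```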
Spectral Theory
Spectral Decomposition
Theorem — Spectral Decomposition (Vector Form)
Let \(A \in \IR[X,X]\) be a symmetric matrix (\(A^* = A\)). Then there exists an orthonormal system \(u_z \in \IR[X], z \in Z\) and real numbers \(\lambda_z \in \IR\) so that: $$ A = \sum_{z \in Z} \lambda_z . u_z \tensor u_z $$
- Each \(u_z\) is an eigenvector of \(A\) for eigenvalue \(\lambda_z\): \(A * u_z = \lambda_z u_z\).
- The eigenvalues \(\lambda_z\) are uniquely determined by \(A\) up to ordering.
Proof
We prove by induction on the dimension of the space.
If \(\#X=1\), the statement is trivial, since every \(1 \times 1\) matrix has the form \(a\, \hat{x} \tensor \hat{x}\).
For the inductive step, assume the theorem holds for all symmetric matrices of dimension \(\#X - 1\).
The function \(f(a) = a * A * a \in \IR\) is continuous on the compact set \(S = \{a \in \IR[X] : a * a - 1 = 0\}\) (the unit sphere). By the extreme value theorem, \(|f|\) attains its maximum on \(S\) at some point \(u \in S\).
At \(u\) the function \(f\) attains either a constrained maximum or a constrained minimum, so the Lagrange multiplier criterion for extremizing \(f(a) = a * A * a\) subject to the constraint \(g(a) = a * a - 1 = 0\) applies: \(df(u) = \lambda\, dg(u)\) for some \(\lambda \in \IR\). The derivatives are given by \(df(a) = 2 A*a\) (here the assumption \(A^* = A\) enters) and \(dg(a) = 2 a\). Therefore \(A * u = \lambda u\), and \(\lambda = u * A * u = f(u)\).
If \(\lambda = 0\), then \(|f(a)| \leq |f(u)| = |\lambda| = 0\) for all \(a \in S\), so \(f = 0\), and by polarization (\(2\, a * A * b = (a+b) * A * (a+b) - a * A * a - b * A * b\)) we get \(A = 0\), which already has the desired form (as the empty sum). Hence we can assume \(\lambda \neq 0\) from here on.
Consider the orthogonal complement \(u^\perp = \{a \in \IR[X] : a * u = 0\}\). For \(a \in u^\perp\), we find \(a * A * u = \lambda\,(a * u) = 0\), hence \(A * a \in u^\perp\), and \(A\) restricts to a linear operator \(A_u: u^\perp \ra u^\perp\).
Now \(\dim(u^\perp) = \# X - 1\), so by the induction hypothesis we find a representation \(A_u = \sum_{z \in Z'} \lambda_z\, u_z \tensor u_z\) with an orthonormal system indexed by a set \(Z'\) of cardinality \(\# Z' = \# X - 1\). Extending \(Z'\) by a new symbol \(*\) to \(Z\), and setting \(u_* = u\), \(\lambda_* = \lambda\), we get: $$ A = \lambda\, u \tensor u + A_u = \sum_{z \in Z} \lambda_z\, u_z \tensor u_z. $$
The following equivalent formulation is often used in practice.
Theorem — Spectral Decomposition (Matrix Form)
Let \(A \in \IR[X,X]\) be any symmetric matrix.
Then \(A\) can be factored as
$$
A = U * D * U^*
$$
where
- \(U \in \IR[X,X]\) is an orthogonal transformation: \(U^* * U = U * U^* = I_X\).
- \(D \in \IR[X,X]\) is diagonal: \(D = \sum_x \lambda_x . \hat{x} \tensor \hat{x}\).
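Numerically, this factorization is what `np.linalg.eigh` computes for symmetric matrices; a small check of both the matrix and the vector form:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                          # symmetric matrix

lam, U = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors
D = np.diag(lam)

assert np.allclose(U.T @ U, np.eye(4))     # U is orthogonal
assert np.allclose(U @ D @ U.T, A)         # matrix form: A = U * D * U^*
# vector form: A = sum_z lambda_z u_z (tensor) u_z
assert np.allclose(sum(l * np.outer(u, u) for l, u in zip(lam, U.T)), A)
```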
Theorem - Abstract Spectral Decomposition
Let \(A \in \IR[X,X]\) be any symmetric matrix. Let \(A_\lambda = Ker(A - \lambda I_X)\) be the eigenspace of \(A\) for eigenvalue \(\lambda\).
Then
- Only finitely many \(A_\lambda\) are non-zero.
- \(\IR[X]\) decomposes into an orthogonal direct sum \(\bigoplus_\lambda A_\lambda\).
- Let \(P_\lambda \in \IR[X,X]\) be the orthogonal projector with image \(A_\lambda\), then: $$ A = \sum_{\lambda \in \IR} \lambda . P_\lambda $$
- The projectors are mutually orthogonal: \(P_\lambda * P_\mu = \delta_{\lambda, \mu} P_\lambda.\)
- The identity matrix decomposes as \(I_X = \sum_\lambda P_\lambda\)
Theorem — Sylvester Formula for Spectral Projectors
Let \(A \in \IR[X,X]\) be a symmetric matrix with distinct eigenvalues \(\lambda_1, \dots, \lambda_m \in \IR\).
Then the orthogonal projector onto the eigenspace for \(\lambda_i\) is given by:
$$
P_{\lambda_i} = \prod_{\substack{j=1 \\ j \neq i}}^m \frac{A - \lambda_j I_X}{\lambda_i - \lambda_j}.
$$
This is known as Sylvester's formula (or Lagrange interpolation for matrices).
See e.g. Sylvester's formula — Wikipedia or Horn & Johnson (1985), Matrix Analysis, Section 2.5.
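A numerical check of the formula against projectors built directly from the eigenvectors (the symmetric test matrix is chosen to have distinct eigenvalues):

```python
import numpy as np
from functools import reduce

A = np.diag([1.0, 2.0, 5.0]) + 0.1 * (np.ones((3, 3)) - np.eye(3))   # symmetric, distinct eigenvalues
lam, U = np.linalg.eigh(A)
I = np.eye(3)

for i in range(3):
    P_eig = np.outer(U[:, i], U[:, i])    # projector from the eigenvector
    factors = [(A - lam[j] * I) / (lam[i] - lam[j]) for j in range(3) if j != i]
    P_syl = reduce(lambda L, R: L @ R, factors)   # Sylvester's product formula
    assert np.allclose(P_eig, P_syl, atol=1e-10)
```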
Singular Value Decomposition (SVD)
Theorem - SVD Vector Form
Let \(A \in \IR[Y,X]\) be any matrix. Then there is an index set \(Z\) and orthonormal systems \(u_z \in \IR[X], z \in Z\) and \(v_z \in \IR[Y], z\in Z\) so that: $$ A = \sum_{z \in Z} \sigma_z . v_z \tensor u_z $$ Here \(\sigma_z \in \IR, z\in Z\), are positive numbers \(\sigma_z > 0\), called the singular values of \(A\).
- The singular values are uniquely determined by \(A\) up to re-ordering.
- The size of the index set \(\# Z\) is equal to the rank of \(A\).
Proof
Existence
- The matrix \(K = A^* * A \in \IR[X,X]\) is symmetric and positive semi-definite.
- By the spectral theorem, we find an orthonormal basis \(u_x \in \IR[X], x \in X\), and eigenvalues \(\lambda_x \in \IR\), so that \(K = \sum_{x \in X} \lambda_x . u_x \tensor u_x\).
- Since \(K\) is positive semidefinite, we have \(u_x * K * u_x = \lambda_x \geq 0\).
- Let \(Z \subset X\) be the subset of \(z \in X\) with \(\lambda_z > 0\). With this notation we have \(K = \sum_{z \in Z} \lambda_{z} . u_{z} \tensor u_z.\)
- For \(z \in Z\), let \(\sigma_z = \sqrt{\lambda_z}\), and set \(v_z = A * u_z / \sigma_z \in \IR[Y]\).
This is an orthonormal system in \(\IR[Y]\) since for \(z,w \in Z\) we have \(v_z * v_w = u_{z} * A^* * A * u_{w} / (\sigma_z \sigma_w) = \lambda_w (u_z * u_w) / (\sigma_z \sigma_w) = \delta_{z,w}\).
Now \(A = \sum_{z \in Z} \sigma_z v_z \tensor u_z\), since the values on the basis \(u_x, x \in X\) agree:
- For \(x \in Z \subset X\), we find \(\left(\sum_{z \in Z} \sigma_z v_z \tensor u_z\right) * u_x = \sigma_x v_x = A * u_x\).
- For \(x \in X\), \(x \notin Z\), we find \(\left(\sum_{z \in Z} \sigma_z v_z \tensor u_z\right) * u_x = 0\), and \(\| A * u_x \|^2 = u_x * K * u_x = \lambda_x = 0\), so \(A * u_x = 0\).
Here are two more equivalent versions of the SVD theorem:
Theorem - Reduced Matrix SVD
Let \(A \in \IR[Y,X]\) be any matrix. Then \(A\) can be factored as
$$
A = V * \Sigma * U^*
$$
where
- \(V \in \IR[Y,Z]\) is an orthonormal system in \(\IR[Y]\): \(V^* * V = I_Z\).
- \(U \in \IR[X,Z]\) is an orthonormal system in \(\IR[X]\): \(U^* * U = I_Z\).
- \(\Sigma \in \IR[Z,Z]\) is a diagonal matrix \(\Sigma = \sum_z \sigma_z . \hat{z} \tensor \hat{z}\).
- \(rk(U)=rk(V)=rk(\Sigma)=rk(A) = \# Z\).
Theorem - Full Matrix SVD
Let \(A \in \IR[Y,X]\) be any matrix. Then \(A\) can be factored as
$$
A = V * \Sigma * U^*
$$
where
- \(V \in \IR[Y,Y]\) is an orthonormal transformation: \(V^* * V = V * V^* = I_Y\).
- \(U \in \IR[X,X]\) is an orthonormal transformation: \(U^* * U = U * U^* = I_X\).
- \(\Sigma \in \IR[Y,X]\) is a (non-square) diagonal matrix: \(\Sigma = \sum_z \sigma_z \, \widehat{y(z)} \tensor \widehat{x(z)}\), for a pair of embeddings \(x:Z\ra X\), \(y:Z\ra Y\).
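In NumPy, `np.linalg.svd` computes the full form; a brief sketch relating it to the vector form and to the eigenvalues of \(A^* * A\):

```python
import numpy as np

rng = np.random.default_rng(5)
Y, X = 5, 3
A = rng.normal(size=(Y, X))

V, sigma, Ut = np.linalg.svd(A, full_matrices=True)   # A = V * Sigma * U^*
Sigma = np.zeros((Y, X))
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

assert np.allclose(V @ Sigma @ Ut, A)
# vector form: A = sum_z sigma_z v_z (tensor) u_z
assert np.allclose(sum(s * np.outer(v, u) for s, v, u in zip(sigma, V.T, Ut)), A)
# the squared singular values are eigenvalues of A^* * A
lam = np.linalg.eigvalsh(A.T @ A)[::-1]                # descending order
assert np.allclose(lam[:len(sigma)], sigma ** 2)
```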
Relationship between SVD and SD
Proposition
Let \(A \in \IR[Y,X]\) be an arbitrary matrix with singular value decomposition \(A = \sum_{z \in Z} \sigma_z v_z \tensor u_z\).
Then:
- \(A^* * A = \sum_{z \in Z} \sigma_z^2 \, u_z \tensor u_z\) is a spectral decomposition of \(A^* * A \in \IR[X,X]\).
- \(A * A^* = \sum_{z \in Z} \sigma_z^2 \, v_z \tensor v_z\) is a spectral decomposition of \(A * A^* \in \IR[Y,Y]\).
Proposition
Let \(A \in \IR[Y,X]\) be an arbitrary matrix, \(K = A^* * A \in \IR[X,X]\) and \(L = A * A^* \in \IR[Y,Y]\).
Let \(\IR[X] = \bigoplus_\lambda K_\lambda\) and \(\IR[Y] = \bigoplus_\lambda L_\lambda\) be the abstract spectral decompositions into eigenspaces.
Then:
- \(A(K_\lambda) \subseteq L_\lambda\), hence we get an induced map as restriction which we denote by \(A_\lambda: K_\lambda \ra L_\lambda\).
- \(A\) decomposes as \(A = \sum_\lambda A_\lambda\)
- \(A_\lambda^* * A_\lambda = \lambda I\) on \(K_\lambda\) and \(A_\lambda * A_\lambda^* = \lambda I\) on \(L_\lambda\)
- If \(\lambda > 0\) then \(A_\lambda\) is an isomorphism.
- For \(\lambda = 0\) we have \(A(K_0)=0\) and \(A^*(L_0)=0\).
Pseudoinverse
Definition — Pseudoinverse
Let \(A \in \IR[Y,X]\) be any matrix.
The pseudoinverse \(A^+ \in \IR[X,Y]\) is the unique map satisfying:
- \(A * A^+ * A = A\) - Normal Equation
- \(A^+ * A * A^+ = A^+\) - Normal Equation
- \((A * A^+)^* = A * A^+\) - Symmetry
- \((A^+ * A)^* = A^+ * A\) - Symmetry
Proposition
- The inverse matrix is a pseudoinverse if it exists.
- If a pseudoinverse exists it is unique.
Proposition — Full Column Rank Case
Let \(A \in \IR[Y,X]\).
- If \(A\) has full column rank (\(\operatorname{rk}(A) = \# X\)), then \(A^+ = (A^* * A)^{-1} * A^*.\)
- If \(A\) has full row rank (\(\operatorname{rk}(A) = \# Y\)), then \(A^+ = A^* * (A * A^*)^{-1}.\)
Proof
We first motivate the formula. From the defining properties we get \(A^* * A * A^+ = A^* * (A * A^+)^* = (A * A^+ * A)^* = A^*\). Hence, for any \(b \in \IR[Y]\), the vector \(q = A^+ * b\) solves the "normal equation" \(A^* * A * q = A^* * b\).
Since \(A\) has full column rank, \(A^* * A\) is invertible, and solving for \(q\) gives \(A^+ * b = (A^* * A)^{-1} * A^* * b\).
Verifying the properties for \(A^+ = (A^* * A)^{-1} * A^*\):
- \(A * A^+ * A = A\) since \(A^+ * A = (A^* * A)^{-1} * A^* * A = I_X\).
- \(A^+ * A * A^+ = A^+\) since \(A^+ * A = I_X\).
- The symmetry conditions follow since \(A^* * A\) and its inverse are symmetric, and \(A^+ * A = I_X\).
The row rank case follows similarly.
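For the full column rank case, the closed form can be checked against `np.linalg.pinv` and the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(6)
Y, X = 6, 3
A = rng.normal(size=(Y, X))                     # full column rank (almost surely)

A_plus = np.linalg.inv(A.T @ A) @ A.T           # (A^* * A)^{-1} * A^*
assert np.allclose(A_plus, np.linalg.pinv(A))

b = rng.normal(size=Y)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares solution of A x ~ b
assert np.allclose(A_plus @ b, x_ls)
```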
Proposition — Pseudoinverse via SVD
Let \(A = \sum_{z \in Z} \sigma_z v_z \tensor u_z\) be the singular value decomposition of \(A\), with singular values \(\sigma_z > 0\).
Then $$ A^+ = \sum_{z \in Z} \frac{1}{\sigma_z} u_z \tensor v_z. $$
Proof
We compute:
$$
A * A^+ = \sum_{z} v_z \tensor v_z, \quad
A^+ * A = \sum_{z} u_z \tensor u_z
$$
These are orthogonal projectors onto \(\operatorname{Im}(A)\) and \(\operatorname{Im}(A^*)\).
Verifying the defining properties:
- \(A * A^+ * A = A\)
- \(A^+ * A * A^+ = A^+\)
- \(A * A^+\) and \(A^+ * A\) are symmetric.
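Finally, a sketch of the SVD construction of \(A^+\) in NumPy (the test matrix and the cutoff for "zero" singular values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(5, 4)) @ rng.normal(size=(4, 3))   # a 5x3 test matrix

V, sigma, Ut = np.linalg.svd(A, full_matrices=False)
keep = sigma > 1e-12 * sigma.max()                       # drop numerically zero singular values

# A^+ = sum_z (1/sigma_z) u_z (tensor) v_z
A_plus = sum(np.outer(Ut[z], V[:, z]) / sigma[z]
             for z in range(len(sigma)) if keep[z])

assert np.allclose(A_plus, np.linalg.pinv(A))
```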