A short proof for the Cayley-Hamilton Theorem
by Heinrich Hartmann / 2025-07-21 / Schwerin
Abstract
We present a short and self-contained proof of the Cayley--Hamilton theorem for real matrices that avoids the use of complex numbers, Jordan forms, or density arguments. Instead, it exploits the real-analytic rigidity of polynomials to extend the result from an open subset to the full space.
Introduction
The Cayley--Hamilton theorem asserts that every square matrix satisfies its own characteristic polynomial. Standard proofs either invoke the Jordan canonical form over \(\mathbb{C}\), or use the fact that diagonalizable matrices are dense in \(\mathbb{C}^{n \times n}\) together with a continuity argument. Algebraic proofs using the adjugate matrix are subtle, as one must be very precise about when the substitution \(t \mapsto A\) is permitted.
We present a straightforward proof that works exclusively in the real setting \(\IR^{n \times n}\). Here, diagonalizable matrices are not dense, so we cannot rely on continuity arguments. Instead, we exploit the rigidity of polynomial maps: it suffices to verify the result on any open subset, and the identity theorem for real-analytic functions extends it to the entire space.
Theorem (Cayley-Hamilton)
Let \(A \in \IR^{n \times n}\) and let \(\chi_A(\lambda) = \det(\lambda I - A)\) be its characteristic polynomial. Then $$ \chi_A(A) = 0, $$ where \(\chi_A(A)\) denotes the matrix obtained by evaluating the polynomial \(\chi_A\) at \(A\).
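Before turning to the proof, here is a quick numerical sanity check (not part of the argument): a short NumPy sketch that evaluates \(\chi_A\) at a random real matrix and confirms the result is zero up to round-off. The matrix size, random seed, and tolerance are arbitrary choices made for illustration.

```python
# Numerical sanity check: chi_A(A) should be the zero matrix (up to round-off).
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))      # an arbitrary real test matrix

coeffs = np.poly(A)                  # coefficients of chi_A, highest degree first
F = np.zeros((n, n))
for c in coeffs:                     # Horner's scheme with matrix powers
    F = F @ A + c * np.eye(n)

print(np.max(np.abs(F)))             # tiny, i.e. zero up to floating-point error
```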
Proof of the Theorem
The characteristic polynomial \(\chi_A(\lambda) = \det(\lambda I - A)\) is a monic polynomial of degree \(n\) in \(\lambda\), whose coefficients are polynomials in the entries \(A_{ij}\) of \(A\). The evaluation \(\chi_A(A)\) is an \(n \times n\) matrix whose entries \(F(A)_{ij}\) are again polynomials in the \(A_{ij}\), of degree at most \(n^2\).
We regard this construction as a polynomial map \(F: \IR^{n \times n} \to \IR^{n \times n}, A \mapsto \chi_A(A)\).
We want to show that this polynomial map is identically zero: \(F = 0\).
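To make the map \(F\) concrete, here is a minimal SymPy sketch (illustration only, not part of the proof) that builds \(\chi_A\) for a generic symbolic \(3 \times 3\) matrix, evaluates it at \(A\), and expands the resulting entries. The symbol names and the choice \(n = 3\) are ad-hoc.

```python
# Symbolic illustration of the polynomial map F: A -> chi_A(A) for n = 3.
import sympy as sp

n = 3
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"a{i}{j}"))   # generic symbolic matrix
lam = sp.Symbol("lambda")

chi = (lam * sp.eye(n) - A).det()                # characteristic polynomial chi_A(lambda)
coeffs = sp.Poly(chi, lam).all_coeffs()          # [1, c_{n-1}, ..., c_0]

F = sp.zeros(n, n)
for c in coeffs:                                 # Horner: F <- F*A + c*I
    F = F * A + c * sp.eye(n)

# Each entry of F is a polynomial in the nine symbols a_ij;
# expanding them confirms F = 0 in the case n = 3.
print(F.applyfunc(sp.expand))                    # zero matrix
```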
It suffices to show that \(F\) vanishes on some open subset of \(\IR^{n \times n}\) (with respect to the Euclidean topology).
Let \(D_0 = \text{diag}(1,2,\ldots,n)\) be the diagonal matrix with the \(n\) distinct eigenvalues \(1,\ldots,n\).
There exists a neighborhood \(U\) of \(D_0\) where all matrices \(A \in U\) have \(n\) distinct real eigenvalues (See Lemma 1 below for a complete proof).
Any matrix \(A \in U\) with \(n\) distinct real eigenvalues is diagonalizable and therefore \(F(A) = \chi_A(A) = 0\) (See Lemma 2 below for a complete proof).
Therefore, \(F\) vanishes on the entire open set \(U\).
Each entry of \(F\) is a polynomial and hence real-analytic. Since \(F\) vanishes on the open set \(U\), all partial derivatives of its entries vanish at \(D_0\), so the Taylor expansion of each entry around \(D_0\) is identically zero. A polynomial equals its Taylor expansion, so \(F\) vanishes identically (this is the identity theorem for real-analytic functions, specialized to polynomials). Hence, \(\chi_A(A) = 0\) for all matrices \(A \in \IR^{n \times n}\).
\(\blacksquare\)
Lemma 1
Let \(D_0\) be a matrix with \(n\) distinct (real) eigenvalues. Then there is a neighborhood \(U\) of \(D_0\) where all matrices \(D \in U\) have \(n\) distinct (real) eigenvalues.
Proof
We consider the function \(G(D,t) = \chi_D(t)\) as a differentiable map \(\IR^{n \times n} \times \IR \to \IR\). The condition \(G(D,\lambda) = 0\) is equivalent to \(\lambda\) being an eigenvalue for \(D\).
We use the implicit function theorem to show that each real eigenvalue \(\lambda_i\) of \(D_0\) can be continued to a real function \(\lambda_i(D)\) in a neighborhood of \(D_0\).
Indeed, for each eigenvalue \(\lambda_i\) of \(D_0\), we have \(G(D_0, \lambda_i) = 0\) and $$ \frac{\partial G}{\partial t}(D_0, \lambda_i) = \chi_{D_0}'(\lambda_i) = \prod_{j \neq i} (\lambda_i - \lambda_j) \neq 0, $$ since all eigenvalues of \(D_0\) are distinct.
By the implicit function theorem, there exists a neighborhood \(U_i\) of \(D_0\) and a smooth function \(\lambda_i: U_i \to \IR\) such that \(G(D, \lambda_i(D)) = 0\) for all \(D \in U_i\), with \(\lambda_i(D_0) = \lambda_i\).
Taking \(U = \bigcap_{i=1}^n U_i\), we obtain a neighborhood where all \(n\) eigenvalues \(\lambda_1(D), \ldots, \lambda_n(D)\) exist as real-valued functions. Since the \(\lambda_i\) are continuous and \(\lambda_i(D_0) = \lambda_i\) are distinct, they remain distinct in a sufficiently small neighborhood. \(\blacksquare\)
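As a purely numerical illustration of Lemma 1 (not a proof), the following NumPy sketch perturbs \(D_0 = \text{diag}(1,\ldots,n)\) by small random matrices and checks that the eigenvalues stay real and well separated. The perturbation size, seed, and tolerances are ad-hoc choices.

```python
# Numerical illustration of Lemma 1: matrices near D0 = diag(1, ..., n)
# keep n distinct real eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
n = 5
D0 = np.diag(np.arange(1.0, n + 1))

for _ in range(1000):
    D = D0 + 1e-3 * rng.standard_normal((n, n))      # a matrix close to D0
    ev = np.linalg.eigvals(D)
    assert np.max(np.abs(ev.imag)) < 1e-12            # eigenvalues remain real ...
    assert np.min(np.diff(np.sort(ev.real))) > 0.5    # ... and well separated
```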
Lemma 2
Let \(A \in \IR^{n \times n}\) be a matrix with \(n\) distinct real eigenvalues. Then \(\chi_A(A) = 0\).
Proof
Let \(\lambda_1, \ldots, \lambda_n\) be the distinct real eigenvalues of \(A\) and let \(q_i\) be an eigenvector for \(\lambda_i\). Eigenvectors for distinct eigenvalues are linearly independent, so \(Q = [q_1, \dots, q_n]\) is invertible and \(Q^{-1} A Q = D\), where \(D = \text{diag}(\lambda_1, \ldots, \lambda_n)\). Since \(A = Q D Q^{-1}\), we have \(A^k = Q D^k Q^{-1}\) for all \(k\). Now \(\chi_A(\lambda) = \chi_D(\lambda) = \prod_{i=1}^n(\lambda - \lambda_i)\), and hence \(\chi_A(A) = Q \cdot \chi_D(D) \cdot Q^{-1} = 0\), because each diagonal entry of \(\chi_D(D)\) contains the factor \((\lambda_i - \lambda_i) = 0\). \(\blacksquare\)
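The following NumPy sketch plays through Lemma 2 on a concrete \(2 \times 2\) matrix; the particular matrix is just an example with two distinct real eigenvalues.

```python
# Numerical sketch of Lemma 2: chi_A(A) = Q * chi_D(D) * Q^{-1} = 0
# for a matrix with distinct real eigenvalues.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])                     # two distinct real eigenvalues
lam, Q = np.linalg.eig(A)                      # columns of Q are eigenvectors
D = np.diag(lam)

# chi(t) = (t - lam_1)(t - lam_2), evaluated at A and at D
chi_A_of_A = (A - lam[0] * np.eye(2)) @ (A - lam[1] * np.eye(2))
chi_D_of_D = (D - lam[0] * np.eye(2)) @ (D - lam[1] * np.eye(2))

print(np.allclose(chi_A_of_A, Q @ chi_D_of_D @ np.linalg.inv(Q)))   # True
print(np.allclose(chi_A_of_A, 0))                                   # True
```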
Example 2x2 Case
To illustrate the proof concretely, consider the case \(n = 2\). Let $$ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. $$
The characteristic polynomial is $$ \chi_A(t) = \det(t I - A) = t^2 - (a+d)\,t + (ad - bc). $$
Therefore: $$ F(A) = \chi_A(A) = A^2 - (a+d)\,A + (ad - bc)\,I. $$
Simplifying each term, we find \(F(A) = 0\) for all \(2 \times 2\) real matrices \(A\), confirming the general result.
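For readers who prefer to let a computer do the expansion, here is a minimal SymPy check of this \(2 \times 2\) computation; the symbol names match the matrix entries above.

```python
# SymPy check of the 2x2 case: A^2 - (a+d) A + (ad - bc) I expands to zero.
import sympy as sp

a, b, c, d = sp.symbols("a b c d", real=True)
A = sp.Matrix([[a, b], [c, d]])

F = A**2 - (a + d) * A + (a * d - b * c) * sp.eye(2)
print(F.applyfunc(sp.expand))          # Matrix([[0, 0], [0, 0]])
```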
The eigenvalues are the roots of \(\chi(t)\), given by $$ \lambda_{1,2} = \frac{(a+d) \pm \sqrt{(a-d)^2 + 4bc}}{2}. $$
The eigenvalues are distinct and real on the open subset where the discriminant \(\Delta = (a-d)^2 + 4bc\) is greater than \(0\).
There is another open subset where \(\Delta < 0\) and the matrices have a pair of complex conjugate eigenvalues.
The two subsets form the connected components of the set \(\Delta \neq 0\). Neither component is dense in \(\IR^{2 \times 2}\).
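To see the two regions numerically, the following NumPy sketch checks that small perturbations of \(\text{diag}(1,2)\) stay in the region \(\Delta > 0\), while small perturbations of the rotation matrix \(\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) stay in \(\Delta < 0\); the perturbation size and sample count are arbitrary.

```python
# The two open regions for n = 2: Delta > 0 near diag(1, 2),
# Delta < 0 near the rotation matrix [[0, -1], [1, 0]].
import numpy as np

rng = np.random.default_rng(2)

def discriminant(M):
    (a, b), (c, d) = M
    return (a - d) ** 2 + 4 * b * c

D0 = np.array([[1.0, 0.0], [0.0, 2.0]])       # Delta = 1
R0 = np.array([[0.0, -1.0], [1.0, 0.0]])      # Delta = -4

for _ in range(1000):
    E = 1e-2 * rng.standard_normal((2, 2))    # small perturbation
    assert discriminant(D0 + E) > 0           # still distinct real eigenvalues
    assert discriminant(R0 + E) < 0           # still complex conjugate eigenvalues
```

In particular, an entire neighborhood of the rotation matrix consists of matrices without real eigenvalues, which is why density arguments over \(\IR\) are not available.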