A short proof for the Cayley-Hamilton Theorem
by Heinrich Hartmann / 2025-07-21 / Schwerin
Abstract
We present a short and self-contained proof of the Cayley--Hamilton theorem for real matrices that avoids the use of complex numbers, Jordan forms, or density arguments. Instead, it exploits the real-analytic rigidity of polynomials to extend the result from an open subset to the full space.
Introduction
The Cayley--Hamilton theorem asserts that every square matrix satisfies its own characteristic polynomial. Standard proofs either invoke the Jordan canonical form over \(\mathbb{C}\), or use the fact that diagonalizable matrices are dense in \(\mathbb{C}^{n \times n}\) together with a continuity argument. Algebraic proofs using the adjugate matrix are subtle, as one must be very precise about when the substitution \(t \mapsto A\) is permitted.
We present a straightforward proof that works exclusively in the real setting \(\IR^{n \times n}\). Here, diagonalizable matrices are not dense, so we cannot rely on continuity arguments. Instead, we exploit the rigidity of polynomial maps: it suffices to verify the result on any open subset, and the identity theorem for real-analytic functions extends it to the entire space.
Theorem (Cayley-Hamilton)
Let \(A \in \IR^{n \times n}\) and let \(\chi_A(\lambda) = \det(\lambda I - A)\) be its characteristic polynomial. Then $$ \chi_A(A) = 0, $$ where \(\chi_A(A)\) denotes the matrix obtained by evaluating the polynomial \(\chi_A\) at \(A\).
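Before turning to the proof, here is a quick numerical sanity check (not part of the argument): a short NumPy sketch that evaluates \(\chi_A\) at a random real matrix and confirms the result is zero up to round-off. The matrix size, random seed, and tolerance are arbitrary choices made for illustration.

```python
# Numerical sanity check: chi_A(A) should be the zero matrix (up to round-off).
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))      # an arbitrary real test matrix

coeffs = np.poly(A)                  # coefficients of chi_A, highest degree first
F = np.zeros((n, n))
for c in coeffs:                     # Horner's scheme with matrix powers
    F = F @ A + c * np.eye(n)

print(np.max(np.abs(F)))             # tiny, i.e. zero up to floating-point error
```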
Proof of the Theorem
The characteristic polynomial \(\chi_A(\lambda) = \det(\lambda I - A)\) is a monic polynomial of degree \(n\) in \(\lambda\), whose coefficients are polynomials in the entries \(A_{ij}\) of \(A\). The evaluation \(\chi_A(A)\) is an \(n \times n\) matrix whose entries \(F(A)_{ij}\) are again polynomials in the \(A_{ij}\), of degree at most \(n^2\).
We regard this construction as a polynomial map \(F: \IR^{n \times n} \to \IR^{n \times n}, A \mapsto \chi_A(A)\).
We want to show that this polynomial map is identically zero: \(F = 0\).
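To make the map \(F\) concrete, here is a minimal SymPy sketch (illustration only, not part of the proof) that builds \(\chi_A\) for a generic symbolic \(3 \times 3\) matrix, evaluates it at \(A\), and expands the resulting entries. The symbol names and the choice \(n = 3\) are ad-hoc.

```python
# Symbolic illustration of the polynomial map F: A -> chi_A(A) for n = 3.
import sympy as sp

n = 3
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"a{i}{j}"))   # generic symbolic matrix
lam = sp.Symbol("lambda")

chi = (lam * sp.eye(n) - A).det()                # characteristic polynomial chi_A(lambda)
coeffs = sp.Poly(chi, lam).all_coeffs()          # [1, c_{n-1}, ..., c_0]

F = sp.zeros(n, n)
for c in coeffs:                                 # Horner: F <- F*A + c*I
    F = F * A + c * sp.eye(n)

# Each entry of F is a polynomial in the nine symbols a_ij;
# expanding them confirms F = 0 in the case n = 3.
print(F.applyfunc(sp.expand))                    # zero matrix
```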
It suffices to show that \(F\) vanishes on some open subset of \(\IR^{n \times n}\) (with respect to the Euclidean topology).
Let \(D_0 = \text{diag}(1,2,\ldots,n)\) be the diagonal matrix with the \(n\) distinct eigenvalues \(1,\ldots,n\).
There exists a neighborhood \(U\) of \(D_0\) where all matrices \(A \in U\) have \(n\) distinct real eigenvalues (See Lemma 1 below for a complete proof).
Any matrix \(A \in U\) with \(n\) distinct real eigenvalues is diagonalizable and therefore \(F(A) = \chi_A(A) = 0\) (See Lemma 2 below for a complete proof).
Therefore, \(F\) vanishes on the entire open set \(U\).
Each entry of \(F\) is a polynomial and hence real-analytic. Since \(F\) vanishes on the open set \(U\), all partial derivatives of its entries vanish at \(D_0\), so the Taylor expansion of each entry around \(D_0\) is identically zero. A polynomial equals its Taylor expansion, so \(F\) vanishes identically (this is the identity theorem for real-analytic functions, specialized to polynomials). Hence, \(\chi_A(A) = 0\) for all matrices \(A \in \IR^{n \times n}\).
\(\blacksquare\)
Lemma 1
Let \(D_0\) be a matrix with \(n\) distinct (real) eigenvalues. Then there is a neighborhood \(U\) of \(D_0\) where all matrices \(D \in U\) have \(n\) distinct (real) eigenvalues.
Proof
We consider the function \(G(D,t) = \chi_D(t)\) as a differentiable map \(\IR^{n \times n} \times \IR \to \IR\). The condition \(G(D,\lambda) = 0\) is equivalent to \(\lambda\) being an eigenvalue for \(D\).
We use the implicit function theorem to show that each real eigenvalue \(\lambda_i\) of \(D_0\) can be continued to a real function \(\lambda_i(D)\) in a neighborhood of \(D_0\).
Indeed, for each eigenvalue \(\lambda_i\) of \(D_0\), we have \(G(D_0, \lambda_i) = 0\) and $$ \frac{\partial G}{\partial t}(D_0, \lambda_i) = \chi_{D_0}'(\lambda_i) = \prod_{j \neq i} (\lambda_i - \lambda_j) \neq 0, $$ since all eigenvalues of \(D_0\) are distinct.
By the implicit function theorem, there exists a neighborhood \(U_i\) of \(D_0\) and a smooth function \(\lambda_i: U_i \to \IR\) such that \(G(D, \lambda_i(D)) = 0\) for all \(D \in U_i\), with \(\lambda_i(D_0) = \lambda_i\).
Taking \(U = \bigcap_{i=1}^n U_i\), we obtain a neighborhood where all \(n\) eigenvalues \(\lambda_1(D), \ldots, \lambda_n(D)\) exist as real-valued functions. Since the \(\lambda_i\) are continuous and \(\lambda_i(D_0) = \lambda_i\) are distinct, they remain distinct in a sufficiently small neighborhood. \(\blacksquare\)
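As a purely numerical illustration of Lemma 1 (not a proof), the following NumPy sketch perturbs \(D_0 = \text{diag}(1,\ldots,n)\) by small random matrices and checks that the eigenvalues stay real and well separated. The perturbation size, seed, and tolerances are ad-hoc choices.

```python
# Numerical illustration of Lemma 1: matrices near D0 = diag(1, ..., n)
# keep n distinct real eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
n = 5
D0 = np.diag(np.arange(1.0, n + 1))

for _ in range(1000):
    D = D0 + 1e-3 * rng.standard_normal((n, n))      # a matrix close to D0
    ev = np.linalg.eigvals(D)
    assert np.max(np.abs(ev.imag)) < 1e-12            # eigenvalues remain real ...
    assert np.min(np.diff(np.sort(ev.real))) > 0.5    # ... and well separated
```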
Lemma 2
Let \(A \in \IR^{n \times n}\) be a matrix with \(n\) distinct real eigenvalues. Then \(\chi_A(A) = 0\).
Proof
Let \(\lambda_1, \ldots, \lambda_n\) be the distinct real eigenvalues of \(A\) and let \(q_i\) be an eigenvector for \(\lambda_i\). Eigenvectors for distinct eigenvalues are linearly independent, so \(Q = [q_1, \dots, q_n]\) is invertible and \(Q^{-1} A Q = D\), where \(D = \text{diag}(\lambda_1, \ldots, \lambda_n)\). Since \(A = Q D Q^{-1}\), we have \(A^k = Q D^k Q^{-1}\) for all \(k\). Now \(\chi_A(\lambda) = \chi_D(\lambda) = \prod_{i=1}^n(\lambda - \lambda_i)\), and hence \(\chi_A(A) = Q \cdot \chi_D(D) \cdot Q^{-1} = 0\), because each diagonal entry of \(\chi_D(D)\) contains the factor \((\lambda_i - \lambda_i) = 0\). \(\blacksquare\)
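The following NumPy sketch plays through Lemma 2 on a concrete \(2 \times 2\) matrix; the particular matrix is just an example with two distinct real eigenvalues.

```python
# Numerical sketch of Lemma 2: chi_A(A) = Q * chi_D(D) * Q^{-1} = 0
# for a matrix with distinct real eigenvalues.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])                     # two distinct real eigenvalues
lam, Q = np.linalg.eig(A)                      # columns of Q are eigenvectors
D = np.diag(lam)

# chi(t) = (t - lam_1)(t - lam_2), evaluated at A and at D
chi_A_of_A = (A - lam[0] * np.eye(2)) @ (A - lam[1] * np.eye(2))
chi_D_of_D = (D - lam[0] * np.eye(2)) @ (D - lam[1] * np.eye(2))

print(np.allclose(chi_A_of_A, Q @ chi_D_of_D @ np.linalg.inv(Q)))   # True
print(np.allclose(chi_A_of_A, 0))                                   # True
```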
Example 2x2 Case
To illustrate the proof concretely, consider the case \(n = 2\). Let $$ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. $$
The characteristic polynomial is $$ \chi_A(t) = \det(t I - A) = t^2 - (a+d)\,t + (ad - bc). $$
Therefore: $$ F(A) = \chi_A(A) = A^2 - (a+d)\,A + (ad - bc)\,I. $$
Simplifying each term, we find \(F(A) = 0\) for all \(2 \times 2\) real matrices \(A\), confirming the general result.
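For readers who prefer to let a computer do the expansion, here is a minimal SymPy check of this \(2 \times 2\) computation; the symbol names match the matrix entries above.

```python
# SymPy check of the 2x2 case: A^2 - (a+d) A + (ad - bc) I expands to zero.
import sympy as sp

a, b, c, d = sp.symbols("a b c d", real=True)
A = sp.Matrix([[a, b], [c, d]])

F = A**2 - (a + d) * A + (a * d - b * c) * sp.eye(2)
print(F.applyfunc(sp.expand))          # Matrix([[0, 0], [0, 0]])
```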
The eigenvalues are the roots of \(\chi(t)\), given by $$ \lambda_{1,2} = \frac{(a+d) \pm \sqrt{(a-d)^2 + 4bc}}{2}. $$
The eigenvalues are distinct and real on the open subset where the discriminant \(\Delta = (a-d)^2 + 4bc\) is greater than \(0\).
There is another open subset where \(\Delta < 0\) and the matrices have a pair of complex conjugate eigenvalues.
The two subsets form the connected components of the set \(\Delta \neq 0\). Neither component is dense in \(\IR^{2 \times 2}\).
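To see the two regions numerically, the following NumPy sketch checks that small perturbations of \(\text{diag}(1,2)\) stay in the region \(\Delta > 0\), while small perturbations of the rotation matrix \(\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) stay in \(\Delta < 0\); the perturbation size and sample count are arbitrary.

```python
# The two open regions for n = 2: Delta > 0 near diag(1, 2),
# Delta < 0 near the rotation matrix [[0, -1], [1, 0]].
import numpy as np

rng = np.random.default_rng(2)

def discriminant(M):
    (a, b), (c, d) = M
    return (a - d) ** 2 + 4 * b * c

D0 = np.array([[1.0, 0.0], [0.0, 2.0]])       # Delta = 1
R0 = np.array([[0.0, -1.0], [1.0, 0.0]])      # Delta = -4

for _ in range(1000):
    E = 1e-2 * rng.standard_normal((2, 2))    # small perturbation
    assert discriminant(D0 + E) > 0           # still distinct real eigenvalues
    assert discriminant(R0 + E) < 0           # still complex conjugate eigenvalues
```

In particular, an entire neighborhood of the rotation matrix consists of matrices without real eigenvalues, which is why density arguments over \(\IR\) are not available.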