The Cayley-Hamilton Theorem

A short rigidity argument.

Heinrich Hartmann • 2025-07-21

Abstract

We give a short real-variable proof of the Cayley--Hamilton theorem using a local-to-global rigidity argument: prove the identity on a neighbourhood of a diagonal matrix with simple spectrum and extend to all matrices using polynomiality.

Introduction

$$ \newcommand{\IR}{\mathbb{R}} \newcommand{\IC}{\mathbb{C}} $$

The Cayley--Hamilton theorem states that every square matrix satisfies its own characteristic polynomial. For $A \in \IR^{n \times n}$ the characteristic polynomial is defined as $$ \chi_A(t) := \det(tI - A) \in \IR[t], $$ and substituting the matrix $A$ for $t$ yields $\chi_A(A)=0$. Historically, a quaternionic special case was obtained by Hamilton [Hamilton1853] in 1853; Cayley gave the matrix formulation [Cayley1858] in 1858; a fully general proof was given by Frobenius [Frobenius1878] in 1878.

There are many standard proofs. Over $\IC$, one often reduces to Jordan normal form, or proves the statement first for diagonalizable matrices and then extends by continuity; see for example [HJ2013]. Textbook treatments also give adjugate-matrix proofs, but these require some care: the theorem asserts a matrix identity $\chi_A(A)=0$, and it is not legitimate to argue by the bogus substitution $\chi_A(A)=\det(AI-A)=0$; see [Higham2020].

In this note we give a short proof that stays entirely over $\IR$ and avoids complex canonical forms and density arguments. The core idea is local-to-global rigidity. We fix a diagonal matrix $D_0=\mathrm{diag}(1,\dots,n)$ with simple real spectrum. By an implicit-function-theorem argument, matrices in a neighbourhood of $D_0$ continue to have $n$ distinct real eigenvalues and are therefore diagonalizable. On this open neighbourhood the identity $\chi_A(A)=0$ follows immediately by conjugating to a diagonal matrix. Finally, we observe that the entries of the map $A \mapsto \chi_A(A)$ are polynomial functions of the entries of $A$; hence vanishing on a nonempty open set forces vanishing everywhere on $\IR^{n \times n}$.

Framing the proof as a rigidity argument has the benefit of keeping it "low-tech": beyond a basic perturbation lemma for simple eigenvalues and an elementary polynomial identity principle, no structure theory for linear maps is required. The result is a proof that is short, intuitive, and pedagogically robust.

Main Theorem

(1) Theorem (Cayley-Hamilton). Let $A \in \IR^{n \times n}$ and let $\chi_A(\lambda) = \det(\lambda I - A)$ be its characteristic polynomial. Then $$ \chi_A(A) = 0, $$ where $\chi_A(A)$ denotes the matrix polynomial obtained by evaluating $\chi_A$ at $A$.
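
As a worked instance of the statement (the entries $a, b, c, d$ are introduced here purely for illustration), take $n = 2$ and $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $\chi_A(\lambda) = \lambda^2 - (a+d)\lambda + (ad - bc)$, and a direct computation gives

\[ A^2 - (a+d)A = \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix} - \begin{pmatrix} a^2 + ad & ab + bd \\ ac + cd & ad + d^2 \end{pmatrix} = -(ad - bc)\, I, \]

so that $\chi_A(A) = A^2 - (a+d)A + (ad-bc)I = 0$. Note that the constant term of $\chi_A$ is multiplied by the identity matrix $I$ when evaluating at $A$.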

Proof. The characteristic polynomial $\chi_A(\lambda)$ is a monic polynomial of degree $n$ in $\lambda$ whose coefficients are polynomials in the entries $A_{i,j}$ of $A$. The evaluation $\chi_A(A)$ is an $n \times n$ matrix whose entries $F(A)_{i,j}$ are therefore again polynomials in the $A_{i,j}$. We regard this construction as a polynomial map $F: \IR^{n \times n} \to \IR^{n \times n},\ A \mapsto \chi_A(A)$. We want to show that this polynomial map is identically zero: $F \equiv 0$.
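
To make the map $F$ concrete, here is a minimal Python sketch that computes $\chi_A$ exactly from the Leibniz formula for $\det(tI - A)$ and then evaluates $\chi_A(A)$ by Horner's scheme. The function names `poly_mul`, `char_poly`, `mat_mul` and `F` are illustrative choices, and the test matrix is a non-diagonalizable example picked for this sketch. It is a sanity check on one matrix, not part of the proof.

```python
from itertools import permutations

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c_0, c_1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A), via the Leibniz formula."""
    n = len(A)
    coeffs = [0] * (n + 1)
    for perm in permutations(range(n)):
        # Sign of the permutation from its number of inversions.
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = [(-1) ** inv]
        for i in range(n):
            j = perm[i]
            # Entry (i, j) of tI - A as a polynomial in t.
            entry = [-A[i][j], 1] if i == j else [-A[i][j]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            coeffs[k] += c
    return coeffs

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def F(A):
    """Evaluate chi_A(A) by Horner's scheme: result <- result * A + c * I."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(char_poly(A)):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

# An integer matrix with a repeated eigenvalue (not diagonalizable).
A = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 3]]
print(char_poly(A))  # [-12, 16, -7, 1], i.e. t^3 - 7t^2 + 16t - 12
print(F(A))          # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Every entry of `F(A)` is built from the entries of `A` using only additions and multiplications, which is exactly the polynomiality used in the proof. The Leibniz expansion has $n!$ terms, so this sketch is only practical for small $n$.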

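To do so it suffices to exhibit a nonempty open subset of $\IR^{n \times n}$ (in the Euclidean topology) on which $F$ vanishes: a polynomial function that vanishes on a nonempty open set is identically zero. We refer to this principle as rigidity.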
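
A sketch of why vanishing on an open set is enough, via the Taylor expansion of a polynomial (the symbols $P$, $N$, $a$ and the multi-index $\alpha$ are introduced here for illustration): let $P \in \IR[x_1,\dots,x_N]$ vanish on a nonempty open set $U$ and pick $a \in U$. Since $P$ is a polynomial, its Taylor expansion at $a$ is a finite sum and holds exactly:

\[ P(x) = \sum_{|\alpha| \le \deg P} \frac{\partial^\alpha P(a)}{\alpha!}\,(x-a)^\alpha . \]

Each partial derivative $\partial^\alpha P(a)$ is determined by the values of $P$ near $a$, where $P$ is zero, so every coefficient vanishes and $P \equiv 0$. Applied with $N = n^2$ to each entry $F(A)_{i,j}$, this yields $F \equiv 0$ as soon as $F$ vanishes on a nonempty open set.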

Let $D_0 = \text{diag}(1,2,\ldots,n)$ be the diagonal matrix with $n$ distinct eigenvalues $1,\dots,n$. There exists a neighborhood $U$ of $D_0$ where all matrices $A \in U$ have $n$ distinct real eigenvalues, by Lemma (2) below. Any matrix $A \in U$ with $n$ distinct real eigenvalues is diagonalizable and therefore $F(A) = \chi_A(A) = 0$, by Lemma (3) below.

Thus $\chi_A(A) = 0$ for all $A \in U$, and hence, by rigidity, for all $A \in \IR^{n \times n}$.

(2) Lemma. Let $D_0 \in \IR^{n \times n}$ be a matrix with $n$ distinct real eigenvalues. Then there is a neighborhood $U$ of $D_0$ where all matrices $D \in U$ have $n$ distinct real eigenvalues.

Proof. Consider the function $G(D,t) = \chi_D(t)$ as a differentiable map $\IR^{n \times n} \times \IR \to \IR$. The condition $G(D,\lambda) = 0$ is equivalent to $\lambda$ being an eigenvalue of $D$.

We use the implicit function theorem to show that each real eigenvalue $\lambda_i$ of $D_0$ can be continued to a real function $\lambda_i(D)$ in a neighborhood of $D_0$. For each eigenvalue $\lambda_i$ of $D_0$ we have $G(D_0,\lambda_i) = 0$ and

\[ \frac{\partial G}{\partial t}(D_0,\lambda_i) = \chi'_{D_0}(\lambda_i) = \prod_{j \neq i} (\lambda_i - \lambda_j) \neq 0, \]

since all eigenvalues of $D_0$ are distinct.
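
The derivative formula used here follows from the product rule; a short derivation, with $\mathrm{diag}(1,2,3)$ as an illustrative special case. Since $\chi_{D_0}(t) = \prod_{j=1}^n (t - \lambda_j)$, we have

\[ \chi'_{D_0}(t) = \sum_{k=1}^n \prod_{j \neq k} (t - \lambda_j), \]

and at $t = \lambda_i$ every summand with $k \neq i$ contains the factor $(\lambda_i - \lambda_i) = 0$, which leaves only $\prod_{j \neq i} (\lambda_i - \lambda_j)$. For example, for $D_0 = \mathrm{diag}(1,2,3)$ and $\lambda_2 = 2$ one gets $\chi'_{D_0}(2) = (2-1)(2-3) = -1 \neq 0$.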

By the implicit function theorem, there exists a neighborhood $U_i$ of $D_0$ and a smooth function $\lambda_i : U_i \to \IR$ such that $G(D,\lambda_i(D)) = 0$ for all $D \in U_i$, with $\lambda_i(D_0) = \lambda_i$. Taking $U = \bigcap_{i=1}^n U_i$, we obtain a neighborhood where all $n$ eigenvalues $\lambda_1(D),\dots,\lambda_n(D)$ exist as real-valued functions. Since the $\lambda_i$ are continuous and the values $\lambda_i(D_0)$ are distinct, they remain distinct after shrinking $U$ if necessary. As $\chi_D$ has degree $n$, these are all the eigenvalues of $D$.
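
The conclusion of the lemma can be checked on an example. The following Python sketch does not use the implicit function theorem; it uses a different, elementary technique: it evaluates $\chi_D(t)$ at the half-integers $\tfrac12, \tfrac32, \dots, n + \tfrac12$ and looks for alternating signs, which by the intermediate value theorem forces $n$ distinct real roots. The perturbation `E`, the size `eps` and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d  # a row swap flips the sign of the determinant
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return d

def chi(D, t):
    """Value of the characteristic polynomial det(tI - D) at t."""
    n = len(D)
    return det([[(t if i == j else 0) - D[i][j] for j in range(n)]
                for i in range(n)])

def sign_pattern(D):
    """Signs of chi_D at t = 1/2, 3/2, ..., n + 1/2."""
    n = len(D)
    return [1 if chi(D, Fraction(2 * k + 1, 2)) > 0 else -1
            for k in range(n + 1)]

D0 = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
E = [[1, -2, 3], [4, 0, -1], [-2, 5, 1]]   # arbitrary non-symmetric direction
eps = Fraction(1, 100)                     # small perturbation size
D = [[D0[i][j] + eps * E[i][j] for j in range(3)] for i in range(3)]

print(sign_pattern(D0))  # [-1, 1, -1, 1]
print(sign_pattern(D))   # [-1, 1, -1, 1]
```

Three sign changes of a cubic mean three distinct real roots, one in each of the intervals $(\tfrac12,\tfrac32)$, $(\tfrac32,\tfrac52)$, $(\tfrac52,\tfrac72)$, so the perturbed matrix `D` still has simple real spectrum.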

(3) Lemma. Let $A \in \IR^{n \times n}$ have $n$ distinct real eigenvalues. Then $\chi_A(A) = 0$.

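Proof. If $A$ has $n$ distinct real eigenvalues $\lambda_1,\dots,\lambda_n$ with corresponding eigenvectors $q_1,\dots,q_n$, then the $q_i$ are linearly independent, so $Q = [q_1,\dots,q_n]$ is invertible and $Q^{-1} A Q = D$, where $D = \text{diag}(\lambda_1,\dots,\lambda_n)$. The characteristic polynomial satisfies $\chi_A(\lambda) = \chi_D(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i)$. Evaluating at $D$ gives the diagonal matrix $\chi_D(D) = \prod_{i=1}^n (D - \lambda_i I)$, whose $k$-th diagonal entry is $\prod_{i=1}^n (\lambda_k - \lambda_i) = 0$. Since $A^m = Q D^m Q^{-1}$ for all $m \geq 0$, we have $p(A) = Q\,p(D)\,Q^{-1}$ for every polynomial $p$, and hence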

\[ \chi_A(A) = Q \,\chi_D(D)\, Q^{-1} = Q \cdot 0 \cdot Q^{-1} = 0. \]
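
The conjugation argument can be traced on a small example. In the Python sketch below, the matrices `Q`, `Q_inv`, `D` and the helper names are illustrative assumptions; `Q` is chosen with determinant 1 so that everything stays in the integers.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(M, lam):
    """Return M - lam * I."""
    n = len(M)
    return [[M[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def chi_at(M, eigenvalues):
    """Evaluate prod_i (M - lambda_i I), i.e. chi(M) for chi(t) = prod_i (t - lambda_i)."""
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for lam in eigenvalues:
        result = mat_mul(result, shift(M, lam))
    return result

eigenvalues = [1, 2]
D = [[1, 0], [0, 2]]
Q = [[1, 1], [1, 2]]          # columns are the eigenvectors; det Q = 1
Q_inv = [[2, -1], [-1, 1]]    # inverse of Q
A = mat_mul(mat_mul(Q, D), Q_inv)

print(A)                        # [[0, 1], [-2, 3]]
print(chi_at(D, eigenvalues))   # [[0, 0], [0, 0]]
print(chi_at(A, eigenvalues))   # [[0, 0], [0, 0]]
```

For the diagonal matrix `D`, each factor `D - lam * I` has a zero in a different diagonal slot, so the product vanishes; conjugating by `Q` transports this to `A`.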

References

[Hamilton1853] William Rowan Hamilton. Lectures on Quaternions. Hodges and Smith, Dublin, 1853. URL: https://archive.org/details/bub_gb_TCwPAAAAIAAJ.
[Cayley1858] Arthur Cayley. A memoir on the theory of matrices. Philosophical Transactions of the Royal Society of London, 148:17–37, 1858. URL: https://royalsocietypublishing.org/doi/10.1098/rstl.1858.0002, doi:10.1098/rstl.1858.0002.
[Frobenius1878] Ferdinand Georg Frobenius. Über lineare Substitutionen und bilineare Formen. Journal für die reine und angewandte Mathematik, 84:1–63, 1878. URL: https://www.degruyterbrill.com/document/doi/10.1515/crelle-1878-18788403/html, doi:10.1515/crelle-1878-18788403.
[HJ2013] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2nd edition, 2013. URL: https://www.cambridge.org/highereducation/books/matrix-analysis/3D0C9628C97B0E2E0EBA1C0935C8C34F.
[Higham2020] Nicholas J. Higham. What is the Cayley–Hamilton theorem? Blog post, 2020. URL: https://nhigham.com/2020/11/03/what-is-the-cayley-hamilton-theorem/.
