A short rigidity argument.
Abstract
We give a short real-variable proof of the Cayley--Hamilton theorem using a local-to-global rigidity argument: prove the identity on a neighbourhood of a diagonal matrix with simple spectrum and extend to all matrices using polynomiality.
Introduction
$$
% ============================================
% GENERAL / UNIVERSAL
% Used throughout the documentation
% ============================================
% --- Basic Number Systems ---
\newcommand{\IR}{\mathbb{R}} % Real numbers #listed
\newcommand{\IC}{\mathbb{C}} % Complex numbers #listed
\newcommand{\IN}{\mathbb{N}} % Natural numbers #listed
\newcommand{\IZ}{\mathbb{Z}} % Integers #listed
\newcommand{\IQ}{\mathbb{Q}} % Rational numbers #listed
\newcommand{\IA}{\mathbb{A}} % Affine space #listed
\newcommand{\IB}{\mathbb{B}} % Generic field #listed
\newcommand{\ID}{\mathbb{D}} % Generic field #listed
\newcommand{\IF}{\mathbb{F}} % Generic field #listed
\newcommand{\IH}{\mathbb{H}} % Quaternions #listed
\newcommand{\II}{\mathbb{I}} % Generic field #listed
\newcommand{\IL}{\mathbb{L}} % Generic field #listed
\newcommand{\IP}{\mathbb{P}} % Projective space #listed
\newcommand{\IS}{\mathbb{S}} % Sphere #listed
\newcommand{\IV}{\mathbb{V}} % Generic vector space #listed
% --- Function Spaces ---
\newcommand{\CINF}{\mathcal{C}^\infty} % Smooth (infinitely differentiable) functions #listed
\newcommand{\CC}{\mathcal{C}} % C^k functions #listed
\newcommand{\Ck}{\mathcal{C}^k} % C^k functions #listed
\newcommand{\CK}{\mathcal{C}^K} % C^k functions #listed
% --- Fundamental Operators ---
\newcommand{\del}{\partial} % Partial derivative #listed
\newcommand{\uDelta}{\underline{\Delta}} % Discrete difference operator #listed
\newcommand{\Shift}{\mathrm{S}_\downarrow} % Shift operator #listed
% --- Logical & Set Operations ---
\newcommand{\IFF}{\Leftrightarrow} % If and only if #listed
\newcommand{\Ind}{\mathbb{1}} % Indicator/characteristic function #listed
\newcommand{\IndA}[1]{\mathbb{1}\lbrace #1 \rbrace} % Indicator with condition #listed
\newcommand{\1}{\mathbb{1}} % Indicator shorthand #listed
\newcommand{\Set}[2]{\left\{\, #1 \;\vert\; #2 \,\right\}} % Set-builder notation #listed
\newcommand{\CSet}[2]{\#\left\{\, #1 \;\vert\; #2 \,\right\}} % Cardinality notation #listed
\newcommand{\C}{\,\#} % Cardinality operator #listed
% --- Limits & Categorical ---
\newcommand{\limproj}{\varprojlim} % Inverse limit #listed
\newcommand{\limind}{\varinjlim} % Direct limit #listed
\newcommand{\Hom}{\mathrm{Hom}} % Homomorphism #listed
\newcommand{\End}{\mathrm{End}} % Endomorphism #listed
\newcommand{\Ext}{\mathrm{Ext}} % Ext functor #listed
% --- Arrows & Relations ---
\newcommand{\ra}{\rightarrow} % Right arrow #listed
\newcommand{\lra}{\longrightarrow} % Long right arrow #listed
\newcommand{\xlra}[1]{\overset{#1}{\lra}} % Labeled long arrow #listed
\newcommand{\la}{\leftarrow} % Left arrow #listed
\newcommand{\lla}{\longleftarrow} % Long left arrow #listed
\newcommand{\mono}{\hookrightarrow} % Monomorphism #listed
\newcommand{\epi}{\twoheadrightarrow} % Epimorphism #listed
\newcommand{\isom}{\cong} % Isomorphism #listed
\newcommand{\downto}{\searrow} % Diagonal arrow #listed
% --- Tensor & Algebraic Structures ---
\newcommand{\tensor}{\otimes} % Tensor product #listed
\newcommand{\tensors}{\tensor\dots\tensor} % Multiple tensor products #listed
\newcommand{\Tensor}{\bigotimes} % Big tensor product #listed
\newcommand{\stensor}{\odot} % Symmetric tensor product #listed
\newcommand{\vsum}{\oplus} % Direct sum #listed
\newcommand{\Vsum}{\bigoplus} % Big direct sum #listed
% --- Calligraphic Letters (Generic) ---
\newcommand{\KA}{\mathcal{A}}
\newcommand{\KB}{\mathcal{B}}
\newcommand{\KC}{\mathcal{C}}
\newcommand{\KD}{\mathcal{D}}
\newcommand{\KF}{\mathcal{F}}
\newcommand{\KH}{\mathcal{H}}
\newcommand{\KI}{\mathcal{I}}
\newcommand{\KL}{\mathcal{L}}
\newcommand{\KN}{\mathcal{N}}
\newcommand{\KP}{\mathcal{P}}
\newcommand{\KQ}{\mathcal{Q}}
\newcommand{\KR}{\mathcal{R}}
\newcommand{\KS}{\mathcal{S}}
\newcommand{\KV}{\mathcal{V}}
\newcommand{\KZ}{\mathcal{Z}}
% --- Fraktur Letters (Generic) ---
\newcommand{\gc}{\mathfrak{C}}
\newcommand{\gd}{\mathfrak{D}}
\newcommand{\gM}{\mathfrak{M}}
\newcommand{\gm}{\mathfrak{m}}
\newcommand{\gf}{\mathfrak{f}}
\newcommand{\gu}{\mathfrak{U}}
\newcommand{\fa}{\mathfrak{a}}
\newcommand{\fg}{\mathfrak{g}}
\newcommand{\fn}{\mathfrak{n}}
\newcommand{\fk}{\mathfrak{k}}
\newcommand{\fm}{\mathfrak{m}}
\newcommand{\fp}{\mathfrak{p}}
% --- Text & Formatting ---
\newcommand{\qtext}[1]{\quad\text{#1}\quad} % Quad spaced text #listed
\newcommand{\stext}[1]{\;\text{#1}\;} % Small spaced text #listed
\newcommand{\ssum}[1]{\sum_{\substack{#1}}} % Sum with substacked condition #listed
\newcommand{\half}{\frac{1}{2}} % One-half #listed
\newcommand{\floor}[1]{\lfloor #1 \rfloor} % Floor function #listed
\newcommand{\ceil}[1]{\lceil #1 \rceil} % Ceiling function #listed
\newcommand{\nl}{\\} % Newline #listed
% --- Common Functions ---
\newcommand{\id}{\mathrm{id}} % Identity function #listed
\newcommand{\rk}{\mathrm{rk}} % Rank #listed
\newcommand{\Ker}{\mathrm{Ker}} % Kernel #listed
\newcommand{\Diff}{\mathrm{Diff}} % Diffeomorphism group #listed
\newcommand{\Pic}{\mathrm{Pic}} % Picard group #listed
\newcommand{\Spec}{\mathrm{Spec}} % Spectrum #listed
\newcommand{\D}{\mathrm{D}} % Differential operator #listed
\newcommand{\DP}{\mathrm{D_{\!+}}} % Positive differential #listed
\newcommand{\DDP}{\mathrm{D^{\!+}}} % Upper differential #listed
% --- Variants ---
\newcommand{\vphi}{\varphi} % Variant phi #listed
\newcommand{\sphi}{\phi} % Straight phi #listed
\newcommand{\eps}{\varepsilon} % Epsilon variant #listed
\newcommand{\pt}{*} % Point notation #listed
\newcommand{\point}{*} % Point notation alt #listed
% --- Set Operations ---
\newcommand{\union}{\cup} % Union #listed
\newcommand{\Union}{\bigcup} % Big union #listed
\newcommand{\dotcup}{\ensuremath{\mathaccent\cdot\cup}} % Disjoint union #listed
\newcommand{\dunion}{\dotcup} % Disjoint union alt #listed
\newcommand{\<}{\langle} % Left angle bracket #listed
\newcommand{\>}{\rangle} % Right angle bracket #listed
\newcommand{\inpart}[1]{\in\part(#1)} % In partition of #listed
\newcommand{\trl}{\triangleleft} % Left triangle #listed
\newcommand{\trr}{\triangleright} % Right triangle #listed
% --- Misc ---
\newcommand{\curly}[1]{\mathcal{#1}}
\newcommand{\op}[1]{\mathrm{#1}}
\newcommand{\Cat}[1]{\mathfrak{#1}}
\newcommand{\cat}[1]{\mathbf{#1}}
\newcommand{\CAT}[1]{\left\{\,\text{#1}\,\right\}}
\newcommand{\CATii}[2]{\left\{\,\begin{array}{c}\text{#1}\\\text{#2}\end{array}\,\right\}}
\newcommand{\Mon}{\mathrm{Mon}}
\newcommand{\Lin}{\mathrm{Lin}}
\newcommand{\ev}{\mathrm{ev}}
\newcommand{\tc}{\prec_{\mathrm{tc}}}
\newcommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\newcommand{\QED}{\square} % End of proof #listed
\newcommand{\part}{\vdash} % Turnstile #listed
\newcommand{\opart}{\models} % Semantic entailment #listed
\newcommand{\Def}{\mathrm{Def}} % Bialgebra defect #listed
% ============================================
% CHAPTER 1: DIFFERENTIAL DUALITY
% ============================================
\newcommand{\Jet}{\mathbf{Jet}} % Jet space of C infinity germs #listed
\newcommand{\jet}{\mathrm{jet}} % Jet functor #listed
\newcommand{\E}{\mathbf{E}} % Space of smooth functions #listed
\newcommand{\EE}{\mathbf{E}} % Space of smooth functions alt #listed
% ============================================
% CHAPTER 2: DISCRETE DUALITY
% ============================================
% --- From Affine Cubes ---
\newcommand{\BC}{\mathbf{B}} % Affine Cube space #listed
% --- From Cubes.md and affine cube space ---
\newcommand{\BB}{\mathbb{B}} % Boolean lattice #listed
\newcommand{\minelt}{\hat{0}} % Minimal element #listed
\newcommand{\maxelt}{\hat{1}} % Maximal element #listed
\newcommand{\sleq}{\subseteq} % Subset relation #listed
\newcommand{\drk}{\mathrm{rk_{\Delta}}} % Discrete rank #listed
\newcommand{\Cube}{\mathbf{Cube}} % Discrete cubes #listed
\newcommand{\aprod}{\star} % Anchored product of cubes #listed
\newcommand{\acoprod}{\Delta} % Anchored coproduct of cubes #listed
\newcommand{\tcoprod}{\Delta^\tau} % Transport coproduct of cubes #listed
\newcommand{\tprod}{\star^\tau} % Transport product of cubes #listed
\newcommand{\Part}{\mathbf{Part}} % Partitions of a set #listed
\newcommand{\vac}{{|0\rangle}} % Empty partition #listed
% --- From Filtered Vector Spaces.md ---
\newcommand{\gr}{\mathrm{gr}} % Graded/associated graded object #listed
% ============================================
% CHAPTER 3: ULTRA CALCULUS
% ============================================
\newcommand{\POS}{\mathbf{Pos}} % Growth domain spaces #listed
\newcommand{\U}{\mathbf{U}} % Ultra regulator quotient #listed
\newcommand{\IU}{\mathbb{U}} % Ultra regulator #listed
\newcommand{\IG}{\mathbb{G}} % Growth profile #listed
\newcommand{\Tame}{\mathbf{Tame}} % Tame growth functions #listed
\newcommand{\Scale}{\mathrm{Scale}} % Scaling operator #listed
\newcommand{\Bell}{\mathcal{B}} % Bell polynomials #listed
\newcommand{\uexp}{\exp_+} % Exponential generating series #listed
% --- Ultra Structures ---
\newcommand{\Sym}{\mathbf{S}} % Symmetric functions/algebra #listed
\newcommand{\SS}{\Sym} % Symmetric functions alt #listed
\newcommand{\SP}{\Sym^+} % Positive symmetric functions #listed
\newcommand{\SH}{\hat{\Sym}} % Completed symmetric functions #listed
\newcommand{\SPH}{\hat{\Sym}^+} % Completed positive symmetric #listed
% --- Ultra Operators ---
\newcommand{\MIX}{\mathrm{Mix}} % Mixing operator #listed
\newcommand{\BMIX}{\mathrm{BMix}} % Boolean mixing #listed
\newcommand{\TMIX}{\mathrm{TMix}} % Transport mixing #listed
\newcommand{\TBMIX}{\mathrm{TBMix}} % Transport boolean mixing #listed
\newcommand{\TRANS}{\mathrm{Trans}} % Transport operator #listed
% --- Gauge & Equivalence ---
\newcommand{\gaugeleq}{\preccurlyeq} % Gauge less-or-equal #listed
\newcommand{\gaugeeq}{\asymp} % Gauge equivalence #listed
\newcommand{\gaugegeq}{\succcurlyeq} % Gauge greater-or-equal #listed
\newcommand{\tstar}{\circledast} % Tight star product #listed
% --- Forward Differences ---
\newcommand{\FD}{\blacktriangle} % Forward difference #listed
\newcommand{\fd}{\FD} % Forward difference alt #listed
\newcommand{\AC}{\square} % Associated character #listed
% ============================================
% OPERATORS (DeclareMathOperator)
% ============================================
\DeclareMathOperator{\Supp}{\mathrm{Supp}} % Support #listed
\DeclareMathOperator{\Alt}{\Lambda} % Alternating/exterior #listed
\DeclareMathOperator{\ad}{ad} % Adjoint representation #listed
\DeclareMathOperator{\ch}{ch} % Chern character #listed
\DeclareMathOperator{\td}{td} % Todd class #listed
\DeclareMathOperator{\TD}{TD} % Todd operator #listed
\DeclareMathOperator{\pr}{pr} % Projection #listed
\DeclareMathOperator{\Map}{Map} % Mapping space #listed
$$
The Cayley--Hamilton theorem states that every square matrix satisfies its own characteristic polynomial.
For \(A \in \IR^{n \times n}\) the characteristic polynomial is defined as
$$
\chi_A(t) := \det(tI - A) \in \IR[t].
$$
Substituting the matrix \(A\) for \(t\), one finds \(\chi_A(A)=0\).
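As a sanity check, the \(2 \times 2\) case can be verified by hand. The following worked computation is only an illustration; the symbols \(a,b,c,d\) are introduced here and are not part of the argument below. For \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) one has
$$
\chi_A(t) = (t-a)(t-d) - bc = t^2 - (a+d)\,t + (ad-bc),
$$
and a direct computation gives
$$
A^2 - (a+d)A + (ad-bc)I
= \begin{pmatrix} a^2+bc & ab+bd \\ ca+dc & cb+d^2 \end{pmatrix}
- \begin{pmatrix} a^2+ad & ab+db \\ ac+dc & ad+d^2 \end{pmatrix}
+ \begin{pmatrix} ad-bc & 0 \\ 0 & ad-bc \end{pmatrix}
= 0.
$$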
Historically, Hamilton obtained a quaternionic special case [Hamilton1853] in 1853, and Cayley gave the matrix formulation [Cayley1858] in 1858;
a fully general proof was given by Frobenius [Frobenius1878] in 1878.
There are many standard proofs. Over \(\IC\), one often reduces to Jordan normal form, or proves the statement first for diagonalizable matrices and then extends by continuity; see for example [HJ2013]. Textbook treatments also give adjugate-matrix proofs, but these require some care: the theorem asserts a matrix identity \(\chi_A(A)=0\), and it is not legitimate to argue by the bogus substitution \(\chi_A(A)=\det(AI-A)=0\); see [Higham2020].
In this note we give a short proof that stays entirely over \(\IR\) and avoids complex canonical forms and density arguments. The core idea is local-to-global rigidity. We fix a diagonal matrix \(D_0=\mathrm{diag}(1,\dots,n)\) with simple real spectrum. By an implicit-function-theorem argument, matrices in a neighbourhood of \(D_0\) continue to have \(n\) distinct real eigenvalues and are therefore diagonalizable. On this open neighbourhood the identity \(\chi_A(A)=0\) follows immediately by conjugating to a diagonal matrix. Finally, we observe that the entries of the map \(A \mapsto \chi_A(A)\) are polynomial functions of the entries of \(A\); hence vanishing on a nonempty open set forces vanishing everywhere on \(\IR^{n \times n}\).
Framing the proof as a rigidity argument has the benefit of keeping it "low-tech": beyond a basic perturbation lemma for simple eigenvalues and an elementary polynomial identity principle, no structure theory for linear maps is required. The result is a proof that is short, intuitive, and pedagogically robust.
Main Theorem
(1) Theorem (Cayley--Hamilton).
Let \(A \in \IR^{n \times n}\) and let \(\chi_A(\lambda) = \det(\lambda I - A)\) be its characteristic polynomial. Then
$$
\chi_A(A) = 0,
$$
where \(\chi_A(A)\) denotes the matrix polynomial obtained by evaluating \(\chi_A\) at \(A\).
Proof.
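We regard the construction \(A \mapsto \chi_A(A)\) as a map \(F: \IR^{n \times n} \to \IR^{n \times n}\).
The characteristic polynomial \(\chi_A(t)\) is a monic polynomial of degree \(n\) in \(t\) whose coefficients are polynomials in the entries \(A_{i,j}\) of \(A\).
Hence the entries \(F(A)_{i,j}\) of the \(n \times n\) matrix \(\chi_A(A)\) are again polynomials in the \(A_{i,j}\), so \(F\) is a polynomial map.
We want to show that this polynomial map is identically zero: \(F \equiv 0\).
To do so it is sufficient to find a nonempty open subset of \(\IR^{n \times n}\) in the Euclidean topology on which \(F\) vanishes: a real polynomial that vanishes on a nonempty open set has all partial derivatives equal to zero at any point of that set, so all of its Taylor coefficients there vanish and it is the zero polynomial.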
Let \(D_0 = \mathrm{diag}(1,2,\ldots,n)\) be the diagonal matrix with the \(n\) distinct eigenvalues \(1,\dots,n\).
By Lemma (2) below, there exists a neighborhood \(U\) of \(D_0\) such that every matrix \(A \in U\) has \(n\) distinct real eigenvalues.
By Lemma (3) below, every such matrix is diagonalizable and satisfies \(F(A) = \chi_A(A) = 0\).
Thus \(\chi_A(A) = 0\) for all \(A \in U\), and hence, by rigidity, for all \(A \in \IR^{n \times n}\).
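The map \(F\) can be made concrete with a small exact computation. The following is a minimal sketch, not part of the proof: it assumes the Leibniz expansion of the determinant to obtain the coefficients of \(\chi_A\), uses rational arithmetic to avoid rounding, and the helper names `char_poly`, `eval_at_matrix` and `F` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A) via the Leibniz formula."""
    n = len(A)
    total = [Fraction(0)] * (n + 1)
    for perm in permutations(range(n)):
        term = [Fraction(perm_sign(perm))]
        for i, j in enumerate(perm):
            # Entry (i, j) of tI - A, written as a polynomial in t.
            entry = [Fraction(-A[i][j]), Fraction(1 if i == j else 0)]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            total[k] += c
    return total


def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eval_at_matrix(coeffs, A):
    """Horner evaluation of sum_k c_k A^k, with c_0 multiplying the identity."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


def F(A):
    """The polynomial map A -> chi_A(A) from the proof."""
    return eval_at_matrix(char_poly(A), A)


generic = [[2, -1, 0], [3, 5, 7], [0, 4, -6]]
jordan_block = [[1, 1], [0, 1]]  # not diagonalizable, so it lies outside Lemma (3)
print(F(generic))
print(F(jordan_block))
```

Both calls return the zero matrix. The Jordan block is the interesting case: it has a repeated eigenvalue, so it is covered only by the rigidity step and not by the local argument near \(D_0\).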
(2) Lemma.
Let \(D_0 \in \IR^{n \times n}\) be a matrix with \(n\) distinct real eigenvalues.
Then there is a neighborhood \(U\) of \(D_0\) where all matrices \(D \in U\) have \(n\) distinct real eigenvalues.
Proof.
Consider the function \(G(D,t) = \chi_D(t)\) as a differentiable map \(\IR^{n \times n} \times \IR \to \IR\).
The condition \(G(D,\lambda) = 0\) is equivalent to \(\lambda\) being an eigenvalue of \(D\).
We use the implicit function theorem to show that each real eigenvalue \(\lambda_i\) of \(D_0\) can be continued to
a real function \(\lambda_i(D)\) in a neighborhood of \(D_0\).
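Since \(D_0\) has \(n\) distinct real eigenvalues \(\lambda_1,\dots,\lambda_n\), its characteristic polynomial factors as \(\chi_{D_0}(t) = \prod_{j=1}^n (t - \lambda_j)\). Hence for each \(i\) we have \(G(D_0,\lambda_i) = 0\) and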
\[
\frac{\partial G}{\partial t}(D_0,\lambda_i)
= \chi'_{D_0}(\lambda_i)
= \prod_{j \neq i} (\lambda_i - \lambda_j) \neq 0,
\]
since all eigenvalues of \(D_0\) are distinct.
By the implicit function theorem, there exist a neighborhood \(U_i\) of \(D_0\) and a smooth function \(\lambda_i : U_i \to \IR\) such that \(G(D,\lambda_i(D)) = 0\) for all \(D \in U_i\), with \(\lambda_i(D_0) = \lambda_i\).
Taking \(U = \bigcap_{i=1}^n U_i\), we obtain a neighborhood of \(D_0\) on which all \(n\) eigenvalues \(\lambda_1(D),\dots,\lambda_n(D)\) exist as real-valued functions.
Since the \(\lambda_i\) are continuous and the values \(\lambda_i(D_0)\) are distinct, the \(\lambda_i(D)\) remain distinct after shrinking \(U\) if necessary. As \(\chi_D\) has degree \(n\), these are all the eigenvalues of \(D\).
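The conclusion of the lemma can be checked numerically for \(D_0 = \mathrm{diag}(1,2,3)\). The sketch below does not use the implicit function theorem. It swaps in a simpler sufficient test based on the intermediate value theorem: if \(\chi_D\) changes sign between consecutive half-integers \(\tfrac12, \tfrac32, \dots, n+\tfrac12\), then \(D\) has \(n\) distinct real eigenvalues. The function names and the perturbation matrix `E` are assumptions made for this illustration.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def chi(D, t):
    """Value of the characteristic polynomial det(tI - D) at the scalar t."""
    n = len(D)
    return det([[(t if i == j else 0) - D[i][j] for j in range(n)]
                for i in range(n)])


def separated_by_half_integers(D):
    """Sufficient (not necessary) test for n distinct real eigenvalues.

    Checks that chi_D changes sign on each interval (k + 1/2, k + 3/2),
    k = 0, ..., n - 1, which forces one real root in each interval.
    """
    n = len(D)
    cuts = [Fraction(2 * k + 1, 2) for k in range(n + 1)]
    values = [chi(D, t) for t in cuts]
    return all(values[k] * values[k + 1] < 0 for k in range(n))


D0 = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
E = [[1, 2, -1], [0, 1, 3], [-2, 1, 1]]  # arbitrary non-symmetric direction
eps = Fraction(1, 20)
D = [[D0[i][j] + eps * E[i][j] for j in range(3)] for i in range(3)]

print(separated_by_half_integers(D0))
print(separated_by_half_integers(D))
```

For a matrix far from \(D_0\), such as one with a rotation block, the test fails. This is consistent with the lemma being a purely local statement.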
(3) Lemma.
Let \(A \in \IR^{n \times n}\) have \(n\) distinct real eigenvalues.
Then \(\chi_A(A) = 0\).
Proof.
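Let \(\lambda_1,\dots,\lambda_n\) be the distinct real eigenvalues of \(A\), with corresponding eigenvectors \(q_1,\dots,q_n \in \IR^n\). Eigenvectors for distinct eigenvalues are linearly independent, so \(Q = [q_1,\dots,q_n]\) is invertible and \(Q^{-1} A Q = D\), where \(D = \mathrm{diag}(\lambda_1,\dots,\lambda_n)\).
The characteristic polynomial satisfies \(\chi_A(\lambda) = \chi_D(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i)\). The matrix \(\chi_D(D) = \prod_{i=1}^n (D - \lambda_i I)\) is diagonal with \(k\)-th diagonal entry \(\prod_{i=1}^n (\lambda_k - \lambda_i) = 0\), so \(\chi_D(D) = 0\). Since \(A = Q D Q^{-1}\) and \(p(Q D Q^{-1}) = Q\,p(D)\,Q^{-1}\) for every polynomial \(p\), we obtain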
\[
\chi_A(A)
= Q \,\chi_D(D)\, Q^{-1}
= Q \cdot 0 \cdot Q^{-1}
= 0.
\]
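A small worked instance may help to see the mechanism. The matrix below is chosen here purely for illustration. Take
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
Q = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},
\qquad
Q^{-1} A Q = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} = D,
\]
so that \(\chi_A(\lambda) = (\lambda - 1)(\lambda - 3)\). Then
\[
\chi_D(D) = (D - I)(D - 3I)
= \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} -2 & 0 \\ 0 & 0 \end{pmatrix}
= 0,
\qquad
\chi_A(A) = (A - I)(A - 3I)
= \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}
= 0.
\]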
Comments