\documentclass{llncs}
\usepackage{amssymb,latexsym}
\def\noto{\mathrel{\#}}
\def\NN{{\mathbb N}}
\def\QQ{{\mathbb Q}}
\def\RR{{\mathbb R}}
\def\CC{{\mathbb C}}
\def\qed{\hfill$\Box$}
\begin{document}
\title{A Constructive Proof of the Fundamental Theorem of Algebra
without using the Rationals}
\author{Herman Geuvers, Freek Wiedijk, Jan Zwanenburg\\
\{{\tt herman},{\tt freek},{\tt janz}\}{\tt @cs.kun.nl}}
\institute{Department of Computer Science, University of Nijmegen, the
Netherlands}
\maketitle
\begin{abstract}\noindent
In the FTA project in Nijmegen we have formalized a constructive proof
of the Fundamental Theorem of Algebra. In the formalization, we have
first defined the (constructive) algebraic hierarchy of groups, rings,
fields, etcetera. For the reals we have then defined the notion of
{\em real number structure}, which is basically a Cauchy complete
Archimedean ordered field. This boils down to axiomatizing the
constructive reals. The proof of FTA is then given from these axioms
(so independent of a specific construction of the reals), where the
complex numbers are defined as pairs of real numbers.
The proof of FTA that we have chosen to formalize is the one in the
seminal book by Troelstra and van Dalen \cite{TvD881}, originally due
to Manfred Kneser \cite{Knes81}.
The proof by Troelstra and van Dalen makes heavy use of the rational
numbers (as suitable approximations of reals), which is quite common
in constructive analysis, because equality on the rationals is
decidable and equality on the reals isn't. In our case, this is not so
convenient, because the axiomatization of the reals doesn't `contain'
the rationals. Moreover, we found it rather unnatural to let a proof
about the reals be mainly dealing with rationals. Therefore, our
version of the FTA proof doesn't refer to the rational numbers. The
proof described here is a faithful presentation of a fully formalized
proof in the Coq system.
\end{abstract}
\section{Introduction}
\label{sec:intro}
The Fundamental Theorem of Algebra states that the field of complex numbers
is algebraically closed.
More explicitly, it says that
\begin{quote}\em
For every non-constant polynomial
$$f(z) = a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$$
with coefficients in $\CC$, the equation
$f(z) = 0$
has a solution.
\end{quote}
This theorem has a long and illustrious history (see \cite{FR971} or
\cite{Knes40} for an overview). It was proved for the first time in
Gauss's Ph.D.~thesis from 1799. Many proofs of the Fundamental Theorem
of Algebra are known, most of which have a constructive version.
The proof that we're presenting here was invented by Manfred Kneser
\cite{Knes81} (inspired by a proof of his father, Hellmuth Kneser, in
\cite{Knes40}), and is a constructive version of the simple proof that
derives a contradiction from the assumption that the (non-constant)
polynomial $f$ is minimal at $z_0$ with $|f(z_0)| \neq 0$. We briefly
repeat the classical proof here. Let $f(z) = a_n z^n + a_{n-1}
z^{n-1} + \ldots + a_1 z + a_0$ be a non-constant polynomial.
First note that $|f(z)|$ must have a minimum somewhere, because
$|f(z)|\rightarrow \infty$ if $|z|\rightarrow \infty$. We may assume
the minimum to be reached for $z=0$. (If the minimum is reached for
$z_0$, consider the polynomial $g(z) = f(z+z_0)$.)
Now, assume the minimum of $|f(z)|$ is not $0$ (i.e.\ $f(0) \neq 0$). The function $f(z)$ has the form
$$f(z) = a_0 + a_k z^k + O\left(z^{k+1}\right)$$
with $a_k\ne 0$.
Because of this, $f(0)=a_0\ne 0$ and we can take
$$z = \epsilon \root k\of{-{a_0\over a_k}}$$
with $\epsilon\in\RR_{> 0}$, and if $\epsilon$ is small enough, the
part $O\left(z^{k+1}\right)$ will be negligible compared to the rest,
and we get a $z\ne 0$ for which
\begin{eqnarray*}
|f(z)| &=& a_0 + a_k (\epsilon \root k\of{-{a_0\over a_k}})^k\\
&=& a_0(1-\epsilon^k)\\
&<& |f(0)|
\end{eqnarray*}
So $|f(0)|$ is not the minimum and we have derived a contradiction.
By iterating this idea, one can try to construct a Cauchy sequence to
a zero of the polynomial. The main difficulty with this approach is
that we have two conflicting requirements for the choice of
$\epsilon$:
\begin{itemize}
\item if $\epsilon$ is chosen too small each time, we may not reach
the zero in countably many steps (we will go down, but might not go
down all the way to zero).
\item if $\epsilon$ is not small enough, we are not allowed to ignore
the $O\left(z^{k+1}\right)$ part.
\end{itemize}
The solution to this is that, instead of using the above
representation (in which the term $a_k z^k$ is the {\em
smallest\/} power with a non-zero coefficient), in the constructive
proof one just takes {\em some\/} appropriate $k$ (not necessarily the
smallest) and writes $f(z)$ as
$$f(z) = a_0 + a_k (z - z_0)^k + \mbox{\it the other terms\/}$$
That way one can make sure that not only $|f(z)| < |f(0)|$,
but in fact $|f(z)| < q\left|f(0)\right|$ for some fixed $q < 1$.
The FTA proof along these lines presented by Manfred Kneser in
\cite{Knes81} is classical (`to improve readability'), but it is
stated that it can be made constructive without any serious
problems. In \cite{TvD881}, a constructive version of this proof is
given, using rational approximations to overcome the undecidability of
equality on the reals. Another constructive version of the Kneser
proof is presented by Schwichtenberg in \cite{Schw}, also using
rational approximations, but along different lines. The constructive
version of FTA reads as follows.
\begin{quote}\em
For every polynomial
$$f(z) = a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$$ with
coefficients in $\CC$, such that $a_k \noto 0$ for some $k>0$, the
equation $f(z) = 0$ has a solution.
\end{quote}
As the equality on $\RR$ (and therefore on $\CC$) is not decidable (we
don't have $\forall x,y\in \RR(x = y \vee x \neq y)$) we can't just
write $f(z)$ as $a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$
with $a_n \neq 0$. Therefore, in constructive analysis, one works with
the notion of apartness, usually denoted by $\noto$, which is a
`positive inequality': $a\noto b$ if we positively know them to be
distinct, i.e.\ we know a distance $\epsilon$ between them. Now, one
can constructively find a root of $f$ if we positively know some
coefficient $a_k$ ($k>0$) to be distinct from $0$. The proof of
constructive FTA proceeds by first proving it for {\em monic\/}
polynomials (i.e.\ where $a_n = 1$).
The original Kneser proof of FTA for monic polynomials makes use of an
approximation of the polynomial with coefficients in $\QQ$, because it
needs to compare the size of various expressions (which is not
decidable in $\RR$). We found this unsatisfactory: the rational
numbers don't seem to have anything to do with the Fundamental Theorem
of Algebra! Also, in our Coq formalization of Kneser's proof, we
introduced the real numbers axiomatically (so {\em a priori\/} we
didn't have $\QQ$ in our formalization), and it seemed silly to
reconstruct the rational numbers inside our real numbers just to be
able to formalize this proof. Therefore, instead of constructing
$\QQ$, we modified the proof so that it no longer referred to the
rationals. The result is presented here.
The main idea behind the modification of the proof is that we
introduce `fuzziness' in the comparisons. The proof will contains a
`fuzziness parameter' $\epsilon$
Instead of having to
decide whether
$$x < y \vee x = y \vee x > y,$$
all we need to establish is whether
$$x < y + \epsilon \vee x > y - \epsilon$$
(which we might write as
$$x \le_\epsilon y \vee x \ge_\epsilon y$$
using a relation $\le_\epsilon$).
Constructively we have {\em cotransitivity\/}
of the order relation
$$x < y \Rightarrow x < z \vee z < y$$
from which it follows that the disjunction with the $\epsilon$'s is decidable.
Apart from not needing $\QQ$, another difference between the proof
presented here and the proof in \cite{TvD881} is that we have avoided
using Vandermonde determinants. In the original proof, this is used to
prove FTA from FTA for monic polynomials. We prove this implication
directly, using some polynomial arithmetic. Therefore there's no use
of linear algebra in the proof anymore.
We have formalized the proof presented here using the Coq system: this
was known as the FTA project \cite{fta}.
In the formalization, we treat the real numbers axiomatically. More
precisely, the reals form a part of a {\em constructive algebraic
hierarchy}, which consists (among other things) of the abstract
notions of rings, fields and ordered fields. See \cite{alghier} for
details. The base level of this hierarchy consists of the notion of
{\em constructive setoid}, which is basically a pair of a type and an
{\em apartness\/} relation over the type. (For constructive reals,
`being apart' is more basic then `being equal', so we start from
apartness.) In this hierarchy, a {\em real number structure\/} is
defined as a Cauchy complete Archimedean ordered field. In the FTA
project, the Fundamental Theorem of Algebra was proven for any real
number structure, so as a matter of fact the theorem was proven from
the axioms for the constructive reals. Also it was shown that real
number structures exist by actually constructing one. Details on this
construction can be found in \cite{geuniq01}, where also other
axiomatizations are discussed and it is shown that any two real number
structures are isomorphic.
The whole formalization turned out to be 930K of Coq source code,
which includes the construction of the real numbers by Milad Niqui,
see \cite{geuniq01}. The parts that directly correspond to the
mathematics in this paper is about 65K of Coq source. The final lemma
that was proved in the formalization was, in Coq syntax:
\begin{center}
\verb/(f:(cpoly_cring CC))(nonConst ? f) -> (EX z | f!z [=] Zero)/
\end{center}
\noindent
The plan of the paper is as follows: for an overview we first present
the root-finding algorithm that's implicit in Kneser's proof (for
simplicity we give the classical version of that algorithm). After
that we give the full constructive Kneser proof, which contains a
correctness proof of the algorithm.
\section{The Kneser Algorithm, Classically}
Let
$$f(z) = z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$$
be a monic polynomial
over the complex numbers of degree $n > 0$.
Let be given an arbitrary complex number $z_0$.
We are going to describe an algorithm that computes a Cauchy sequence
$$z_0, z_1, z_2, \ldots$$
that converges to a zero of this polynomial.
Suppose that $z_i$ has already been established.
From this we have to determine the next term in the sequence, $z_{i+1}$.
There are two possibilities:
\begin{itemize}
\item In the case that $f(z_i) = 0$ we already are at a zero, and so we will take $z_{i+1} = z_i$.
\item In the case that $f(z_i)\ne 0$ we consider the polynomial
$f_{z_i}$, defined by\footnote{The shift from $f$ to $f_{z_i}$
corresponds to the step in the classical FTA proof (see Section
\ref{sec:intro}) where the polynomial is shifted so that the alleged
minimum is reached in $0$} $f_{z_i}(z)\equiv f(z + z_i)$, find an
appropriate offset $\delta_i$ and then take $z_{i+1} = z_i +
\delta_i$.
\end{itemize}
%
So in the second case we have the polynomial
$$f_{z_i}(z) = b_n z^n + b_{n-1} z^{n-1} + \ldots + b_1 z + b_0$$
(the coefficients $b_k$ really depend on $z_i$, but
we won't write this dependency to keep the formulas simple),
with $b_n = 1$ and $b_0 = f_{z_i}(0) = f(z_i)\ne 0$, and we have
to determine $\delta_i$.
First, we will determine $|\delta_i|$.
Define
$$r_0 = \min_{k\in\{1,\ldots,n\},b_k\ne 0} \root k\of{|b_0|/|b_k|}$$
and from this define a sequence of radii
$$r_0 > r_1 > r_2 > \ldots$$
by
$$r_j = 3^{-j} r_0$$
(so every radius is $1\over 3$ of the previous one).
Now for each $j$ let $k_j$ be the element of $\{1,\ldots,n\}$ such that
$$\left|b_{k_j}\right| {r_j}^{k_j}$$ is maximal (if there are more
elements of $\{1,\ldots,n\}$ for which the maximum is attained, then
take the least one). This will give a decreasing
sequence\footnote{That this sequence is decreasing is seen by the
following argument: if $\left|b_{k_j}\right| {r_j}^{k_j}$ is the
maximum among $\{\left|b_{1}\right| {r_j},\ldots,
\left|b_{n}\right| {r_j}^{n}\}$, then $\left|b_{k_j}\right|
{r_{j+1}}^{k_j}> \left|b_{i}\right| {r_{j+1}}^{i}$ for all $i>k_j$,
because $\left|b_{i}\right| {r_{j+1}}^{i} =
\frac{1}{3^i}\left|b_{i}\right| {r_{j}}^{i}\leq
\frac{1}{3^i}\left|b_{k_j}\right| {r_{j}}^{k_j} <
\frac{1}{3^{k_j}}\left|b_{k_j}\right| {r_{j}}^{k_j}=
\left|b_{k_j}\right| {r_{j+1}}^{k_j}$.}
$$k_0\ge k_1\ge k_2\ge\ldots$$
Take the least $j > 0$ for which
$$k_{j-1} = k_j = k_{j+1}$$
and let $r = r_j$ and $k = k_j$. We will define $\delta_i$ such that
$|\delta_i| = r$, and such that $b_k {\delta_i}^k$ points opposite to
$b_0$ in the complex plane. This means that we take
$$\delta_i = r\,\root k\of{-{b_0/b_k\over\left|b_0/b_k\right|}}$$
%
and $z_{i+1} = z_i + \delta_i$. This concludes the description of the
classical version of the Kneser algorithm.
Note that this last step introduces ambiguity, because there are
$k$ different complex roots.
So the sequence
$$z_0, z_1, z_2, \ldots$$
really is a path in an infinite tree which this algorithm computes.
Of course, following different paths in this tree one might find
different zeroes.
The correctness of the algorithm is a consequence of the following
properties of the choice for $\delta_i$ (and $r$). (These properties
and the correctness will be proved in detail in the next Section.)
\begin{eqnarray*}
\left| f_{z_i}(\delta_i) \right| &<& q \left| f_{z_i}(0)\right|\mbox{
for some fixed }q<1,\\
r^n &<& \left|f_{z_i}(0)\right|.
\end{eqnarray*}
The first inequality says that $\left| f(z_{i+1}) \right| < q \left|
f(z_i)\right|$, so the $f$-values of the sequence $z_0, z_1 ,z_2,
\ldots$ converge to $0$. The second inequality says that $\left|
z_{i+1}-z_i\right|^n = \left|\delta_i\right|^n = r^n<
\left|f(z_i)\right|$, so the sequence $z_0, z_1 ,z_2,\ldots$
converges.
\section{The Kneser Proof, Constructively}
We will now present our variation on Kneser's proof of the Fundamental
Theorem of Algebra. This variant of the proof doesn't make use of
$\QQ$, unlike the proof from \cite{TvD881} that it was based on. In
the proof we have isolated the parts that are about the reals and the
parts that really need the complex numbers.
The only essential property of the complex numbers that is used is the
existence of $k$-th roots, which can be proved independently of
FTA. The most well-known proof of this fact proceeds by first moving
to a polar coordinate representation of $\CC$. As we have chosen
$\RR^2$ as a representation of $\CC$, this is not an easy proof to
formalize. (One would first have to define the arctan function and
establish an isomorphism between the two representations.) Therefore
we have chosen a different proof, which appears e.g.\ in \cite{ebb91},
and \cite{Litt41} and is basically constructive. Here, the existence
of $k$-th roots in $\CC$ is derived directly from the existence of
square roots in $\CC$ and the fact that all polynomials over $\RR$ of
odd degree have a root. The proof of these properties have all been
completely formalized in Coq. Note here that the intermediate value
theorem (which implies directly that all polynomials over $\RR$ of odd
degree have a root) is not valid constructively. However, the
intermediate value theorem can be proved for polynomials. (In our
formalization we have followed the proof of \cite{TvD881}, using Lemma
\ref{lemma:appap} for a substantial shortcut.)
The proof of FTA goes through three lemmas, which in the Coq
formalization have been called `the Key Lemma', `the Main Lemma' and
`the Kneser Lemma'. The presentation that we give here directly
corresponds to the way it was formalized in Coq.
We first state an auxiliary lemma, that says that constructively it's
possible to find the maximum of a sequence of numbers `up to
$\epsilon$':
\begin{lemma}\label{lemma:max}
For $n > 0$, $\epsilon > 0$ and $c_1,\ldots,c_n\in\RR$, there is a $k$
such that for all $i\in\{1,\ldots,n\}$:
$$c_k > c_i - \epsilon$$
\end{lemma}
%
The proof is a straightforward induction using the cotransitivity of
the $<$ relation: to determine the `maximum up to $\epsilon$' of $c_1,
\ldots, c_{n+1}$, first determine (induction hypothesis) the `maximum
up to $\epsilon/2$' of $c_1, \ldots, c_{n}$, say $c_k$ and then choose
$c_k$ if $c_{n+1}< c_k +\epsilon$ and $c_{n+1}$ if $c_{n+1}> c_k
-\epsilon/2$. The latter choice can be made because of cotransitivity
of $<$.
We now state the Key Lemma:
\begin{lemma}[Key Lemma]\label{lemma:key}
For every $n > 0$, $\epsilon > 0$ and
$a_0 > \epsilon$, $a_1\ldots,a_{n-1}\ge 0$,
$a_n = 1$,
there exist $r_0 > 0$ and $k_j\in\{1,\ldots,n\}$ with
$k_0\ge k_1\ge k_2\ge \ldots$
such that
$$a_{k_0} {r_0}^{k_0} = a_0-\epsilon$$
and for all $j\in\NN$, if we define $r_j = 3^{-j} r_0$,
for all $i\in\{1,\ldots,n\}$ it holds that
$$a_{k_j} {r_j}^{k_j} > a_i {r_j}^i-\epsilon$$
\end{lemma}
%
This lemma corresponds directly to the part of the algorithm from the
previous section that establishes $r_0$ and the sequence $k_0\ge
k_1\ge k_2\ge\ldots$ (what is called $|b_i|$ there, is called $a_i$
here, because that way the Key Lemma doesn't need to refer to complex
numbers). The choice for $r_0$ in the classical situation as $r_0 =
\min_{k\in\{1,\ldots,n\},b_k\ne 0} \root k\of{|b_0|/|b_k|}$ is
is here represented by choosing $r_0$ such
that $|b_0| = \max_{k\in \{ 1, \ldots, n\}} |b_k| r_0^k$.
The real difference with the classical
situation is that `taking the maximum' during the selection of the
$k_j$ is just `up to $\epsilon$': a term $a_i {r_j}^i$ different from
$a_{k_j} {r_j}^{k_j}$ may actually be the biggest, but it may not
exceed the selected one by more than $\epsilon$.
We will now prove the Key Lemma:
\begin{proof}
We first select $k_0$ and $r_0$. This is done by taking initial values
for $k_0$ and $r_0$ and then considering in turn for $i$ the values
$n-1$ down to $1$, preserving the following invariant:
\begin{eqnarray*}
a_{k_0} {r_0}^{k_0} &=& a_0-\epsilon,\\
a_{k_0} {r_0}^{k_0} &>& a_l {r_0}^l-\epsilon\mbox{ for all
}l\in\{i,\ldots,n\}.
\end{eqnarray*}
Start with the initial values $k_0 = n$ and $r_0 = \root n\of{a_0 -
\epsilon}$. Then, at each $i$ (from $n-1$ down to $1$) we update the
values of $k_0$ and $r_0$ as follows.
\begin{itemize}
\item If $a_i r_0^i < a_0$, do nothing. The invariant trivially
remains to hold.
\item If $a_i r_0^i > a_0 - \epsilon$, set $k_0$ to $i$ and $r_0$ to
$\root i\of{(a_0 - \epsilon)/a_i}$ (in which case $r_0$ will
decrease). The first part of the invariant trivially remains to
hold. For the second part: $a_{k_0} {r_0}^{k_0} = a_0-\epsilon$, which
is larger than each of the $a_l {r_0}^l -\epsilon$ (by the invariant
for the previous choice of $i$ and the fact that $r_0$ has decreased).
\end{itemize}
After this, $k_0$ and $r_0$ have the appropriate values.
To get $k_{j+1}$ from $k_j$, let $k = k_j$, $r = 3^{-j} r_0$
and apply Lemma \ref{lemma:max} with $\epsilon/2$ to the sequence
$$a_1 (r/3),a_2 (r/3)^2,\ldots,a_k (r/3)^k$$ to get $k' =
k_{j+1}$. (So $k_j \geq k_{j+1}$.) Then for $i\le k$ the inequality
$a_{k'} (r/3)^{k'}> a_i (r/3)^i -\epsilon$ follows directly, while
for $i > k$ we have:
$$a_k (r/3)^k = 3^{-k} a_k r^k > 3^{-k}\left(a_i r^i - \epsilon\right)
= 3^{-k} a_i r^i - 3^{-k}\epsilon > a_i (r/3)^i - \epsilon/2$$
and so:
$$a_{k'} (r/3)^{k'} > a_k (r/3)^k - \epsilon/2 > a_i (r/3)^i - \epsilon$$
\qed
\end{proof}
\noindent
We will now state and prove the Main Lemma, which isolates the part of
the proof that's about the real numbers from the part that involves the
complex numbers.
\begin{lemma}[Main Lemma]\label{lemma:main}
For every $n > 0$, $\epsilon > 0$,
$a_0 > \epsilon$, $a_1\ldots,a_{n-1}\ge 0$,
$a_n = 1$, there exists an $r > 0$ and a $k\in\{1,\ldots,n\}$
that satisfy the inequalities
$$\begin{array}{rcccl}
r^n &<& a_0&&\\
3^{-2n^2} a_0 - 2\epsilon &<& a_k r^k &<& a_0
\end{array}$$
and have the property
$$\sum_{i=1}^{k-1} a_i r^i + \sum_{i=k+1}^n a_i r^i
< (1-3^{-n}) a_k r^k + 3^n\epsilon$$
\end{lemma}
The Main Lemma corresponds to the choice for $r$ and $k$ in the
description of the classical algorithm. The first condition states
that $r$ cannot be too large in comparison to the previous value: this
corresponds to the property $r^n < \left|f_{z_i}(0)\right|$, mentioned
in the discussion of the classical algorithm. The second condition is
to make sure that, if we let $b_k \delta_i^k$ point in the opposite
direction of $b_0$, then $\left| b_0 + b_k \delta_i ^k\right|$ gets
sufficiently smaller. (The Main Lemma is about reals, but it will be
applied by taking $a_i = |b_i|$, where the $b_i$ are the coefficients of
the polynomial.) Moreover, the second and the third condition together
make sure that the sum of the remaining terms $a_i r^i$ is negligible.
We will now prove the Main Lemma.
\begin{proof}
Apply the Key Lemma to get sequences $k_0,k_1,k_2\ldots$ and
$r_0,r_1,r_2\ldots$
Because the sequence $k_j$ is non-increasing in the finite set
$\{1,\ldots,n\}$ there exists a (smallest) $j < 2n$ with
$$k_{j-1} = k_j = k_{j+1}$$
Take $k = k_j$ and $r = r_j$.
Because $r = 3^{-j} r_0 \le r_0$, for all $i$ we have $a_i r^i \le a_i
{r_0}^i < a_{k_0} {r_0}^{k_0} + \epsilon = a_0$. Of this statement
$r^n < a_0$ and $a_k r^k < a_0$ are special cases. From $a_{k_0}
r^{k_0} = 3^{-jk_0} a_{k_0} {r_0}^{k_0} \ge 3^{-jn} a_{k_0}
{r_0}^{k_0} = 3^{-jn} (a_0 - \epsilon) \geq 3^{-jn} a_0 - \epsilon$
%> 3^{-jn} a_0 - \epsilon$
it follows that $a_k r^k > a_{k_0} r^{k_0} - \epsilon
%> 3^{-jn} a_0 - 2\epsilon
\geq 3^{-jn} a_0 - 2\epsilon> 3^{-2n^2}a_0 - 2\epsilon$.
From $k = k_{j+1}$ we get that for all $i\in\{1,\ldots,n\}$
$$a_k (r/3)^k > a_i (r/3)^i - \epsilon$$
and from that it follows that
$$a_i r^i < 3^{i-k} a_k r^k + 3^i\epsilon$$
and therefore
\begin{eqnarray*}
\sum_{i=1}^{k-1} a_i r^i &<& \big(\sum_{i=1}^{k-1} 3^{i-k}\big) a_k
r^k + \big(\sum_{i=1}^{k-1} 3^i\big)\epsilon\\
&=& {1\over 2}(1-3^{1-k}) a_k r^k + {1\over 2}(3^k-3)\epsilon\\
&<& {1\over 2}(1 - 3^{-n}) a_k r^k + {1\over 2}3^n\epsilon
\end{eqnarray*}
%
In exactly the same way we get from $k = k_{j-1}$ that
$$a_i r^i < 3^{k-i} a_k r^k + 3^{-i}\epsilon$$
and so
$$\sum_{i=k+1}^n a_i r^i < {1\over 2}(1 - 3^{-n}) a_k r^k + {1\over
2}3^n\epsilon$$
%
Together this gives
$$\sum_{i=1}^{k-1} a_i r^i + \sum_{i=k+1}^n a_i r^i
< (1 - 3^{-n}) a_k r^k + 3^n\epsilon$$
\qed
\end{proof}
\noindent
We now state and prove the `Kneser Lemma'. This lemma states that we
can find what was called $\delta_i$ in the previous Section: an
appropriate vector that moves us sufficiently closer to a zero. In
the classical version of the Kneser proof, one distinguishes cases
according to $f(0)=0$ or $f(0) \neq 0$. In the first case we are done,
while in the second case one finds a $z\in\CC$ such that
$$|z|^n < |f(0)|$$
and
$$|f(z)| < q\left|f(0)\right|$$
(where $q < 1$ is some fixed multiplication factor that only depends on
the degree of the polynomial).
However,
we don't know $f(0)=0 \vee f(0) \neq 0$ constructively.
Therefore, we here have a $c > 0$ that takes the role of $|f(0)|$.
This $c$ can
get arbitrary close to $|f(0)|$ from above.
Here is the constructive version of the Kneser Lemma:
\begin{lemma}[Kneser Lemma]\label{lemma:kneser}
For every $n > 0$ there is a $q$ with $0 < q < 1$, such that
for all monic polynomials $f(z)$ of degree $n$ over the complex numbers,
and for all $c > 0$
such that
$$|f(0)| < c$$
there exists a $z\in\CC$ such that
$${|z|}^n < c$$
and
$$|f(z)| < qc$$
\end{lemma}
\begin{proof}
First of all, we give the factor $q$ explicitly:
$$q = 1 - 3^{-2n^2-n}$$
We now show how to find $z$.
Write the polynomial $f(z)$ as
$$f(z) = b_n z^n + b_{n-1} z^{n-1} + \ldots + b_1 z + b_0$$ Because
$f(z)$ is monic we have that $b_n = 1$, Also, we have that $b_0 =
f(0)$, so the condition about $c$ states that $|b_0| < c$. As $qc>0$
we can make the following case distinction
$$|f(0)| < q c \quad \vee \quad |f(0)| >0.$$
In the first case we are done by taking $z:=0$. In the second case we
proceed as follows.
Define $a_i = |b_i|$ for $i\in\{0,\ldots,n\}$ and choose an $\epsilon
> 0$ such that
\begin{eqnarray}
2\epsilon &<& 3^{-2n^2} a_0 \label{ineq1:epsilon}\\
(3^n + 1)\epsilon &<& q(c-a_0) \label{ineq2:epsilon}
\end{eqnarray}
Then $\epsilon < a_0$ and we apply the Main
Lemma (Lemma \ref{lemma:main}) to $a_0,\ldots, a_n$ to obtain $r > 0$ and
$k\in\{1,\ldots,n\}$ satisfying
$$\begin{array}{rcl}
r^n &<& a_0\\
3^{-2n^2} a_0 - 2\epsilon &<& a_k r^k < a_0\\
\sum_{i=1}^{k-1} a_i r^i + \sum_{i=k+1}^n a_i r^i
&<& (1-3^{-n}) a_k r^k + 3^n\epsilon
\end{array}$$
Finally take
$$z = r\,\root k\of{-{b_0/b_k\over a_0/a_k}}$$
(This makes use of inequality (\ref{ineq1:epsilon})
$2\epsilon < 3^{-2n^2} a_0$,
because we need to know that
$a_k > 0$.)
Then because $|b_0| = a_0$ and $|b_k| = a_k$ we have
$$|z| = r$$
From this, we get
$$|z|^n = r^n < a_0 < c$$
For the second property of $z$ we start by computing $|b_0 + b_k z^k|$:
\begin{eqnarray*}
|b_0 + b_k z^k| &=& \Big|b_0 + b_k r^k \Big({-{b_0/b_k\over a_0/a_k}}\Big)\Big|\\
&=& \big|{b_0\over a_0}(a_0 - a_k r^k)\big|\\
&=& |a_0 - a_k r^k|\\
&=& a_0 - a_k r^k
\end{eqnarray*}
(Using the inequality $a_k r^k < a_0$.)
By the triangle inequality for the complex numbers, we then get
\begin{eqnarray*}
\big|\sum_{i=0}^n b_i z^i\big| &\leq& \left|b_0 + b_k z^k\right| + \sum_{i=1}^{k-1} a_i r^i + \sum_{i=k+1}^n a_i r^i\\
&<& a_0 - a_k r^k + (1-3^{-n}) a_k r^k + 3^n\epsilon\\
&=& a_0 - 3^{-n} a_k r^k + 3^n\epsilon \\
&<& a_0 - 3^{-n} (3^{-2n^2} a_0 - 2\epsilon) + 3^n\epsilon \\
&=& (1 - 3^{-2n^2-n}) a_0 + 3^n\epsilon + 3^{-n} 2\epsilon \\
&<& q a_0 + 3^n\epsilon + \epsilon\\
&<& qc,
\end{eqnarray*}
where the final inequality follows from (\ref{ineq2:epsilon}) $(3^n +
1)\epsilon < q(c-a_0)$. \qed
\end{proof}
\noindent
Next we prove the special case of the Fundamental Theorem of Algebra
for monic polynomials:
\begin{lemma}[Fundamental Theorem of Algebra for monic polynomials]\label{lemma:ftareg}
For every monic polynomial $f(z)$ of degree $n > 0$ over the complex numbers,
there exists $z\in\CC$ such that $f(z) = 0$.
\end{lemma}
\begin{proof}
Take any $c > 0$ with $c > |f(0)|$.
We construct a sequence $z_i\in\CC$ such that for all $i$
\begin{eqnarray}
|f(z_i)| &<& q^i c\label{cauchy:1}\\
|z_{i+1}-z_i| &<& (q^i c)^{1/n}\label{cauchy:2}
\end{eqnarray}
where $q < 1$ is given by the Kneser Lemma \ref{lemma:kneser}.
This sequence is constructed
by iteratively applying the Kneser Lemma to $f_{z_i}(z)\equiv f(z+z_i)$ to find
$z_{i+1}-z_i$.
The required properties of $z_i$ then follow directly from the properties in
the Kneser Lemma, by induction on $i$.
Because of \ref{cauchy:2}, the $z_i$ form a Cauchy
sequence:
\begin{eqnarray*}
|z_{m+i}-z_m|&\le&|z_{m+i}-z_{m+i-1}|+\ldots+|z_{m+1}-z_m|\\
&<&(q^{(m+i-1)/n}+q^{(m+i-2)/n}+\ldots+q^{m/n})c^{1/n}\\
&=&{q^{m/n}-q^{(m+i)/n}\over1-q^{1/n}} c^{1/n}\\
&=&q^{m/n}{1-q^{i/n}\over1-q^{1/n}}c^{1/n}\\
&<&q^{m/n}{c^{1/n}\over1-q^{1/n}}.
\end{eqnarray*}
By choosing $m$ sufficiently large ($n$ is fixed), this last
expression can be made arbitrarily small.
Then, because $z_i$ is a Cauchy sequence, the limit $z=\lim_{i\to\infty} z_i$
exists and by continuity of $f$
one has
$$|f(z)|=\lim_{i\to\infty} |f(z_i)|\le\lim_{i\to\infty} q^i c=0$$
so
$f(z)=0$.
\qed
\end{proof}
\noindent
Finally we prove the full Fundamental Theorem of Algebra. A
polynomial is called {\em non-constant\/} if for some $k > 0$ one of
its coefficients $a_k$ is apart from zero. We denote this by $f\noto
0$. This $a_k$ doesn't necessarily need to be the head coefficient
$a_n$ of the polynomial. In fact the head coefficient $a_n$ might be
zero (we can't know this), so proving the full Fundamental Theorem of
Algebra is not as easy as just dividing by $a_n$ and then applying
Lemma \ref{lemma:ftareg}.
We need one more important property stating, in a sense, the opposite
of the Fundamental Theorem of Algebra: instead of showing that there
is an argument for which the polynomial is zero, it shows that there
is an argument for which the polynomial is {\em apart\/} from
zero. This fact comes as an immediate corollary of the following
lemma.
\begin{lemma}\label{lemma:appap}
Given a polynomial $f$ of degree at most $n$ and $n+1$ distinct
points $z_0,z_1,\ldots,z_n\in\CC$, $f(z_i)\noto 0$ for at least one of
the $z_i$.
\end{lemma}
\begin{proof}
Write $f(z)$ in the form
$$f(z) = a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$$
which means that $f(z)$ has at most degree $n$.
Then for any $n+1$ different $z_0,z_1,\ldots,z_n\in\CC$
one can write
$$f(z) = \sum_{i=0}^n
f(z_i){(z-z_0)\cdots(z-z_{i-1})(z-z_{i+1})\cdots(z-z_n)\over
(z_i-z_0)\cdots(z_i-z_{i-1})(z_i-z_{i+1})\cdots(z_i-z_n)}$$ because
both sides have at most degree $n$, and coincide on $n+1$ points (and
hence they are equal). This means that we can write $f(z)$ in the
form
$$f(z) = \sum_{i=0}^n f(z_i) f_i(z)$$
for some $n+1$ polynomials $f_i$.
Because this sum is $\noto 0$, there is some $i\in\{0,\ldots,n\}$
for which the polynomial
$f(z_i) f_i\noto 0$
and therefore for this $i$ we have that $f(z_i)\noto 0$.
\qed
\end{proof}
\begin{corollary}\label{cor:appap}
For every polynomial $f\noto 0$ over the complex numbers,
there exists $z\in\CC$ such that $f(z)\noto 0$.
\end{corollary}
\begin{theorem}[Fundamental Theorem of Algebra]
For every non-constant polynomial $f(z)$ over the complex numbers,
there exists $z\in\CC$ such that $f(z) = 0$.
\end{theorem}
\begin{proof}
We write
$$f(z) = a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0$$
Because $a_n$ might be zero, we call $n$ the {\em length\/} of $f$
instead of calling it the degree of $f$.
We'll prove the theorem with induction on this length $n$.
With Corollary \ref{cor:appap} find a $z_0\in\CC$ such that $f(z_0)\noto 0$.
Then if we define
$f_{z_0}(z)\equiv f(z+z_0)$, it is sufficient to find a zero
of $f_{z_0}$, because if $z$ is a zero of $f_{z_0}$ then $z+z_0$
will be a zero of $f$.
So all we need to prove is that $f_{z_0}$ has a zero.
We write
$$f_{z_0}(z) = b_n z^n + b_{n-1} z^{n-1} + \ldots + b_1 z + b_0$$ with
$b_0 = f_{z_0}(0) = f(z_0)\noto 0$. We define the {\em reverse\/}
${f_{z_0}}^{\it rev}(z)$ of this polynomial to be the polynomial
$$b_0 z^n + b_1 z^{n-1} + \ldots + b_{n-1} z + b_n$$
so with the coefficients in the opposite order.
This reverse operation has the property that the reversal of a product
is the product of the reversals: $(gh)^{\it rev} = g^{\it rev} h^{\it rev}$.
Now
${f_{z_0}}^{\it rev}(z)/b_0$
is monic, so by Lemma \ref{lemma:ftareg}
it has a zero $c$, and so it can be written as $(z-c) g(z)$.
Because, as we noted, reversals commute with products, this implies that the original $f_{z_0}$ can
be written as
$$f_{z_0}(z) = (c_1 z + c_0) h(z)$$
where $h(z)$ is a lower length polynomial of the form
$$h(z) = d_{n-1} z^{n-1} + \ldots + d_1 z + d_0$$
%
Because $f_{z_0}$ is non-constant, we have $b_i\noto 0$ for some $i > 0$.
And because
$$b_i = c_0 d_i + c_1 d_{i-1}$$
we find that either $c_0 d_i\noto 0$ or $c_1 d_{i-1}\noto 0$.
\begin{itemize}
\item In the case that $c_0 d_i\noto 0$, we get $d_i\noto 0$ and therefore
$h(z)$ is non-constant, has a zero by induction, and this zero
will also be a zero of $f_{z_0}$.
\item In the case that $c_1 d_{i-1}\noto 0$, we get $c_1\noto 0$ and then
$-c_0/c_1$
will be a zero of $f_{z_0}$.
\end{itemize}
\qed
\end{proof}
\section{Convergence Speed of the Kneser Algorithm}
The Kneser proof (and the algorithm that is implicit in it) as
presented in this paper differs from the Kneser proof from
\cite{TvD881} in an important respect. In this paper we define the
sequence
$$r_0 > r_1 > r_2 > \ldots$$
(and the matching sequence
$k_0 \ge k_1 \ge k_2 \ge \ldots$)
to start at zero.
In \cite{TvD881} the $r_j$ sequence starts at minus one
$$r_{-1} > r_0 > r_1 > r_2 > \ldots$$
Each $r_j$ is three times as small as the previous one, so in the
other variant of the proof the search for an appropriate $r$ starts at
a radius that is three times as big as the radius $r_0$. To
distinguish the two proofs we'll call the proof that is in this paper
the {\em slow\/} variant of the proof and the one where the sequences
$r_j$ and $k_j$ start at $-1$ the {\em fast\/} variant of the proof.
In Coq we formalized the slow variant of the proof. It is a simpler
proof and we wanted to finish the formalization as fast as possible.
Also in Coq it's easier to formalize a sequence starting at $0$ than a
sequence starting at $-1$. (One could shift the sequence by one but
that would complicate the formulas in various places.)
The fast variant of the proof has the advantage that the corresponding
algorithm behaves like Newton-Raphson when the algorithm gets close to
the zero of the polynomial. The algorithm from the slow variant of
the proof converges slower, because close to a zero it only takes one
third of a Newton-Raphson step. In the slow variant of the proof,
close to a zero we get
$k_0 = k_1 = k_2 = \ldots = 1$,
which means that $j = 1$ and so
$r = r_1 = {1\over 3}r_0$,
where $r_0$ is the Newton-Raphson distance. Note that close to the
zero, the value of the polynomial will then be multiplied with
approximately a factor of $2/3$ at each step, which is much better
than the `worst case' factor of
$q = 1-3^{-2n^2-n}$
which appears in the proof.
As an example of the behavior of the algorithm from the slow variant of the proof
we calculate $\sqrt{2}\approx 1.41421$ by
finding a root of $z^2-2$, starting from $z_0 = 1$:
\begin{small}
$$\begin{array}{lcl}
z_0 =&1&= 1\\
z_1 =&7/6&\approx 1.16667\\
z_2 =&317/252&\approx 1.25794\\
z_3 =&629453/479304&\approx 1.31326\\
z_4 =&2440520044877/1810196044272&\approx 1.34821\\
z_5 =&\ldots&\approx 1.37075\\
z_6 =&\ldots&\approx 1.38547
\end{array}$$%
\end{small}%
In this sequence the Kneser algorithm takes
$\log(1/10)/\log(2/3)\approx 5.7$ steps to gain one decimal of
precision.
In the fast variant of the proof we get, close to a zero
$k_{-1} = k_0 = k_1 = k_2 = \ldots = 1$,
which means that then $j = 0$ and so $r = r_0$.
In the case of $\sqrt{2}$ this gives
\begin{small}
$$\begin{array}{lcl}
z_0 =&1&= 1\\
z_1 =&3/2&\approx 1.5\\
z_2 =&17/12&\approx 1.4166666666666666666666667\\
z_3 =&577/408&\approx 1.4142156862745098039215686\\
z_4 =&665857/470832&\approx 1.4142135623746899106262956\\
z_5 =&886731088897/627013566048&\approx 1.4142135623730950488016896\\
z_6 =&\ldots&\approx 1.4142135623730950488016887
\end{array}$$%
\end{small}%
This is the same sequence that the Newton-Raphson algorithm
calculates. This particular sequence consists of continued fraction
approximations of $\sqrt{2}$ and the correct number of decimals
doubles with every step. Note that the Kneser algorithm of the fast
variant of the proof only coincides with Newton-Raphson close to the
zero. With Newton-Raphson not all start values lead to a convergent
sequence, but with the Kneser algorithm it does.
To change the proof in this paper to the fast variant (where the
sequences $r_j$ and $k_j$ start at -1), only the proof of the Key
Lemma needs to be modified. Apart from going from $k_j$ to $k_{j+1}$
we will also need to go from $k_0$ to $k_{-1}$. To be able to do
that, the $k_0$ and $r_0$ will need to satisfy a stricter restriction
than before. It needs to satisfy
$$a_{k_0} r_0^{k_0} > a_i r_0^i - \epsilon'$$
where
$$\epsilon' = 3^{-n}\epsilon$$
To find such $k_0$ and $r_0$ one proceeds like before,
but this time distinguishing between
$$a_i r_0^i < a_0 - \epsilon + \epsilon'$$
and
$$a_i r_0^i > a_0 - \epsilon$$
at every iteration.
Then to get $k_{-1}$ from $k_0$ one applies Lemma \ref{lemma:max}
to the sequence
$$a_{k_0} (3r_0)^{k_0},\ldots,a_n (3r_0)^n$$
with a reasoning similar to the $k_{j+1}$ from $k_j$ case.
\section{Brief overview of other constructive proofs}
The first constructive proof of FTA (for monic polynomials) is from
Weyl \cite{Weyl24}, where the winding number is used to simultaneously
find all zeros of a (monic) polynomial. A similar but more abstract
proof, also using the winding number, occurs in \cite{bishop.85},
where FTA is proved for arbitrary non-constant polynomials. Based on
Weyl's approach, \cite{HeGa67} presents an implementation of an
algorithm for the simultaneous determination of the zeros of a
polynomial.
In \cite{BrLo24}, Brouwer and De Loor give a constructive proof of FTA
for monic polynomials by first proving it for polynomials with
rational complex coefficients (which have the advantage that equality
is decidable) and then make the transition (viewing a complex number
as the limit of a series of rational complex numbers) to general monic
polynomials over $\CC$. This proof -- and also Weyl's and other FTA
proofs -- are discussed and compared in \cite{Loor25}.
Brouwer \cite{Brou24} was the first to generalize the constructive FTA
proof to arbitrary non-constant polynomials (where we just know {\em
some\/} coefficient to be apart from $0$). In \cite{Spec67} it is
shown that, for general non-constant polynomials, there is a
continuous map from the coefficients to the set of zeros.
\paragraph{Acknowledgements}
We thank the referees for their very valuable comments, which led us
to improve part of the proof. We thank Henk Barendregt, Randy Pollack
for inspiring discussions and their valuable comments. Thanks to
Helmut Schwichtenberg for the enlightening and stimulating discussions
on the FTA proof of Kneser and for providing us with a copy of
\cite{Schw}. We thank Bas Spitters and Wim Veldman for various
discussions on the constructive aspects of analysis.
\begin{thebibliography}{0}
\bibitem{bishop.85}
E.~Bishop and D.~Bridges,
{\em Constructive Analysis},
Number 279 in Grundlehren der mathematischen Wissenschaften.
Springer, 1985.
\bibitem{BrLo24}
L.E.J.\ Brouwer and B.\ de Loor,
Intuitionistischer Beweis des Fundamentalsatzes der Algebra,
in {\em Proceedings of the KNAW}, 27, pp.\ 186--188, 1924.
\bibitem{Brou24}
L.E.J.\ Brouwer,
Intuitionistische Erg\"anzung des Fundamentalsatzes der Algebra,
in {\em Proceedings of the KNAW}, 27, pp.\ 631--634, 1924.
\bibitem{DeHe67}
B.\ Dejon and P.\ Henrici, Editors,
{\em Constructive Aspects of the Fundamental Theorem of Algebra},
Proceedings of a symposium at IBM Research Lab, Z\"urich-R\"uschlikon,
June 5-7, 1967, Wiley-Interscience, London.
\bibitem{ebb91}
H.-D.~Ebbinghaus et al. (eds.),
{\em Numbers},
Springer, 1991, 395 pp.
\bibitem{FR971}
B.~Fine and G.~Rosenberger, {\em The Fundamental Theorem of Algebra},
Undergraduate Texts in Mathematics,
Springer, 1997, xii+208 pp.
\bibitem{fta}
H.~Geuvers, F.~Wiedijk, J.~Zwanenburg, R.~Pollack, M.~Niqui, H.~Barendregt,
FTA project, \url{http://www.cs.kun.nl/gi/projects/fta/}.
\bibitem{alghier}
H.~Geuvers, R.~Pollack, F.~Wiedijk, and J.~Zwanenburg.
The algebraic hierarchy of the {FTA} project,
in {\em Calculemus 2001 workshop proceedings}, pp.\ 13--27, Siena, 2001.
\bibitem{geuniq01}
H.~Geuvers, M.~Niqui,
Constructive Reals in Coq: Axioms and Categoricity,
{\em Types 2000 Workshop, Durham, UK}, this volume.
\bibitem{HeGa67}
P.\ Henrici and I.\ Gargantini,
Uniformly convergent algorithms for the simultaneous approximation of
all zeros of a polynomial, in \cite{DeHe67}, pp.\ 77--113.
\bibitem{Knes40}
H.\ Kneser,
Der Fundamentalsatz der Algebra und der Intuitionismus, {\em Math.\
Zeitschrift}, 46, 1940, pp.\ 287--302.
\bibitem{Knes81}
M.\ Kneser,
Erg\"anzung zu einer Arbeit von Hellmuth Kneser \"uber den
Fundamentalsatz der Algebra, {\em Math.\ Zeitschrift}, 177, 1981, pp.\
285--287.
\bibitem{Litt41}
J.E.\ Littlewood,
Every polynomial has a root, {\em Journal of the London Math.\ Soc.\/} 16,
1941, pp.\ 95--98.
\bibitem{Loor25}
B.\ de Loor,
{\em Die Hoofstelling van die Algebra van Intu\"\i tionistiese
standpunt}, Ph.D.\ Thesis, Univ.\ of Amsterdam, Netherlands,
Feb.\ 1925, pp.\ 63 (South-African).
\bibitem{Schw}
Helmut Schwichtenberg,
Ein konstruktiver Beweis des Fundamentalsatzes,
Appendix A (pp.\ 91--96) of Algebra,
Lecture notes,
Mathematisches Institut der Universit{\"a}t M{\"u}nchen 1998,
\url{http://www.mathematik.uni-muenchen.de/schwicht/lectures/algebra/ws98/skript.ps}
\bibitem{Spec67}
E.\ Specker,
The Fundamental Theorem of Algebra in Recursive Analysis,
in \cite{DeHe67}, pp.\ 321--329.
\bibitem{TvD881}
A.\ Troelstra and D.\ van Dalen, {\em Constructivism in Mathematics}, vols. 121 and 123 in Studies in Logic and The
Found.\ of Math., North-Holland, 1988.
\bibitem{Weyl24}
H.\ Weyl,
Randbemerkungen zu Hauptproblemen der Mathematik,
{\em Math.\ Zeitschrift}, 20, 1924, pp.\ 131--150.
\end{thebibliography}
\end{document}