\let\rmOmega=\Omega
\let\tie=~
\let~=\bullet
\documentclass[runningheads]{llncs}
\usepackage{amssymb,url,rotating,supertabular}
%\raggedbottom
\newenvironment{systable}
{\begin{flushright}
\begin{tabular}{p{6.4cm}ccccccccccccccc}
& \hbox to .8em{\hss\begin{sideways}\strut HOL\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Mizar\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut PVS\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Coq\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Otter/Ivy\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Isabelle/Isar\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Alfa/Agda\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut ACL2\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut PhoX\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut IMPS\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Metamath\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Theorema\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut Lego\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut NuPRL\end{sideways}\hss} &
\hbox to .8em{\hss\begin{sideways}\strut $\rmOmega$mega\end{sideways}\hss} \\
\noalign{\smallskip}\cline{1-16}\noalign{\smallskip}}
{\noalign{\smallskip}\cline{1-16}\noalign{\smallskip}
\end{tabular}
\end{flushright}}
\def\systableline{\noalign{\smallskip}\cline{1-16}\noalign{\smallskip}}
\begin{document}
\title{Comparing mathematical provers}
\author{Freek Wiedijk}
\institute{University of Nijmegen}
\maketitle
\begin{abstract}
We compare fifteen systems for the formalizations of
mathematics with the computer.
We present several tables that list various properties of
these programs.
The three main dimensions on which we compare these systems
are:
the size of their library,
the strength of their logic and their level of automation.
\end{abstract}
\section{Introduction}
%\subsection{Disclaimer}
\emph{We realize that many judgments in this paper are rather subjective.
We apologize in advance to anyone whose
system is misrepresented here.
We would like to be notified by e-mail of any errors in this paper
at}
\url{}.
\subsection{Problem}
The QED manifesto \cite{qed:94} describes a future in which
all of mathematics is encoded in the computer in such a
way that the correctness can be mechanically verified.
During the years the same dream has been at the
core of various proof checking projects.
Examples are
the Automath project \cite{ned:geu:vri:94}, the
Mizar project \cite{muz:93,wie:99:1}, the NuPRL project
\cite{con:all:bro:cle:cre:har:how:kno:men:pan:sas:smi:86},
and the Theorema project \cite{buc:jeb:kri:mar:vas:97}.
Recently, the checking of mathematical proofs has become
popular in the context of verification of hardware and
software:
important systems in this area are ACL2 \cite{kau:man:moo:00} and PVS
\cite{owr:rus:sha:92}.
Because of this, the field of proof verification currently focuses on computer
science applications.
The study of formal proof in
mathematics is still not widespread.
\looseness=-1
We have compiled a list of `state of the art'
systems for the formalization of mathematics, systems that
one might seriously consider when thinking of implementing
the QED dream.
We were not so much interested in experimental systems
(systems that try out some new idea),
as well as in `industrial strength' systems
(systems that are in at least some aspects better
at the formalization of mathematics than all other existing
systems).
We ended up with a list of fifteen systems.
For each of these systems we asked a user of the
system to formalize the same small theorem:
the Pythagorean proof of the
irrationality of $\sqrt{2}$.\footnote{%
This proof is mentioned in Aristotle's \emph{Prior Analytics}, as follows
\cite{tho:39}:
\begin{quote}
{\em For all who argue \emph{per impossibile} infer by syllogism
a false conclusion, and prove the original conclusion
hypothetically when something impossible follows
from a contradictory assumption, as, for example,
that the diagonal [of a square] is incommensurable
[with the side] because odd numbers are equal to
even if it is assumed to be commensurate.
%It is
%inferred by syllogism that odd numbers are equal to
%even, and proved hypothetically that the diagonal is
%incommensurate, since a false conclusion follows from
%the contradictory assumption.
}
\end{quote}
It was interpolated in Euclid's \emph{Elements} as
Proposition \textsc{x}.$\;$117.
Due to the oral tradition of the Pythagorean School,
the origins of this proof have been covered in `complete darkness'
\cite{toe:63}.
There is a legend that
Pythagoras' pupil Hippasus discovered this proof and was drowned at sea to keep it a secret.
}
These fifteen formalizations will be published elsewhere.\footnote{%
The current draft of this document is on the
Web at \url{}
}
We did not in advance specify to the user a very specific proof problem to be
solved, because we wanted formalizations
that were the most natural for the given system.
In this paper we compare these fifteen systems according
to various criteria.
We do not try to establish which of these systems is `best':
they are too different to say something
like that.
All of these fifteen systems
clearly show the dedication of their creators,
they all
contain important ideas, and they all merit to be studied and used.
%
The main purpose of this paper is
to show how different
this kind of system can be.
When one only knows a few of these systems, it is tempting to
think that all systems for formal mathematics
have to be of a similar nature.
Instead, it is surprising how diverse these systems are.
\subsection{Approach}
This paper is primarily a collection of tables that
show various properties of the systems.
It is something like a `consumer test'.
To illustrate three of the most important
dimensions for comparing these systems (the size
of their library,
the strength of their logic, and
their level of automation), at the end of the paper we show
them together in a two-dimensional diagram.
We realize that this diagram is highly subjective.
We list some aspects of the systems that we have used to determine
the positions in this diagram.
(Readers will probably disagree with the details of this
diagram and are encouraged to make their own variant.)
The order in which we list the systems in this paper is
the order in which we received the formalizations
of the irrationality of $\sqrt{2}$.
In this way we wish to express our gratitude for the help
of all the people who wrote these formalizations.
\subsection{Related Work}
On page 1227 of \cite{bar:geu:01} there appears a comparison
similar to the one in this paper.
However, it only compares nine instead of fifteen systems,
and the comparison takes only one and a half pages.
There are several other
comparisons between provers in the literature, but those
generally compare only two systems,
and often compare the systems for applications in computer science
instead of for mathematics.
For instance, there are comparisons between
NuPRL and Nqthm \cite{bas:kau:91},
HOL and Isabelle \cite{age:gor:95},
HOL and ALF \cite{age:bey:dyb:96},
Coq and HOL \cite{jak:cou:cur:97},
and HOL and PVS \cite{gri:hui:98}.
There also has been work done on how to embed the proofs of one system
in another one.
As an example, there is the work by Doug Howe and others
on how to translate HOL proofs to classical NuPRL
\cite{nau:ste:mes:01}.
Surprisingly, mapping mathematics between systems
is more difficult than one would expect.
\subsection{Outline}
In Section \ref{systems} we list the fifteen systems that
are compared in this paper.
We explain why we selected these
systems and also why we did not select some other systems.
In Section \ref{files} we investigate various proof
representations that are used by the fifteen systems.
We give a classification of these representations into seven categories.
Then we investigate the sizes of the
irrationality of $\sqrt{2}$ proofs.
Finally we compare the sizes of the libraries of the systems.
In Section \ref{logics} we compare the logical foundations
of the systems.
We also look at their architecture according
to the so-called `de Bruijn criterion'.
In Section \ref{automation} we compare the level of automation
of the systems.
We study whether they satisfy the `Poincar\'e principle',
whether they have an open architecture that allows user automation,
and whether they come with strong built-in automation.
Finally in Section \ref{diagram} we put all systems in one
diagram.
\section{From Alfa to $\bf\rmOmega$mega: the fifteen provers of the world}\label{systems}
The systems that are compared in this paper are listed in
the following table:
%For each system we give the name of
%the system, the web page (which contains all information that
%one needs about the system), the implementation language of the system,
%the main people behind the system,
%and the name of the person that did the proof of the irrationality
%of $\sqrt{2}$ for that system.
\begin{enumerate}
\item
\textbf{HOL} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} ML \par
\emph{Main person behind the system:} Mike Gordon \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} John Harrison, Konrad Slind \par
\smallskip
\item
\textbf{Mizar} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} Pascal \par
\emph{Main person behind the system:} Andrzej Trybulec \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Andrzej Trybulec \par
\smallskip
\item
\textbf{PVS} \par
\emph{Web page:} \url{} \par
\emph{Implementation languages:} Lisp, ML \par
\emph{Main people behind the system:} John Rushby, Natarajan Shankar, Sam Owre \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Bart Jacobs, John Rushby \par
\smallskip
\item
\textbf{Coq} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} ML \par
\emph{Main people behind the system:} G\'erard Huet, Thierry Coquand, Christine Paulin \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Laurent Th\'ery \par
\smallskip
\item
\textbf{Otter/Ivy} \par
\emph{Web pages:} \url{} and
\url{} \par
\emph{Implementation language:} C \par
\emph{Main people behind the system:} William McCune, Larry Wos, Olga Shumsky \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Michael Beeson, William McCune \par
\smallskip
\item
\textbf{Isabelle/Isar} \par
\emph{Web pages:} \url{} and
\url {} \par
\emph{Implementation language:} ML \par
\emph{Main people behind the system:} Larry Paulson, Tobias Nipkow, Markus Wenzel \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Markus Wenzel, Larry Paulson \par
\smallskip
\item
\textbf{Alfa/Agda} \par
\emph{Web page:} \url{} and
\url{} \par
\emph{Implementation language:} Haskell \par
\emph{Main people behind the system:} Thierry Coquand, Catarina Coquand, Tho\-mas Hallgren \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Thierry Coquand \par
\smallskip
\item
\textbf{ACL2} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} Lisp \par
\emph{Main person behind the system:} J Strother Moore \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Ruben Gamboa \par
\smallskip
\item
\textbf{PhoX} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} ML \par
\emph{Main person behind the system:} Christophe Raffalli \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Christophe Raffalli, Paule Rozi\`ere \par
\smallskip
\item
\textbf{IMPS} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} Lisp \par
\emph{Main people behind the system:} William Farmer, Joshua Guttman, Javier Thayer \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} William Farmer \par
\smallskip
\item
\textbf{Metamath} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} C \par
\emph{Main person behind the system:} Norman Megill \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Norman Megill \par
\smallskip
\item
\textbf{Theorema} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} Mathematica \par
\emph{Main person behind the system:} Bruno Buchberger \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Markus Rosenkranz, Tudor Jebelean, Bruno Buchberger \par
\smallskip
\item
\textbf{Lego} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} ML \par
\emph{Main person behind the system:} Randy Pollack \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Conor McBride \par
\smallskip
\item
\textbf{NuPRL} \par
\emph{Web page:} \url{} \par
\emph{Implementation languages:} ML, Lisp \par
\emph{Main person behind the system:} Robert Constable \par
\emph{Person who did $\sqrt{2}\not\in{\mathbb Q}$:} Paul Jackson \par
\smallskip
\item
\textbf{$\bf\rmOmega$mega} \par
\emph{Web page:} \url{} \par
\emph{Implementation language:} Lisp \par
\emph{Main person behind the system:} J\"org Siekmann \par
\emph{People who did $\sqrt{2}\not\in{\mathbb Q}$:} Christoph Benzm\"uller, Armin Fiedler, Andreas Meier, Martin Pollet \par
\end{enumerate}
For most systems it is clear why they are in this list,
but a few need explanation.
Otter is not designed for the development of a structured body
of mathematics in the QED style, but instead is used in a `one shot' way to solve logical puzzles.
Also it is only one of the members (although the best known) of the large class of
\emph{first order theorem provers}.
The reason that we still have included Otter in this list (and only Otter)
is that Art Quaife has used Otter to develop a body of mathematics
in Euclidean geometry and set theory \cite{qua:92}.
Also Otter is the only program in the list that has
been used for the solution of open mathematical problems,
as listed in \url{}.
A reason to single out Otter from the first order provers
is that Otter has the Ivy program for separate checking of its proofs.
ACL2 is not primarily designed for mathematics.
In particular it has a rather weak logic, without an explicit existential quantifier.
However the Nqthm system (a predecessor of ACL2, which is very similar) has been
used to formalize significant theorems like G\"odels first incompleteness theorem
\cite{sha:94}.
Also Nqthm was the system that the authors of the QED manifesto
had in mind.
The Metamath system \cite{meg:97} maybe should not be counted as an `industrial strength' system:
it only has one user.
However, the system is beautifully executed and differs in
many respects from the other systems.
For one thing it is very fast: it can check its full (non-trivial) library
in only a few seconds.
Also it really makes the logical structure of the mathematics completely
transparent.
%\subsection{Older and newer versions}
Some of the systems in the list have predecessors or (recent) successors:
\begin{center}
\begin{tabular}{lcl}
ALF, Half &$\to$& \textbf{Agda} \\
Nqthm &$\quad\to\quad$& \textbf{ACL2} \\
\textbf{Imps} &$\to$& MathScheme \\
\textbf{Lego} &$\to$& Oleg, Plastic \\
\textbf{NuPRL} &$\to$& MetaPRL
\end{tabular}
\end{center}
We did not include any of those.
We expect systems from the same origin to be reasonably close to each other.
%\subsection{Some other provers that have been left out}
Some recent provers,
like the KIV system and the `B method',
have been especially designed
for verification of hardware and software.
These systems can also be used for the formalization
of mathematics,
but we have not included them in our comparison.
%
Other systems,
like the Twelf system and the Typelab system,
are more for formalizing logic than for mathematics.
After some discussion with the authors of these systems we decided
to omit these as well.
%
Finally there is the TPS system which is similar
to Otter but for higher order logic.
However, it misses what makes
Otter interesting for this comparison, so it was left out too.
\section{Files containing mathematics}\label{files}
\subsection{Seven ways to represent mathematics}\label{representations}
When we were editing the fifteen proofs of the irrationality
of $\sqrt{2}$ for presentation on paper,
it turned out that often it was difficult to present
the proofs in such a way that it was clear what was going on.
For various systems we needed to show the proof in multiple representations.
Some reflection showed that these representations could be divided into
seven groups:
\begin{systable}
definitions \& statements of lemmas &
$i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $i$ & $x$ & $i$ & $i$ & $i$ \\
proof scripts &
$=$ & $=$ & $i$ & $=$ & $ $ & $=$ & $=$ & $ $ & $=$ & $=$ & $=$ & $ $ & $=$ & $i$ & $i$ \\
trace of interactive session &
$o$ & $ $ & $o$ & $o$ & $o$ & $o$ & $ $ & $o$ & $ $ & $ $ & $o$ & $=$ & $o$ & $o$ & $o$ \\
representation with symbols &
$ $ & $ $ & $ $ & $ $ & $ $ & $=$ & $=$ & $ $ & $x$ & $ $ & $o$ & $=$ & $ $ & $=$ & $o$ \\
representation in natural language &
$ $ & $=$ & $ $ & $ $ & $ $ & $ $ & $ $ & $=$ & $=$ & $ $ & $ $ & $=$ & $ $ & $ $ & $o$ \\
stored formalization state &
$ $ & $ $ & $o$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $o$ & $o$ \\
$\lambda$-term or other `proof object' &
$ $ & $ $ & $ $ & $o$ & $o$ & $o$ & $=$ & $ $ & $ $ & $ $ & $ $ & $ $ & $o$ & $ $ & $=$ \\
\end{systable}
In this table an `$i$' means `input file' that has been made by the user.
A `$o$' means `output file' that has been generated by the system.
An `$x$' means that it is a mixture of input from the user and output from
the system.
An `$=$' sign means that this is part of the same file as the item
above it in the same column.
For instance, in the Mizar system the statement of the lemmas, the `proof
scripts' that the user enters, and the `natural language' representation,
all are in one file (the \texttt{.miz} file)
which is written by the user.
In this table we only included files that can be displayed on paper.
For some systems the stored formalization state is not
represented in this table
because it is a binary file which is not humanly readable.
\subsection{Comparing the sizes of the input files}
We now compare the sizes of the formalization of the irrationality
of $\sqrt{2}$ in the fifteen systems.
This not only compares the systems but also the
styles of the people who did the formalization and the complexity of the proof that they selected for formalization.
Therefore it only gives a rough indication of the `compactness' of the systems.
Also, comparing systems based on a single proof problem is
statistically not meaningful.
Having the same proof in fifteen systems does show the proof styles of
the systems surprisingly well,
but it does not say much about the quality of the provers.
Still, we will list the sizes of the files.
Here are the precise statements that were proved (or, in the case of Otter, disproved) in the fifteen systems:
\begin{center}
\begin{tabular}{lcl}
\noalign{\smallskip}\cline{1-3}\noalign{\smallskip}
HOL &$\quad$& \verb|~rational(sqrt(&2))| \\%\noalign{\smallskip}
Mizar && \verb|sqrt 2 is irrational| \\%\noalign{\smallskip}
PVS && \verb|NOT Rational?(sqrt(2))| \\%\noalign{\smallskip}
Coq && \verb|(irrational (sqrt (S (S O))))| \\%\noalign{\smallskip}
Otter && \verb|m(a,a) = m(2,m(b,b))| \\%\noalign{\smallskip}
Isabelle/Isar && \emph{sqrt}$\;($\emph{real}$\;($\emph{2::nat}$))$ $\not\in$ $\mathbb{Q}$\\%\noalign{\smallskip}
Alfa/Agda && $\mbox{{\it prime}}\,\mbox{{\it p}}\,\rightarrow\,\mbox{{\it noether}}\,\mbox{{\it A}}\,\left ( \mbox{{\it multiple}}\,\mbox{{\it p}} \right )\,\rightarrow\,\mbox{{\it isNotSquare}}\,\mbox{{\it p}}$ \\%\noalign{\smallskip}
ACL2 && \verb|(implies (equal (* x x) 2)| \\
&& \verb| (and (realp x)| \\
&& \verb| (not (rationalp x))))| \\%\noalign{\smallskip}
PhoX && \verb|/\m,n : N (m^ N2 = N2 * n^ N2 -> m = N0 & n = N0)| \\%\noalign{\smallskip}
IMPS && \verb|not #(sqrt(2),qq)| \\%\noalign{\smallskip}
Metamath &$\quad$& \verb#$p |- ( sqr ` 2_10 ) e/ QQ# \\%\noalign{\smallskip}
Theorema && $\displaystyle\neg{\rm rat}\big[{\sqrt{p}}\big]$ \\%\noalign{\smallskip}
Lego && \verb+{b|nat}{a|nat}+ \\
&& \verb|(Eq (times two (times a a)) (times b b))->| \\
&& \verb|(Eq a zero /\ Eq b zero)| \\%\noalign{\smallskip}
NuPRL && $\neg\texttt{(}\exists\texttt{u:}{\mathbb Q}\texttt{. u *}$\boldmath{$_{q}$}$\texttt{ u = 2 / 1)}$ \\%\noalign{\smallskip}
$\rmOmega$mega && \verb|(not (rat (sqrt 2)))| \\
\noalign{\smallskip}\cline{1-3}\noalign{\smallskip}
\end{tabular}
\end{center}
Some people proved the statement of the irrationality in the
real numbers:
$$\sqrt{2}\not\in{\mathbb Q}$$
Others did not have a library of real numbers (or did not want to use it)
and only proved a statement about natural numbers:
$$m^2 = 2 n^2 \iff m = n = 0$$
The Agda proof by Thierry Coquand proves something
still more basic.
It does not talk about the number two in the natural numbers, but instead about
any element in a commutative monoid that satisfies some conditions.
%The precise statement that was proved was
%$\mathop{\mbox{{\it prime}}} p\rightarrow\mathop{\mbox{{\it noether}}} A\,(\mathop{\mbox{{\it multiple}}}p)\rightarrow\mathop{\mbox{{\it isNotSquare}}} p$.
The Otter proof has a similar structure.
Most people proved the irrationality of the square root of two,
but some only proved the irrationality of an arbitrary prime number
(where `prime' means that the number divides a product if and only if it
divides one of the factors).
This might sound stronger,
but Conor McBride noted
that it is the other way around: it is also needs some non-trivial
work to prove that two is prime.
Most people proved the statement using the library of their system,
but not all systems have a library.
In that case some lemmas were proved from statements that were taken as axioms.
The $\rmOmega$mega system does have a standard library,
but not all statements in this library have been proved using the system.
For the irrationality of $\sqrt{2}$ four lemmas were added to
this library, but only one was proved.
The Agda system does not have a library, but the formalization did
not use any unproved statements.
It defined everything that was used, including the logic.
The IMPS proof is by far the largest in the collection.
It could have been quite a bit shorter,
but William Farmer chose to first prove
the more general statement that
the square root of any non-square number is irrational.
\begin{center}
\begin{tabular}{lrcccl}
& \emph{lines} & & &$\quad$& \emph{fragment} \\
\noalign{\smallskip}\cline{1-6}\noalign{\smallskip}
Otter & 17 & monoid & prime && one lemma \\
HOL & 29 & $\mathbb R$ & 2 && from library \\
$\rmOmega$mega & 38\rlap{$^*$} & $\mathbb R$ & 2 && from library, one lemma out of four \\
Theorema & 39\rlap{$^*$} & $\mathbb R$ & prime && two lemmas \\
Mizar & 44 & $\mathbb R$ & 2 && from library \\
NuPRL & 54\rlap{$^*$} & $\mathbb N$ & 2 && from library \\
Coq & 68 & $\mathbb R$ & 2 && from library \\
PVS & 77 & $\mathbb R$ & 2 && from library \\
Metamath$\quad$ & 81 & $\mathbb R$ & 2 && from library \\
Isabelle & 114 & $\mathbb R$ & 2 && from library \\
PhoX & 151 & $\mathbb N$ & 2 && from library \\
ACL2 & 206 & $\mathbb R$ & 2 && from library \\
Agda & 230 & $\quad$monoid$\quad$ & prime && stand alone \\
Lego & 261 & $\mathbb N$ & 2 && from library \\
IMPS & 663 & $\mathbb R$ & 2 && from library \\
\noalign{\smallskip}\cline{1-6}\noalign{\smallskip}
\end{tabular}
\end{center}
For this table we only counted non-blank non-comment lines.\footnote{%
The relevant files are on the
Web in \url{}
}
The line counts that have been marked with an asterisk do not
refer to a specific file, but instead
are combined counts of relevant parts of files.
%For instance, for the Theorema system
%the line count counts the number of lines from the worksheet
%that has been entered by the user.
If we restrict ourselves to formalizations that prove the full statement
about the number two in the real numbers without omitting anything,
then the three shortest proofs are the HOL, Mizar and Coq proofs.
\subsection{The library}
In practice for serious formalization of mathematics a good library is
more important than a user friendly system.
The Mizar systems has the largest library by far.
It proves over 32 thousand lemmas,
taking 50 megabytes or 1.4 million lines.
The following systems currently have a large \emph{mathematical}
library:
\begin{systable}
large mathematical library &
$~$ & $~$ & $~$ & $~$ & $ $ & $~$ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $~$ & $ $ \\
\end{systable}
\section{Differences in logical strength}\label{logics}
\subsection{Logics and type systems}
The systems vary in underlying logic and type system.
Here is a table that shows the different logics of the fifteen systems:
\begin{systable}
primitive recursive arithmetic &
$ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ \\
first order logic &
$ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ \\
higher order logic &
$~$ & $ $ & $~$ & $ $ & $ $ & $~$ & $~$ & $ $ & $~$ & $~$ & $ $ & $~$ & $~$ & $ $ & $~$ \\
first order set theory &
$ $ & $~$ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ \\
higher order type theory &
$ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ \\
\systableline
classical logic &
$~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ & $~$ & $~$ & $ $ & $ $ & $~$ \\
constructive logic &
$ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $~$ & $ $ \\
quantum logic &
$ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ \\
\systableline
fixed logic &
$~$ & $~$ & $~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ & $~$ \\
logical framework &
$ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ \\
\end{systable}
A logical framework does not just support some given logics,
but instead the user is able to define logics of his own.
In the case of a logical framework
the first two sections of the table indicate the most commonly
used logics of the system.
Coq and NuPRL
are implementations of variants of
intuitionistic type theory.
The `classical variants' of these systems are equiconsistent with
ZFC set theory with countably many inaccessible cardinals.\footnote{%
In the NuPRL system the classical variant is officially supported.
In the Coq system the equiconsistency actually seems to break down,
because the impredicativity of Coq is inconsistent with classical mathematics.
}
In particular, they are quite a bit stronger than the higher
order logics of systems like HOL.
The Mizar system is logically even stronger than this,
because it has arbitrarily large
inaccessibles. However, Mizar is clearly a first order
system.\footnote{%
Steps from Mizar proofs correspond directly to
first order problems \cite{dah:wer:97}.
As part of his PhD research,
Josef Urban from the Charles University in Prague is developing
software that can
export any Mizar step in TPTP format.
}
The Coq and NuPRL systems are higher order systems.\footnote{%
The NuPRL type theory is predicative,
which means that it does not have
just one type for propositions, but one for each type universe.
However, in practice NuPRL is higher order logic:
it can abstract and quantify over arbitrary higher order types.
}
Here is a table that shows the type systems of the fifteen systems
(a system is only considered typed when the types are first class
objects that occur in variable declarations and quantifiers):
\begin{systable}
untyped &
$ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $~$ & $ $ & $ $ & $~$ & $~$ & $ $ & $ $ & $ $ \\
decidable non-dependent types &
$~$ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $~$ & $~$ & $ $ & $ $ & $ $ & $ $ & $~$ \\
decidable dependent types &
$ $ & $~$ & $ $ & $~$ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ \\
undecidable dependent types &
$ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ \\
\end{systable}
\subsection{The de Bruijn criterion}
\begin{systable}
de Bruijn criterion &
$~$ & $ $ & $ $ & $~$ & $~$ & $~$ & $~$ & $ $ & $~$ & $ $ & $~$ & $ $ & $~$ & $ $ & $~$ \\
\end{systable}
The de Bruijn criterion states that the correctness of the mathematics
in the system should be guaranteed by a \emph{small} checker.
Architecturally this generally means that there is a `proof
kernel' that all the mathematics is filtered through.
In the HOL Light variant of the HOL system this kernel is extremely small: it consists of only 285 lines of ML.
The NuPRL system is just over the border of this criterion.
It has a proof checking kernel
but this kernel is not small.
The Otter system does not have a proof checking kernel built into the
system.
However, there is the Ivy system that can export Otter
proofs in a form
that can be checked by a very small ACL2 program.
\section{Proof checking or theorem proving}\label{automation}
\subsection{Interaction styles}
Among the systems one finds three different interaction
styles. First, there are the systems in which the user writes
the text of the proof and the system checks the correctness afterwards.
Second, there are the `proof assistants' which keep a proof state
for the user.
The user then modifies this proof state through the
application of so-called `tactics'.
Third, there are the automated theorem provers
which automatically prove lemmas that the user states.
The involvement of the user then only consists of the selection of
the lemmas (as `stepping stones' towards the final result) and
the selection of parameters for the prover.
\begin{systable}
proof checking &
$ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ \\
goal transformation through tactics &
$~$ & $ $ & $~$ & $~$ & $ $ & $~$ & $ $ & $ $ & $~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ \\
automated theorem proving &
$ $ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ & $~$ & $ $ & $ $ & $ $ \\
\end{systable}
Note that tactic-based provers can still have powerful automation.
For instance
the PVS system has powerful decision procedures.
\subsection{The Poincar\'e principle and automation}
\begin{systable}
Poincar\'e principle &
$~$ & $ $ & $~$ & $~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ & $ $ & $~$ & $~$ & $~$ & $~$ \\
user automation &
$~$ & $ $ & $~$ & $~$ & $ $ & $~$ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $ $ & $~$ & $~$ \\
powerful built-in automation &
$~$ & $ $ & $~$ & $ $ & $~$ & $~$ & $ $ & $~$ & $ $ & $~$ & $ $ & $~$ & $ $ & $ $ & $~$ \\
\end{systable}
\noindent
An important aspect of a mathematical system is automation
of trivial tasks.
In particular a user should not need to spell out calculations in
detail.
A system that can prove the correctness of calculations
automatically is said to satisfy the Poincar\'e principle
\cite{bar:97},\footnote{%
Henk Barendregt links the Poincar\'e principle to reduction in type theory, but
this is only partially correct.
The Agda system has $\beta\delta\iota$-reduction but it cannot use it for
reflection as it lacks the automation to lift the expressions
to the syntactical level, so we claim that Agda does not satisfy the
Poincar\'e principle.
On the other hand HOL does not have
reduction in its logic, but it is easy to program HOL `conversions'
to prove calculations automatically.
}
because in \cite{poi:02} Henri Poincar\'e wrote about showing
the correctness of the calculation $2+2 = 4$:
\begin{quote}
{\em `Ce n'est pas une d\'emonstration proprement dite, {\rm [\dots]}
c'est une v\'erifica\-tion'.
{\rm [\dots]}
La v\'erification diff\`ere pr\'ecis\'ement de la v\'eritable
d\'emonstra\-tion, parce qu'elle est purement analytique et parce qu'elle est
st\'erile.}
\end{quote}
An important aspect of a prover is whether it has an `open' architecture,
i.e., whether the user can write programs to solve proof problems
algorithmically.
Most provers allow \emph{some} automation like this, but in the table we indicate
whether this programmability is on the level of the implementation of the system.
Finally there is the aspect whether a prover already has strong
automation built-in.\footnote{%
We only
considered a tactic-based prover to have `powerful built-in automation'
if it has a tactic
that is a full first order prover,
like Isabelle's \emph{blast}.
}
Examples of such automation are
decision procedures for algebraic problems,
proof search procedures,
and automation of induction.
\section{A rather subjective two-dimensional diagram}\label{diagram}
\begin{center}
\setlength{\unitlength}{.7mm}
\begin{picture}(145,125)(-40,-20)
%\put(-40,-20){\framebox(145,125){}}
\thinlines
\put(-30,-10){\vector(0,1){108}}\put(-33,76){\makebox(0,0)[r]{\begin{sideways}\emph{more automation}\end{sideways}}}
\put(-30,-10){\vector(1,0){128}}\put(74,-12){\makebox(0,0)[t]{\emph{more mathematical}}}
\put(-20,90){\circle{6}}\put(-20,90){\makebox(0,0)[l]{\ ACL2}}
\put(42,80){\circle{10}}\put(42,80){\makebox(0,0)[l]{\ PVS}}
\put(74,74){\circle{10}}\put(74,74){\makebox(0,0)[l]{\ Isabelle}}
\put(62,72){\circle{10}}\put(62,72){\makebox(0,0)[l]{\ HOL}}
\put(54,70){\circle{6}}\put(54,70){\makebox(0,0)[l]{\ $\rmOmega$mega}}
\put(16,68){\circle*{2}}\put(16,68){\makebox(0,0)[l]{\ Otter}}
\put(20,66){\circle{6}}\put(20,66){\makebox(0,0)[l]{\ Theorema}}
\put(24,62){\circle{10}}\put(24,62){\makebox(0,0)[l]{\ IMPS}}
\put(70,60){\circle{10}}\put(70,60){\makebox(0,0)[l]{\ NuPRL}}
\put(78,58){\circle{10}}\put(78,58){\makebox(0,0)[l]{\ Coq}}
\put(58,41){\circle{6}}\put(58,41){\makebox(0,0)[l]{\ PhoX}}
\put(82,39){\circle{6}}\put(82,39){\makebox(0,0)[l]{\ Lego}}
\put(66,10){\circle{20}}\put(66,10){\makebox(0,0)[l]{\ Mizar}}
\put(86,1){\circle{6}}\put(86,1){\makebox(0,0)[l]{\ Agda}}
\put(90,-1){\circle{6}}\put(90,-1){\makebox(0,0)[l]{\ Metamath}}
\end{picture}
\end{center}
We compiled some of the information about the systems into a
diagram (like the Herzsprung-Russell
diagram classifying stars in astronomy).
On the horizontal axis the diagram shows how `mathematical' the
logic of the system is.
On the vertical axis it shows how much automation the system offers.
The sizes of the circles correspond to the sizes of the respective mathematical
libraries.
The positions of the systems in this diagram are rather subjective.
Most circles can be argued around quite a bit.
%
To make the diagram somewhat objective we have
`scored' various aspects of the systems.
This made the diagram turn out to be surprising to us:
we had expected the Theorema and IMPS systems to
score more mathematical than they do in this diagram,
and the Otter system to score higher
on the automation axis.
%On the other hand the fact that NuPRL and Coq end up together
%seems natural, and the same holds for HOL and Isabelle.
Items that added to the score for `more mathematical' were:
\begin{itemize}
\item more powerful logic
\item logical framework
\item dependent types
\item de Bruijn criterion
\end{itemize}
while items that added to the score for `more automation' were:
\begin{itemize}
\item more automated interaction style
\item Poincar\'e principle
\item user automation
\item powerful built-in automation
\end{itemize}
\section{Future work}
The comparison in this paper does not focus in detail on the automation
of the systems.
It is worthwhile to investigate the various kinds
of automation of these systems in more detail, and in particular to investigate their algebraic decision
procedures.
\let~=\tie
%\bibliographystyle{plain}
%\bibliography{frk}
\begin{thebibliography}{10}
\bibitem{age:bey:dyb:96}
S.~Agerholm, I.~Beylin, and P.~Dybjer.
\newblock {A Comparison of HOL and ALF Formalizations of a Categorical
Coherence Theorem}.
\newblock In {\em TPHOLs'96}, volume 1125 of {\em LNCS}, pages 17--32.
Springer-Verlag, 1996.
\bibitem{age:gor:95}
S.~Agerholm and M.J.C. Gordon.
\newblock {Experiments with ZF Set Theory in HOL and Isabelle}.
\newblock In {\em 8th International Workshop on Higher Order Logic Theorem
Proving and its Applications}, volume 971, pages 32--45. Springer-Verlag,
1995.
\bibitem{bar:97}
Henk Barendregt.
\newblock {The impact of the lambda calculus}.
\newblock {\em Bulletin of Symbolic Logic}, 3(2), 1997.
\bibitem{bar:geu:01}
Henk Barendregt and Herman Geuvers.
\newblock {Proof-Assistants Using Dependent Type Systems}.
\newblock In Alan Robinson and Andrei Voronkov, editors, {\em Handbook of
Automated Reasoning}. Elsevier Science Publishers B.V., 2001.
\bibitem{bas:kau:91}
D.~Basin and M.~Kaufmann.
\newblock {The Boyer-Moore Prover and NuPRL: An experimental comparison}.
\newblock In {\em Proceedings of the First Workshop on `Logical Frameworks',
Antibes, France}, pages 89--119. Cambridge University Press, 1991.
\bibitem{qed:94}
R.~Boyer et~al.
\newblock {The QED Manifesto}.
\newblock In A.~Bundy, editor, {\em Automated Deduction -- CADE 12}, volume 814
of {\em LNAI}, pages 238--251. Springer-Verlag, 1994.
\newblock \url{}.
\bibitem{buc:jeb:kri:mar:vas:97}
B.~Buchberger, T.~Jebelean, F.~Kriftner, M.~Marin, and D.~Vasaru.
\newblock {An Overview on the Theorema project}.
\newblock In W.~Kuechlin, editor, {\em Proceedings of ISSAC'97 (International
Symposium on Symbolic and Algebraic Computation)}, Maui, Hawaii, 1997. ACM
Press.
\bibitem{con:all:bro:cle:cre:har:how:kno:men:pan:sas:smi:86}
Robert~L. Constable, Stuart~F. Allen, H.M. Bromley, W.R. Cleaveland, J.F.
Cremer, R.W. Harper, Douglas~J. Howe, T.B. Knoblock, N.P. Mendler,
P.~Panangaden, James~T. Sasaki, and Scott~F. Smith.
\newblock {\em Implementing Mathematics with the {N}uprl Development System}.
\newblock Prentice-Hall, NJ, 1986.
\bibitem{dah:wer:97}
Ingo Dahn and Christoph Wernhard.
\newblock {First Order Proof Problems Extracted from an Article in the MIZAR
Mathematical Library}.
\newblock In {\em Proceedings of the International Workshop on First order
Theorem Proving}, number 97-50 in RISC-Linz Report Series, pages 58--62,
Linz, 1997. Johannes Kepler Universit\"at.
\bibitem{tho:39}
G.P. Goold, editor.
\newblock {\em Selections illustrating the history of Greek mathematics, with
an English translation by Ivor Thomas}.
\newblock Harvard University Press, London, 1939.
\bibitem{gri:hui:98}
David Griffioen and Marieke Huisman.
\newblock {A comparison of PVS and Isabelle/HOL}.
\newblock In Jim Grundy and Malcolm Newey, editors, {\em Theorem Proving in
Higher Order Logics: 11th International Conference, TPHOLs'98}, volume 1479
of {\em LNCS}, pages 123--142. Springer-Verlag, 1998.
\bibitem{jak:cou:cur:97}
L.~Jakubiec, S.~Coupet-Grimal, and P.~Curzon.
\newblock {A Comparison of the Coq and HOL Proof Systems for Specifying
Hardware}.
\newblock In E.~Gunter and A.~Felty, editors, {\em International Conference on
Theorem Proving in Higher Order Logics: B-Track}, pages 63--78, 1997.
\bibitem{kau:man:moo:00}
Matt Kaufmann, Panagiotis Manolios, and J.~Strother Moore.
\newblock {\em Computer-Aided Reasoning: An Approach}.
\newblock Kluwer Academic Publishers, Boston, 2000.
\bibitem{meg:97}
Norman~D. Megill.
\newblock {Metamath, A Computer Language for Pure Mathematics}.
\newblock \url{}, 1997.
\bibitem{muz:93}
M.~Muzalewski.
\newblock {\em An Outline of PC Mizar}.
\newblock Fondation Philippe le Hodey, Brussels, 1993.
\newblock \url{}.
\bibitem{nau:ste:mes:01}
P.~Naumov, M.-O. Stehr, and J.~Meseguer.
\newblock {The HOL/NuPRL Proof Translator: A Practical Approach to Formal
Interoperability}.
\newblock In R.J. Boulton and P.B. Jackson, editors, {\em The 14th
International Conference on Theorem Proving in Higher Order Logics}, volume
2152 of {\em LNCS}, pages 329--345. Springer-Verlag, 2001.
\bibitem{ned:geu:vri:94}
R.P. Nederpelt, J.H. Geuvers, and R.C. de~Vrijer.
\newblock {\em Selected Papers on Automath}, volume 133 of {\em Studies in
Logic and the Foundations of Mathematics}.
\newblock Elsevier Science, Amsterdam, 1994.
\bibitem{owr:rus:sha:92}
S.~Owre, J.~Rushby, and N.~Shankar.
\newblock {PVS: A prototype verification system}.
\newblock In D.~Kapur, editor, {\em 11th International Conference on Automated
Deduction (CADE)}, volume 607 of {\em LNAI}, pages 748--752, Berlin,
Heidelberg, New York, 1992. Springer-Verlag.
\bibitem{poi:02}
Henri Poincar\'e.
\newblock {\em La Science et l'Hypoth\`ese}.
\newblock Flammarion, Paris, 1902.
\bibitem{qua:92}
A.~Quaife.
\newblock {\em Automated Development of Fundamental Mathematical Theories}.
\newblock Kluwer Academic, 1992.
\bibitem{sha:94}
N.~Shankar.
\newblock {\em {Metamathematics, Machines and G\"odel's Proof}}.
\newblock Number~38 in Cambridge Tracts in Theoretical Computer Science.
Cambridge University Press, 1994.
\bibitem{toe:63}
Otto T\"oplitz.
\newblock {\em The Calculus: A Genetic Approach}.
\newblock University of Chicago Press, Chicago, 1963.
\newblock Translated by Luise Lange.
\bibitem{wie:99:1}
F.~Wiedijk.
\newblock {Mizar: An Impression}.
\newblock
\url{},
1999.
\end{thebibliography}
\end{document}