\documentclass[runningheads,envcountsame,envcountsect]{llncs}
\usepackage{amssymb,url,alltt,graphicx}
\raggedbottom
\begin{document}
\title{Statistics on digital libraries of mathematics}
\author{Freek Wiedijk}
\institute{
%Foundations of Computing Science \\
Institute for Computing and Information Sciences \\
%Faculty of Science \\
Radboud University Nijmegen \\
Toernooiveld 1, 6525 ED Nijmegen, The Netherlands}
\maketitle
\begin{abstract}
We present statistics on the standard
libraries of four major proof assistants for mathematics:
HOL Light, Isabelle/HOL, Coq and Mizar.
\end{abstract}
\section{Introduction}
\subsection{Problem}
The advent of digital computers has introduced a new way
of doing mathematics called `formalized mathematics'.
In this style of doing mathematics one encodes the mathematics in
the computer in sufficient detail that the computer can fully check
the correctness according to a small number of
logical rules.
This style of doing mathematics is much more precise and trustable than the
traditional way of first understanding the mathematics in one's head and then
just writing it on a blackboard or on paper.
Also it is a very pleasurable experience to write down one's
mathematics in a way that \emph{all} the details are there,
knowing that there is nothing left implicit.
However, these positive aspects of formalized mathematics have to be
paid for.
Generally it takes much longer to turn mathematics into formalized
form than it takes to just understand it, or even than to write it down
in a traditional way.
(A rough estimate might be that it takes about ten times as long to
formalize something than it takes to write it down in meticulous traditional
`textbook style'.)
One might wonder where this time is going, i.e., how much it is spent
on the various aspects of formalization.
For instance there are the aspects of formalizing the definitions, choosing good
notation for the defined notions, then stating the appropriate
formal statements to be proved, and finally writing the formal proofs themselves.
Another question that might be posed is whether there are significant differences in the time needed for these activities
between the different \emph{systems} for formalization of mathematics.
In this paper we will study these questions.
We will not do this by focusing on the \emph{activity} of formalization,
but rather by studying the \emph{results} of this activity,
the libraries of formalized mathematics that have been created by the
various research communities that work on this subject.
These libraries have grown into quite large human `artifacts', which
-- we claim -- deserve study in their own right.
In this paper we will do this by collecting various statistics on these
libraries.
One might compare our work here to that of a biologist who just makes
an inventarization of
the different species that are out there in the world.
In this paper we mainly just collect data.
The question that we will address here is what are the
different aspects of formalization that one can find in the formalized
libraries that are out there,
how much of those libraries is spent on which of these aspects,
and whether the different systems for formalization are more or less similar
in these aspects
or whether they have significant differences.
\subsection{Approach}
The way that we count the libraries of formalized mathematics is
as follows.
First we concatenate all the files for a system into one huge file.
Then we tag each line of this file with the \emph{category} of that
line, and then we count the different types of lines that we find.
We will explain this procedure with a small example.
Here is a very small formalization of the irrationality of the square root
of two by John Harrison in the HOL Light system (taken from \emph{The Seventeen Provers of the World} \cite{wie:06}, a collection of formalizations of
this irrationality proof in various systems):
\setlength{\arrayrulewidth}{.18pt}
\begin{center}
\vspace{1.5\smallskipamount}
{\scriptsize
\begin{tabular}{|clc|}
\hline
$\enskip$ & & $\enskip$ \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& \verb|(* Definition of rationality (& = natural injection N->R). *)| & \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& & \\
& \verb|let rational = new_definition| & \\
& \verb| `rational(r) = ?p q. ~(q = 0) /\ abs(r) = &p / &q`;;| & \\
& & \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& \verb|(* The main lemma, purely in terms of natural numbers. *)| & \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& & \\
& \verb|let NSQRT_2 = prove| & \\
& \verb| (`!p q. p * p = 2 * q * q ==> q = 0`,| & \\
& \verb| MATCH_MP_TAC num_WF THEN REWRITE_TAC[RIGHT_IMP_FORALL_THM] THEN| & \\
& \verb| REPEAT STRIP_TAC THEN FIRST_ASSUM(MP_TAC o AP_TERM `EVEN`) THEN| & \\
& \verb| REWRITE_TAC[EVEN_MULT; ARITH] THEN REWRITE_TAC[EVEN_EXISTS] THEN| & \\
& \verb| DISCH_THEN(X_CHOOSE_THEN `m:num` SUBST_ALL_TAC) THEN| & \\
& \verb| FIRST_X_ASSUM(MP_TAC o SPECL [`q:num`; `m:num`]) THEN| & \\
& \verb| POP_ASSUM MP_TAC THEN CONV_TAC SOS_RULE);;| & \\
& & \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& \verb|(* Hence the irrationality of sqrt(2). *)| & \\
& \verb|(* ------------------------------------------------------------------------- *)| & \\
& & \\
& \verb|let SQRT_2_IRRATIONAL = prove| & \\
& \verb| (`~rational(sqrt(&2))`,| & \\
& \verb| SIMP_TAC[rational; real_abs; SQRT_POS_LE; REAL_POS; NOT_EXISTS_THM] THEN| & \\
& \verb| REPEAT GEN_TAC THEN DISCH_THEN(CONJUNCTS_THEN2 ASSUME_TAC MP_TAC) THEN| & \\
& \verb| DISCH_THEN(MP_TAC o AP_TERM `\x. x pow 2`) THEN| & \\
& \verb| ASM_SIMP_TAC[SQRT_POW_2; REAL_POS; REAL_POW_DIV; REAL_POW_2; REAL_LT_SQUARE;| & \\
& \verb| REAL_OF_NUM_EQ; REAL_EQ_RDIV_EQ] THEN| & \\
& \verb| ASM_MESON_TAC[NSQRT_2; REAL_OF_NUM_EQ; REAL_OF_NUM_MUL]);;| & \\
& & \\
& & \\
\hline
\end{tabular}
}
\vspace{1.5\smallskipamount}
\end{center}
\noindent
Now for each system studied in this paper, which includes the HOL Light system, we wrote a small
perl script that tags each line in a formalization with its category.
In our investigation we applied it to the full HOL Light library, but
this is what we get when we tag just this example formalization with it:
\begin{center}
\vspace{1.5\smallskipamount}
{\scriptsize
\begin{tabular}{|clc|}
\hline
$\enskip$ & & $\enskip$ \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|C (* Definition of rationality (& = natural injection N->R). *)| & \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|B | & \\
& \verb|D let rational = new_definition| & \\
& \verb|D `rational(r) = ?p q. ~(q = 0) /\ abs(r) = &p / &q`;;| & \\
& \verb|B | & \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|C (* The main lemma, purely in terms of natural numbers. *)| & \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|B | & \\
& \verb|T let NSQRT_2 = prove| & \\
& \verb|T (`!p q. p * p = 2 * q * q ==> q = 0`,| & \\
& \verb|P MATCH_MP_TAC num_WF THEN REWRITE_TAC[RIGHT_IMP_FORALL_THM] THEN| & \\
& \verb|P REPEAT STRIP_TAC THEN FIRST_ASSUM(MP_TAC o AP_TERM `EVEN`) THEN| & \\
& \verb|P REWRITE_TAC[EVEN_MULT; ARITH] THEN REWRITE_TAC[EVEN_EXISTS] THEN| & \\
& \verb|P DISCH_THEN(X_CHOOSE_THEN `m:num` SUBST_ALL_TAC) THEN| & \\
& \verb|P FIRST_X_ASSUM(MP_TAC o SPECL [`q:num`; `m:num`]) THEN| & \\
& \verb|P POP_ASSUM MP_TAC THEN CONV_TAC SOS_RULE);;| & \\
& \verb|B | & \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|C (* Hence the irrationality of sqrt(2). *)| & \\
& \verb|C (* ------------------------------------------------------------------------- *)| & \\
& \verb|B | & \\
& \verb|T let SQRT_2_IRRATIONAL = prove| & \\
& \verb|T (`~rational(sqrt(&2))`,| & \\
& \verb|P SIMP_TAC[rational; real_abs; SQRT_POS_LE; REAL_POS; NOT_EXISTS_THM] THEN| & \\
& \verb|P REPEAT GEN_TAC THEN DISCH_THEN(CONJUNCTS_THEN2 ASSUME_TAC MP_TAC) THEN| & \\
& \verb|P DISCH_THEN(MP_TAC o AP_TERM `\x. x pow 2`) THEN| & \\
& \verb|P ASM_SIMP_TAC[SQRT_POW_2; REAL_POS; REAL_POW_DIV; REAL_POW_2; REAL_LT_SQUARE;| & \\
& \verb|P REAL_OF_NUM_EQ; REAL_EQ_RDIV_EQ] THEN| & \\
& \verb|P ASM_MESON_TAC[NSQRT_2; REAL_OF_NUM_EQ; REAL_OF_NUM_MUL]);;| & \\
& \verb|B | & \\
& & \\
\hline
\end{tabular}
}
\vspace{1.5\smallskipamount}
\end{center}
\noindent
The lines with a `\texttt{C}' are comment lines, the ones with a `\texttt{B}'
are blank, and so on.
The perl script is \emph{ad hoc} in the sense that occasionally it will
tag a line wrong.
However, by inspecting its output we improved it until it was sufficiently
good for our purposes.
We claim that our perl scripts will tag less than 1\% of the lines in a
formalization with the wrong tag.
Finally we count the lines in this `tagged' file for the various tags,
and present the results in tabular format.
For this small example that table then becomes:
\begin{center}
\begin{tabular}{cclrcrcr}
& $\quad$ && \emph{lines} & $\hspace{1.2em}$ & \emph{bytes} & $\hspace{1.2em}$ & \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{B} && blank lines & 6 && 65 && 18$\,\%$ \\
\texttt{C} && comment lines & 9 && 720 && 27$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{D} && definitions & 2 && 85 && 6$\,\%$ \\
\texttt{T} && theorem statements$\qquad$ & 4 && 114 && 12$\,\%$ \\
\texttt{P} && proof lines & 12 && 746 && 36$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
&& \emph{total} & 33 && 1,730 && 100$\,\%$
\end{tabular}
\end{center}
\noindent
Of course for a very small example like this, the percentages
are not very meaningful.
For instance, the number of blank and comment lines is quite a bit
higher than it is in a more extensive HOL Light formalization.
The percentages in this table are in terms of line counts, and not
in terms of bytes.
We believe that line counts is the more interesting measure.
It indicates how much one can oversee behind a computer screen
without scrolling.
(This means that a formalization style where multiple steps are
put together on a single line -- a formalization style that both
%power-formalizers
John Harrison and Georges Gonthier use -- is superior to a more
`programming' like style in which each step gets a line of its own.
Georges Gonthier
convinced us in a personal communication that line counts is the better
way to measure formalizations.)
Finally, we also present the table as a pie chart,
as a graphical summary of the results.
For these charts the different kinds of lines are grouped together into only
seven categories.
(In the example two of these categories are missing.)
The pie chart for the example is:
\begin{center}
\vspace{-2em}
\includegraphics[scale=.6]{example}
\vspace{-2em}
\end{center}
\subsection{Related work}
We are not aware of already existing research into the statistics of
formalizations.
In the field of programming, counting lines of source code
is one of the methods in the subject called \emph{software metrics}.
However, there generally the focus is not on the different \emph{kinds}
of source code lines and their function in the programming languages,
but more on programmer productivity.
\subsection{Contribution}
The investigation presented here is a snapshot in time.
Also there is not too much `depth' in our results: really all we did was count.
However getting the software that tagged the formalizations reasonably
accurate took quite an effort, so obtaining the numbers in this paper was significant work.
The main value of the research described in this paper is
showing that all systems for formalizations are quite similar
despite their large differences both in foundations and in interaction
styles.
The observations in this paper might be a guide for people who design
systems for formalizations, by pointing out from the start which elements will
need to be part of their formalization language.
That way, these elements can all be designed in from the start and will
not have to appear as an afterthought later.
Finally this paper can be used as a {guide} for people who are interested
in formalization of mathematics and want to get an impression of
the current state of some important libraries.
\subsection{Outline}
The structure of this paper is as follows.
In Section~\ref{overview} we present an overview of our results.
Then in Section~\ref{counts} we give the statistics in detail.
In Section~\ref{types} we discuss the different types of lines
that we distinguished, and relate them between systems.
Finally, in Section~\ref{conclusions} we draw some conclusions from
our data and indicate possible future work.
\section{Overview}\label{overview}
The four systems that we selected for this investigation were:
\begin{itemize}
\item HOL Light \cite{har:00}
\item Isabelle/HOL \cite{nip:pau:wen:02}
\item Coq \cite{coq:06}
\item Mizar \cite{muz:93}
\end{itemize}
\noindent
Other systems that we considered were HOL4, ProofPower and PVS.
The first two are rather similar to HOL Light, so we did expect
them to get quite similar statistics.
In the HOL family of systems HOL Light is the system that has been used most for formalization of mathematics.
For this reason we selected HOL Light from that group, and left the other two systems out.
The PVS system is one of the most popular systems for formal methods
in computer science.
It has a very interesting way of dealing with
partial functions (called `predicate sub-types'), and it has strong automation.
However it has not been used as much for formalization of
mathematics as the other systems that we looked at.
For this reason we left PVS out of our comparison as well.
The four systems that we study here have mathematical libraries of quite
different sizes.
The sizes of these libraries is shown in the bar diagram on the next page.
In this diagram the left bars represent the libraries that are distributed
with the system.
If you install the system, you will have these source
files as part of the distribution.
\begin{center}
\includegraphics[scale=.6]{sizes}
\end{center}
\noindent
In these bars the gray parts are the parts of the library that can be
considered to be the library of the `core' system.
It is that part of the library of the system that is available without
doing anything special.
In the bar for the HOL Light system the rest of the library (the white part of
the bar) has been divided into
two sub-parts: the part written by John Harrison, and the part
written by other people (which is mostly the formalization of the
Jordan Curve Theorem by Tom Hales.)
In the gray bar of the Isabelle system the small part at the top corresponds to the sub-directory of the contribution called `\texttt{HOLCF}',
while in the Coq system it corresponds
to the sub-directory `\texttt{contrib}'.
The right bars represent libraries that are distributed separately from
the system itself.
These are collections of formalizations by users of the system that
have not (yet) been integrated into the standard library of the system
itself.
In the case of Coq this is called the Coq \emph{contribs} (short for `contributions'),
while in the Isabelle community it is called the AFP (Archive of Formal Proofs).
%
The Mizar system also has a library of user formalizations
called MML (= Mizar Mathematical Library), but in the case of Mizar those
formalizations
\emph{are} integrated into the standard library of the system.
We now show a summary of the statistics from this paper in
the shape of four pie charts:
\begin{center}\label{pie}
$\hspace{-2em}$%
\begin{tabular}{cc}
\includegraphics[scale=.6]{hol_light} &
\includegraphics[scale=.6]{isabelle} \\
\noalign{\vspace{0\bigskipamount}}
\includegraphics[scale=.6]{coq} &
\includegraphics[scale=.6]{mizar}
\end{tabular}%
$\hspace{-2em}$
\end{center}
\noindent
For most people the bar chart on the previous page and these pie charts
will be the most interesting part of this paper.
In the HOL Light pie chart there is a tiny sliver between
`automation' and `definitions' for the very few lines related
to `modules', but it was too narrow to be labeled.
In the Mizar pie chart the `registration' lines have been included
in the `statements' part, although we gave them category `\texttt{H}'
(which in Section~\ref{types} is in the sub-section
about automation;
we will discuss this there in more detail.)
\section{Line counts}\label{counts}
We now present the detailed statistics of the four systems, and
discuss which files were counted and which were not.
\subsection{HOL Light}
The statistics in this paper are on version 2.20 of the HOL Light
system.
HOL Light input files have suffix `\texttt{.ml}'.
%
These files both contain the implementation of the system as well as
the mathematical library, all mixed together.
We divided the files that were primarily implementing the system from
files that were primarily proving the library in the following way:
The basic HOL Light system consists of a file called \texttt{make.ml},
which loads the main file \texttt{hol.ml},
which then loads 44 more \texttt{.ml} files.
(Apart from these 46 files there are in the top level directory 14 more
\texttt{.ml} files with names of the
form `\texttt{pa\char`\_j\char`\_}\dots\texttt{.ml}'
related
to the input processing of the \texttt{ocaml} system that reads the HOL Light
files.)
Now the \texttt{hol.ml} file is divided into sections.
One of these sections has the header
`{Mathematical theories and additional proof tools.}'
We decided that the 19 files in that section
were the `mathematical library' while the 25 files in the other six sections contained the `implementation
of the system'.
Apart from these 19 files we also counted the `auxiliary library' consisting
of 169 \texttt{.ml} files in 11 sub-directories.
(There were 11 more \texttt{.ml} files in another sub-directory
named \texttt{Proofrecording}.
However that is an alternative implementation of the core system,
and therefore was left
out of this investigation.)
Altogether the number of files counted were:
\begin{center}
\begin{tabular}{lcll}
\enskip 19 files & $\quad$ & \emph{from}&\texttt{*.ml} \\
169 files && &\texttt{*/*.ml} \\
\noalign{\smallskip}
\cline{1-1}
\noalign{\smallskip}
188 files
\end{tabular}
\end{center}
\noindent
And the statistics about these files were:
\begin{quote}
\begin{tabular}{cclrcrcr}
& $\quad$ && \emph{lines} & $\hspace{1.2em}$ & \emph{bytes} & $\hspace{1.2em}$ & \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{B} && blank lines & 16,438 && 21,371 && 8.4$\,\%$ \\
\texttt{C} && comments & 19,044 && 864,913 && 9.7$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{S} && imports & 182 && 5,459 && 0.1$\,\%$ \\
\texttt{D} && definitions & 2,547 && 107,835 && 1.3$\,\%$ \\
\texttt{N} && interfaces & 486 && 19,525 && 0.2$\,\%$ \\
\texttt{X} && automation: program code$\quad\enskip$ & 19,979 && 833,040 && 10.2$\,\%$ \\
\texttt{T} && theorem statements & 25,493 && 1,073,208 && 13.0$\,\%$ \\
\texttt{P} && proof lines & 112,088 && 4,356,332 && 57.1$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
&& \emph{total} & \phantom{\enskip ,}196,257 && \enskip 7,281,683 && 100.0$\,\%$
\end{tabular}
\end{quote}
\subsection{Isabelle/HOL}
The statistics in this paper are on the Isabelle2007 version of the Isabelle/HOL system.
Isabelle has two kinds of files, with suffixes `\texttt{.ML}' and `\texttt{.thy}'.
We decided that the \texttt{.ML} files primarily contained the implementation of
the system, while the \texttt{.thy} files primarily contained the mathematical library.
Now the Isabelle system can be used with different \emph{logics}.
The HOL logic is the dominant logic that is used by almost all Isabelle users.
For this reason we counted the \texttt{.thy} files inside the \texttt{HOL}
directory
(together with \texttt{HOLCF} directory, which is closely related).
However, we left out the \texttt{HOL/Import} sub-directory as it does not
contain mathematics but is about importing theories from other systems.
The number of files counted were (here \texttt{**} stands
for zero or more sub-directory levels in between):
\begin{center}
\begin{tabular}{lcll}
649 files & $\quad$ & \emph{most of}&\texttt{src/HOL/**/*.thy} \\
\enskip 88 files &&& \texttt{src/HOLCF/**/*.thy} \\
\noalign{\smallskip}
\cline{1-1}
\noalign{\smallskip}
737 files
\end{tabular}
\end{center}
\noindent
And the statistics about these files were:
\begin{quote}
\begin{tabular}{cclrcrcr}
& $\quad$ && \emph{lines} & $\hspace{1.2em}$ & \emph{bytes} & $\hspace{1.2em}$ & \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{B} && blank lines & 51,610 && 80,304 && 15.9$\,\%$ \\
\texttt{C} && source comments & 18,941 && 829,132 && 5.8$\,\%$ \\
\texttt{E} && document markup & 15,488 && 713,503 && 4.8$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{S} && imports \& sectioning & 2,838 && 43,584 && 0.9$\,\%$ \\
\texttt{L} && locales & 1,394 && 58,862 && 0.4$\,\%$ \\
\texttt{D} && definitions & 23,386 && 1,165,813 && 7.2$\,\%$ \\
\texttt{N} && notation & 2,736 && 129,089 && 0.8$\,\%$ \\
\texttt{H} && automation: directives & 4,022 && 191,544 && 1.2$\,\%$ \\
\texttt{X} && automation: program code$\quad\enskip$ & 5,714 && 225,786 && 1.8$\,\%$ \\
\texttt{T} && theorem statements & 58,659 && 3,136,976 && 18.0$\,\%$ \\
\texttt{P} && proof lines & 140,596 && 5,024,717 && 43.2$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
&& \emph{total} & \phantom{\enskip ,}325,384 && 11,599,310 && 100.0$\,\%$
\end{tabular}
\end{quote}
\subsection{Coq}
The statistics in this paper are about version 8.1 of the Coq system.
Coq files have the suffix `\texttt{.v}'.
(The implementation of the system is in files with suffixes `\texttt{.mli}'
and `\texttt{.ml}', but unlike HOL Light and Isabelle these are in a different
part of the distribution, and are not mixed together with the mathematical library.)
The main library is in the sub-directory \texttt{theories}.
There is
a supplementary library in the sub-directory \texttt{contrib}, which mostly contains the supporting theory for several automated proof procedures.
(In this second directory the \texttt{.v} files and the \texttt{.mli} and \texttt{.ml} files
\emph{are} together.
However we also only looked at the \texttt{.v} files there.)
Altogether the number of files counted were:
\begin{center}
\begin{tabular}{lcl}
252 files & $\quad$ & \texttt{theories/*/*.v} \\
\enskip 57 files && \texttt{contrib/*/*.v} \\
\noalign{\smallskip}
\cline{1-1}
\noalign{\smallskip}
309 files
\end{tabular}
\end{center}
\noindent
And the statistics about these files were:
\begin{quote}
\begin{tabular}{cclrcrcr}
& $\quad$ && \emph{lines} & $\hspace{1.2em}$ & \emph{bytes} & $\hspace{1.2em}$ & \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{B} && blank lines & 13,531 && 22,456 && 12.4$\,\%$ \\
\texttt{C} && non-coqdoc comments & 5,661 && 300,850 && 5.2$\,\%$ \\
\texttt{E} && coqdoc comments & 2,910 && 137,401 && 2.7$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{S} && imports \& sectioning & 2,073 && 49,377 && 1.9$\,\%$ \\
\texttt{L} && context & 1,329 && 50,686 && 1.2$\,\%$ \\
\texttt{D} && definitions & 8,778 && 308,858 && 8.0$\,\%$ \\
\texttt{N} && notation & 1,047 && 40,293 && 1.0$\,\%$ \\
\texttt{H} && automation: directives & 1,157 && 45,680 && 1.1$\,\%$ \\
\texttt{X} && automation: program code$\quad\enskip$ & 2,648 && 94,556 && 2.4$\,\%$ \\
\texttt{T} && theorem statements & 11,781 && 541,836 && 10.8$\,\%$ \\
\texttt{P} && proof lines & 58,157 && 1,981,655 && 53.3$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
&& \emph{total} & \phantom{\enskip ,}109,072 && \enskip 3,573,648 && 100.0$\,\%$
\end{tabular}
\end{quote}
\subsection{Mizar}
The statistics in this paper are on version 7.8.05 of the Mizar system,
which is distributed with version 4.87.985 of the MML mathematical library.
Mizar files have the suffix `\texttt{.miz}'.
(There also are files with suffix `\texttt{.abs}' that are `abstracts' to
the formalizations,
but they are derived from the first kind of files and do not contain
any independent information.)
As the version number of the MML library already shows, there are
985 \texttt{.miz} files distributed with the system.
Therefore the number of files counted were:
\begin{center}
\begin{tabular}{lcl}
985 files & $\quad$ & \texttt{mml/*.miz} \\
\end{tabular}
\end{center}
\noindent
And the statistics about these files were:
\begin{quote}
\begin{tabular}{cclrcrcr}
& $\quad$ && \emph{lines} & $\hspace{1.2em}$ & \emph{bytes} & $\hspace{1.2em}$ & \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{B} && blank lines & 84,609 && 87,744 && 4.3$\,\%$ \\
\texttt{C} && comments & 10,857 && 488,821 && 0.6$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
\texttt{S} && environments, cancellations$\quad$ & 23,655 && 1,118,819 && 1.2$\,\%$ \\
\texttt{L} && reservations & 5,396 && 194,693 && 0.3$\,\%$ \\
\texttt{D} && definitions & 56,738 && 1,847,161 && 2.9$\,\%$ \\
\texttt{N} && notation & 1,634 && 45,598 && 0.1$\,\%$ \\
\texttt{H} && registrations & 26,016 && 749,427 && 1.3$\,\%$ \\
\texttt{T} && \rlap{theorem statements}\phantom{automation: program code}$\quad\enskip$ & 177,829 && 6,625,141 && 9.0$\,\%$ \\
\texttt{P} && proof lines & 1,582,831 && 61,096,949 && 80.4$\,\%$ \\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
&& \emph{total} & 1,969,575 && 72,254,353 && 100.0$\,\%$
\end{tabular}
\end{quote}
\section{Categories of lines in a formalization}\label{types}
In the previous section we tagged lines in different
systems that had similar functions with the same letter.
Here we identify how these letters should be interpreted.
For most of the categories we list the main keywords that are associated
with the lines of that category.
For a user of the system this makes it quite clear how we divided the lines
among the categories.
(In Mizar there was the clearest bijection between keywords starting
a part of a formalization and categories in our statistics.
In the other systems the correspondence was a bit less obvious.)
\subsection{Non-content lines}
\subsubsection{B -- blank lines.}
Lines tagged `\texttt{B}' are blank lines.
These amount to a surprising large part of the total line count of a formalization.
In the byte counts of this category we also included the white space
at the end of other kinds of lines.
Also sometimes some care had to be taken with files that did not end in a newline
character.
(For such files one newline byte was added.)
\subsubsection{C -- comments.}
The following table shows the comment styles found in
the four systems:
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{(*} \emph{comment} \texttt{*)}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{(*} \emph{comment} \texttt{*)}\enskip
\\
\emph{Coq:} &&
\texttt{(*} \emph{comment} \texttt{*)}\enskip
\\
\emph{Mizar:} &&
\texttt{::} \emph{comment}\enskip
\end{tabular}
\end{flushleft}
\subsubsection{E -- documentation}
Isabelle and Coq generate documentation for the formalizations
by having text inside special comments.
In Isabelle these comments come in two styles, and are always prefixed
with a keyword or a double dash.
At first we included some of these lines in the sectioning category below,
but Makarius Wenzel convinced us in private communication that
they really belong in this category.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{Isabelle/HOL:} &&
\texttt{header}\enskip
\texttt{section}\enskip
\texttt{subsection}\enskip
\texttt{subsubsection}\enskip
\texttt{text}\enskip
\texttt{txt}\enskip
\texttt{{-}{-}}\enskip
%\\ &&
%\emph{followed by}
\\ &&
\texttt{\char`\{*} \emph{text} \texttt{*\char`\}}\enskip\enskip
%\\ &&
\texttt{"}\emph{text}\texttt{"}\enskip
\\
\emph{Coq:} &&
\texttt{(**} \emph{text} \texttt{*)}\enskip
\end{tabular}
\end{flushleft}
\noindent
\subsection{Modules}
Imports, sectioning and modules seem closely related, but there is
a gray area with the notion of definitions.
For instance in Isabelle a `locale' might be considered to group
related definitions together, but it also might be considered to consist
of definitions.
(We chose the second interpretation.)
Similarly Coq modules seem rather close to Coq structures.
(We chose to consider the first to be about modularization
and the second to be a data-type definition.)
%One might argue with our choice of delimiting modularization from definitional
%constructions.
%In the tables in the previous section all
%categories are separate, so if someone wants to interpret the data differently,
%they are free to group the numbers their way.
\subsubsection{S -- imports and sectioning.}
We did not distinguish between lines that group parts of a formalization
together into a section or module, and lines that open or import these
sections or modules.
The main keywords for this category of lines in the four systems were:
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{needs}\enskip
\texttt{loadt}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{theory}\enskip
\texttt{imports}\enskip
\texttt{begin}\enskip
\texttt{use}\enskip
\texttt{uses}\enskip
\\
\emph{Coq:} &&
\texttt{Require}\enskip
\texttt{Section}\enskip
\texttt{Module}\enskip
\texttt{Import}\enskip
\\
\emph{Mizar:} &&
\texttt{environ}\enskip
\texttt{begin}\enskip
\texttt{canceled}\enskip
\end{tabular}
\end{flushleft}
\noindent
One could also consider Isabelle's \texttt{use} and \texttt{uses}
to belong to the automation category below, but we decided to consider them
to be import lines.
\subsection{Definitions}
\subsubsection{L -- contexts.}
The `\texttt{L}' lines build `contexts' in which a definition
can be made.
We considered these lines to be part of those definitions.
In the Isabelle system these contexts are named entities.
In Coq they just are implicit through the position in the
section or module.
In Mizar we used this letter for lines that introduce variable
conventions.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{Isabelle/HOL:} &&
\texttt{class}\enskip
\texttt{locale}\enskip
\texttt{context}\enskip
\\ &&
\texttt{instance}\enskip
\texttt{interpret}\enskip
\texttt{interpretation}\enskip
\\
\emph{Coq:} &&
\texttt{Variable}\enskip
\texttt{Variables}\enskip
\texttt{Hypothesis}\enskip
\texttt{Parameter}\enskip
\texttt{Axiom}\enskip
\\
\emph{Mizar:} &&
\texttt{reserve}\enskip
\end{tabular}
\end{flushleft}
\subsubsection{D -- definitions.}
The systems all have numerous constructions for defining functions, predicates
and types.
Here are the main keywords for these constructions:
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{new\char`\_definition}\enskip
\texttt{new\char`\_recursive\char`\_definition}\enskip
\texttt{define}\enskip
\\ &&
\texttt{new\char`\_inductive\char`\_definition}\enskip
\texttt{new\char`\_specification}\enskip
\\ &&
\texttt{new\char`\_type\char`\_definition}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{abbreviation}\enskip
\texttt{axclass}\enskip
\texttt{coinductive}\enskip
\texttt{constdefs}\enskip
\texttt{consts}\enskip
\\ &&
\texttt{datatype}\enskip
\texttt{definition}\enskip
\texttt{defs}\enskip
\texttt{fun}\enskip
\texttt{function}\enskip
\texttt{inductive}\enskip
\\ &&
\texttt{inductive\char`\_set}\enskip
\texttt{nominal\char`\_datatype}\enskip
\texttt{nominal\char`\_inductive}\enskip
\\ &&
\texttt{nominal\char`\_primrec}\enskip
\texttt{primrec}\enskip
\texttt{recdef}\enskip
\texttt{record}\enskip
\texttt{specification}\enskip
\\ &&
\texttt{typedecl}\enskip
\texttt{typedef}\enskip
\texttt{types}\enskip
\\
\emph{Coq:} &&
\texttt{Definition}\enskip
\texttt{Fixpoint}\enskip
\texttt{Inductive}\enskip
\texttt{CoFixpoint}\enskip
\texttt{CoInductive}\enskip
\\ &&
\texttt{Record}\enskip
\texttt{Function}\enskip
\\
\emph{Mizar:} &&
\texttt{definition}\enskip
\end{tabular}
\end{flushleft}
%\texttt{func}\enskip
%\texttt{pred}\enskip
%\texttt{attr}\enskip
%\texttt{mode}\enskip
%\texttt{struct}\enskip
\subsubsection{N -- notation.}
These are the lines that direct the parser and
pretty-printer of the system.
These lines do not define the notions themselves, but introduce
the syntax for the defined notions.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{parse\char`\_as\char`\_infix}\enskip
\texttt{unparse\char`\_as\char`\_infix}\enskip
\texttt{parse\char`\_as\char`\_binder}\enskip
\\ &&
\texttt{make\char`\_overloadable}\enskip
\texttt{overload\char`\_interface}\enskip
\\ &&
\texttt{override\char`\_interface}\enskip
\texttt{reduce\char`\_interface}\enskip
\\ &&
\texttt{prioritize\char`\_num}\enskip
\texttt{prioritize\char`\_real}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{syntax}\enskip
\texttt{translations}\enskip
\texttt{notation}\enskip
\texttt{nonterminals}\enskip
\\ &&
\texttt{parse\char`\_translation}\enskip
\texttt{print\char`\_translation}\enskip
\\
\emph{Coq:} &&
\texttt{Infix}\enskip
%\\ &&
\texttt{Notation}\enskip
%\\ &&
`\texttt{Reserved} \texttt{Notation}'\enskip
%\\ &&
`\texttt{Tactic} \texttt{Notation}'\enskip
\\ &&
\texttt{Coercion}\enskip
%\\ &&
`\texttt{Implicit} \texttt{Arguments}'\enskip
%\\ &&
`\texttt{Set} \texttt{Implicit} \texttt{Arguments}'\enskip
\\ &&
`\texttt{Unset} \texttt{Implicit} \texttt{Arguments}'\enskip
%\\ &&
`\texttt{Set} \texttt{Strict} \texttt{Implicit}'\enskip
\\ &&
`\texttt{Unset} \texttt{Strict} \texttt{Implicit}'\enskip
%\\ &&
`\texttt{Open} \texttt{Scope}'\enskip
%\\ &&
`\texttt{Open} \texttt{Local} \texttt{Scope}'\enskip
\\
\emph{Mizar:} &&
\texttt{notation}\enskip
\end{tabular}
\end{flushleft}
\subsection{Automation}
The automation of a system has two kinds of lines.
First there are the lines that set parameters for the
automated decision procedures and proof search procedures.
Second there are the implementations of these
automated procedures.
Most of the automation is implemented outside of
the formalizations and is not counted here, but procedures that are specific to
the subject are often implemented inside the formalization.
\subsubsection{H -- automation: directives.}
The automation `directives' often are mixed with statements.
For instance, in Isabelle theorem statements can be annotated with
`\texttt{[simp]}'.
This really is an automation directive, but it does not have a line
of its own, so it will not be reflected in the statistics for
this category of lines.
Similarly, the Mizar `registrations' (which direct the automation
of the Mizar type system) also can be read as statements.
For this reason in the Mizar pie chart on page~\pageref{pie} this
category was included in the group about statements, and not
in a group about automation.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{Isabelle/HOL:} &&
\texttt{declare}\enskip
\texttt{lemmas}\enskip
\texttt{theorems}\enskip
\\
\emph{Coq:} &&
\texttt{Hint}\enskip
\texttt{Add}\enskip
\texttt{Opaque}\enskip
\texttt{Transparent}\enskip
\texttt{Scheme}\enskip
\\
\emph{Mizar:} &&
\texttt{registration}\enskip
\end{tabular}
\end{flushleft}
\subsubsection{X -- automation: program code.}
These lines are implementations of proof procedures.
In the HOL Light system really \emph{all} lines are in
some sense in this category, as a HOL Light formalization really
just is an OCaml program.
Therefore in the case of HOL Light the lines of this category are what remains
when the other categories are removed.
The Mizar system does not have this kind of line
as Mizar does not support user level proof automation.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{let}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{ML}\enskip
\texttt{ML\char`\_setup}\enskip
\texttt{declaration}\enskip
\texttt{method\char`\_setup}\enskip
\texttt{oracle}\enskip
\texttt{setup}\enskip
\\ &&
\texttt{simproc\char`\_setup}\enskip
\\
\emph{Coq:} &&
\texttt{Ltac}\enskip
\end{tabular}
\end{flushleft}
\subsection{Theorems}
A formalization mainly consists of a long chain of `lemmas'.
These lemmas generally consist of a label, a statement and a proof.
\subsubsection{T -- theorem statements.}
In this category are the lines which state the theorem and give its
label.
The Mizar system actually distinguishes between two kinds of statements:
theorems and schemes.
The first category are the first order statements, while the
second category are the higher order statements.
Here we do not distinguish between these two categories.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{HOL Light:} &&
\texttt{prove}\enskip
\texttt{prove\char`\_by\char`\_refinement}\enskip
\\
\emph{Isabelle/HOL:} &&
\texttt{lemma}\enskip
\texttt{theorem}\enskip
\texttt{inductive\char`\_cases}\enskip
\texttt{axioms}\enskip
\texttt{axiomatization}\enskip
\\ &&
\texttt{corollary}\enskip
\texttt{subclass}\enskip
\texttt{termination}\enskip
\\
\emph{Coq:} &&
\texttt{Lemma}\enskip
\texttt{Theorem}\enskip
\texttt{Goal}\enskip
\\
\emph{Mizar:} &&
\texttt{theorem}\enskip
\texttt{scheme}\enskip
\end{tabular}
\end{flushleft}
\subsubsection{P -- proofs.}
Finally there are the lines of the formalized proofs.
As is apparent in the pie charts in Section~\ref{pie}
these lines amount to about half of the formalizations.
These lines contain many different constructions all with
their own keywords.
Here we just give the keywords that bracket the proofs.
\begin{flushleft}
\begin{tabular}{rcl}
\emph{Isabelle/HOL:} &&
\texttt{apply}\enskip
\texttt{by}\enskip
\texttt{proof}\enskip
\texttt{qed}\enskip
\texttt{done}\enskip
\texttt{oops}\enskip
\texttt{sorry}\enskip
\\
\emph{Coq:} &&
\texttt{Proof}\enskip
\texttt{Qed}\enskip
\texttt{Save}\enskip
\texttt{Defined}\enskip
\\
\emph{Mizar:} &&
\texttt{proof}\enskip
\texttt{end}\enskip
\end{tabular}
\end{flushleft}
\section{Conclusions}\label{conclusions}
\subsection{Discussion}
The three main conclusions of this study for us are:
\begin{itemize}
\item
The four systems are quite similar.
Despite a large difference in foundations (the HOL logic, the Calculus of
Inductive Constructions and Tarski-Grothen\-dieck set theory are all quite
different) and in interaction style (talking to an OCaml interpreter,
interacting with a tactic prover in a Proof General style interface
and using a compiler-like batch checker), the actual formalizations
all share the same elements.
\item
The HOL Light system has the smallest definition segment in its pie chart.
This seems to suggest that it is the most reliable.
Andrzej Trybulec taught me in private communication that a definition is
like a \emph{debt}, because you do not know whether what you are defining
corresponds to the informal notion in your head.
You gain confidence in this by proving theorems about the notion later.
That way you pay the debt back, and gain trust in that your formalization
actually means what you think it means.
In this sense the HOL Light system is the most trustable of the four.
Another interpretation, proposed by John Harrison in a private communication,
is that the low percentage of the definitions does not so much reflect the quality
of the formalization but rather the fact that the HOL Light library primarily
contains pure mathematics.
This seems to be collaborated by the observation that the percentage of the definitions in John Harrison's verification work
at Intel, also using the HOL Light system, is 3.7\% instead of 1.3\%.
\item
The Mizar system has the largest proof segment in its pie chart.
This suggests that its proof language might be less efficient.
(It is very natural and pleasant to use, though.)
This might be related to Mizar's declarative proof style,
or the fact that the Mizar system does
not have much automation.
\end{itemize}
\subsection{Future work}
It might be interesting to delve into the `fine structure' of the
largest segment in the pie charts, the proof lines.
However, it probably is hard to systematically
distinguish different kinds of proof steps on a line by line basis.
An interesting question about the proof lines might be how many
are straight-forward `manual' reasoning steps, and how many invoke
strong automated proof procedures.
\paragraph{Acknowledgments.}
Thanks to John Harrison, Henk Barendregt and Makarius Wenzel for helpful comments.
Special thanks to John Harrison for sending me statistics on his
Intel verification work.
Special thanks to Makarius Wenzel for sending me a list of
categorized Isabelle keywords.
%\bibliographystyle{plain}
%\bibliography{frk}
\begin{thebibliography}{1}
\bibitem{har:00}
J.R. Harrison.
\newblock {\em The HOL Light manual (1.1)}, 2000.
\newblock \url{}.
\bibitem{muz:93}
Micha{\l} Muzalewski.
\newblock {\em An Outline of PC Mizar}.
\newblock Fondation Philippe le Hodey, Brussels, 1993.
\newblock
\url{}.
\bibitem{nip:pau:wen:02}
T.~Nipkow, L.C. Paulson, and M.~Wenzel.
\newblock {\em Isabelle/HOL --- A Proof Assistant for Higher-Order Logic},
volume 2283 of {\em LNCS}.
\newblock Springer, 2002.
\newblock
\url{}.
\bibitem{coq:06}
{The Coq Development Team}.
\newblock {\em The Coq Proof Assistant Reference Manual}, 2006.
\newblock \url{}.
\bibitem{wie:06}
Freek Wiedijk, editor.
\newblock {\em {The Seventeen Provers of the World}}, volume 3600 of {\em
LNCS}.
\newblock Springer, 2006.
\newblock With a foreword by Dana S.~Scott.
\end{thebibliography}
\end{document}