\documentclass[runningheads]{llncs}
\usepackage{url}
\usepackage{hyperref}
\begin{document}

\title{
	A \underline{Formal}ization of the C99 Standard \\ 
	\underline{in} HOL, Isabelle and Coq
}
\titlerunning{A Formalization of the C99 Standard in HOL, Isabelle and Coq}
\toctitle{A Formalization of the C99 Standard in HOL, Isabelle and Coq}
\author{Robbert Krebbers \and Freek Wiedijk}
\institute{
	Institute for Computing and Information Sciences \\
	Radboud University Nijmegen \\
	Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
}
\maketitle

\begin{abstract}
\parfillskip=1em plus 1fil
We recently started the Formalin project to create a formal version of the C99 
standard for the C programming language. We are writing three matching 
formalizations for the interactive theorem provers HOL4, Isabelle/HOL and Coq, 
that all closely follow the existing C99 standard text. The project runs from 
2011 to 2015, and involves a full time PhD student, a half time researcher and 
several scientific advisors.

The project differs from existing work in that our aim is to formalize the 
\emph{full} C99 standard. This means that we treat the C preprocessor, the C 
standard library, floating point arithmetic, and `dirty' C features like signal 
handling and volatile variables. Importantly, this means we also treat embedded 
C programs without explicit input/output.
\end{abstract}

\noindent
The Formalin project~\cite{wie:10:1}, with website \,\url{http://ch2o.cs.ru.nl/},
runs from May 2011 to May 2015. The research team consists of the following 
people:
\begin{center}
	\setlength{\tabcolsep}{.8em}
	\begin{tabular}{lll}
	Robbert Krebbers & \emph{PhD student} \\
	Freek Wiedijk & \emph{project leader} \\
	%\noalign{\medskip}
	Herman Geuvers & \emph{promotor} \\
	James McKinna & \\
	Erik Poll & \\
	%\noalign{\medskip}
	Michael Norrish & \emph{HOL advisor} & {NICTA, Australia} \\
	Andreas Lochbihler & \emph{Isabelle advisor} & {KIT, Germany} \\
	Jean-Christophe Filli\^atre & \emph{Coq advisor} & {CNRS, France}
\end{tabular}
\end{center}
The first five people are from the Radboud University in The Netherlands.
The first two people are the developers, while the other six are advising.
\bigskip

\noindent
The C programming language~\cite{ker:ric:88} is one of the most popular in the 
world. It is among the two currently most popular languages \cite{lan:xx,tio:xx},
and is a dominant language from the smallest microcontroller with only a few 
hundred bytes of RAM to the largest supercomputer that runs at petaflops speeds.
C is especially used for embedded software, but it is also the native language 
of most modern operating systems due to its tight connection to Unix.

The current official description of the syntax and semantics of the C language 
-- the C99 standard~\cite{iso:99}, issued by ANSI and ISO together -- is written 
in English and does not use a mathematically precise formalism. This makes it 
inherently incomplete and ambiguous. Our project is to create a mathematically 
precise version of the C99 standard. We \emph{formalize} the standard using 
interactive theorem provers (proof assistants) and develop the C99 
formalization in three matching versions, for the interactive theorem provers 
HOL4~\cite{gor:mel:93}, Isabelle/HOL~\cite{nip:pau:wen:02} and Coq~\cite{coq:10}. 
These formalizations will all be derived from a common master formalization,
which will either be written for one of the three systems or using a fourth 
system like for instance the Ott tool~\cite{sew:nar:owe:pes:rid:sar:str:10}.
Ott is a tool designed for defining language definitions, which can generate 
formalizations for all three of our systems. The whole formalization suite will 
be published as open source, under a BSD-style license. For dissemination, we 
will also use MKM tools like the ones that currently are being developed in the MathWiki 
project~\cite{mat:xx}.

The formalizations that we create closely follow the existing C99 
standard text. Specifically we treat the C preprocessor, the C standard 
library, and features that in a formal treatment are often left out:
unspecified 
and undefined behavior due to unknown evaluation order, casts between pointers 
and integers, alignment requirements, floating point arithmetic, non-local 
control flow (\textsf{goto} statements, \textsf{setjmp}/\textsf{longjmp}
and signal handling) and volatile variables. Most importantly, to make our work 
relevant for verification of embedded software, we include % in our work 
the part of the standard referring to C programs that run in a `freestanding 
environment' (Section 5.1.2.1 of~\cite{iso:99}).

A formal version of the full C standard is an important artifact.
When establishing a property of a C program, it is
very attractive to be able to claim that it has been proved
with respect to the \emph{full} official standard.
This kind of `knowledge' about the C semantics in the current state of the art
is mostly implicit in various tools.

A formalization of the C99 standard has three main applications:
\begin{itemize}
\item The C99 formalization makes the C99 standard utterly precise. This 
	is useful for compiler writers, who will get the means to establish how 
	the standard needs to be understood without having to deal with the 
	ambiguities of the English language. Programmers writing C programs get the 
	same benefit.
\item There already are various projects to prove C compilers correct, like the 
	Compcert project of Xavier Leroy~\cite{bla:ler:09}. These projects 
	need a semantics of a version of C. These currently are subsets of full C, 
	with names like Clight or C0. With a formal version of the C99 semantics, 
	the correctness of the compiler becomes provable with respect to the full 
	\emph{official} standard. 
\item Currently people proving C programs correct with proof assistants use 
	tools like VCC~\cite{coh:ah:hil:lei:mos:san:sch:tob:09} and Frama-C~\cite{moy:mar:09}
	which generate \emph{verification conditions} from C source annotated in the 
	style of Hoare logic. These tools implicitly `know' about the semantics of C, 
	but this knowledge is not explicit. A more thorough approach is to have 
	such a tool not just generate the verification conditions, but to also have 
	it synthesize formal \emph{proofs} about the properties of the program.
	%This will be less efficient, computationally, but it will be fundamentally 
	% more reliable.
\end{itemize}

\noindent
Our three formalizations each consist of two parts.
The first part defines a space of all possible C 
semantics as a type \texttt{C\char`\_semantics}.
(`Semanticses' is not correct English, but we mean the plural of 
semantics here.)
Points in this space correspond to various variants of C like the
C99 standard,
the upcoming C1X standard, and the behavior of specific C compilers on 
specific machines.
The definition of this space will be as short as we can make it,
to have it as clear as possible what our formalized C99 standard 
amounts to.
This space also addresses the observation from page 70 
of~\cite{bes:blo:che:cho:ful:hal:hen:kam:mcp:eng:10} that `{the} C language does 
not exist'.
The second part is a {small step} structured operational semantics of C.
It corresponds to a \emph{point} in the space of C semantics,
which means that the second and main part of our formalizations will be a
formal definition of an element
\[
	\texttt{C99 : C\char`\_semantics}
\]

\bibliographystyle{plain}
\bibliography{../CH2O}

\end{document}
