The "de Bruijn factor"
In "A survey of the project Automath" N.G. de Bruijn wrote:
A very important thing that can be concluded from all writing experiments is the constancy of the loss factor. The loss factor expresses what we loose in shortness when translating very meticulous "ordinary" mathematics into Automath. This factor may be quite big, something like 10 or 20, but it is constant: it does not increase if we go further in the book. It would not be too hard to push the constant factor down by efficient abbreviations.
Because of this quote, I defined the de Bruijn factor to be the quotient of the size of a formalization of a mathematical text and the size of its informal original. To make this specific, I made the following choices:
- The size of the informal text is the size of a TeX encoding.
- To be independent of arbitrary factors like lengths of identifiers and amount of whitespace, the files should be compared compressed with the Unix utility gzip.
I investigated the exact value of de Bruijn factor in a number of specific cases:
Arithmetic in Automath
grundlagen.tex.gz x 3.7 = grundlagen.aut.gz
(Here are the ps.gz, dvi, tex and Automath versions of this text, E. Landau, "Grundlagen der Analysis", Chelsea Publishing Company, New York, 1965, first edition: 1930.)
Computer Science in Mizar
fin_topo.tex.gz x 3.1 = fin_topo.miz.gz
(Here are the ps.gz, dvi, tex and Mizar versions of this text, a section of: Y. Nakamura, Y. Fuwa, and H. Imura, "A Theory of Finite Topology and Image Processing", Journal of the Faculty of Engineering, Shinshu University, 69:11-24, 1991.)
Logic in Mizar
waybel14.tex.gz x 4.1 = waybel14.miz.gz
(Here are the ps.gz, dvi, tex and Mizar versions of this text, four pages from: G. Gierz, K.H. Hofmann, K. Keimel, J.D. Lawson, M. Mislove and D.S. Scott, "A Compendium of Continuous Lattices", Springer-Verlag, Berlin, Heidelberg, New York, 1980.)
Analysis in HOL
cotangent1.tex.gz x 3.7 = cotangent1.ml.gz
(Here are the ps.gz, dvi, tex and HOL versions of this text, three pages from: K. Knopp, "Theory and Application of Infinite Series", Blackie & Son, London, Glasgow, 1928.)
Topology in HOL
stone.tex.gz x 3.7 = stone.ml.gz
(Here are the ps.gz, dvi, tex and HOL versions of this text, one page from: R. Engelking, "General Topology", Heldermann Verlag, Berlin, 1989.)
Algebra in HOL
pell.tex.gz x 4.1 = pell.ml.gz
(Here are the ps.gz, dvi, tex and HOL versions of this text, four pages from: D. Cohen, "Computability and Logic", Ellis Horwood Ltd., Chischester, 1987.)
Logic in HOL and Isar
multiwf1.tex.gz x 1.3 = multiwf1.ml.gz
multiwf1.tex.gz x 1.1 = multiwf1.thy.gz
(Here are the ps.gz, dvi, tex, HOL and Isar versions of this text, one page from: An Inductive Proof of the Wellfoundedness of the Multiset Order.)
Topological analysis in HOL
monotoneity1.tex.gz x 8.8 = monotoneity.ml.gz
(Here are the pdf, ps.gz, dvi, tex and HOL versions of this text, and the pdf of the original page from: G.T. Whyburn, "Topological Analysis".)
Number theory in HOL
covering.tex.gz x .44 = covering.ml.gz
(Here are the pdf, ps.gz, dvi, tex and HOL versions of this text, one page from: Melvyn B. Nathanson, "Additive Number Theory, the Classical Bases".)
Apparently people generally select texts for formalization that are matched to their system in difficulty such that the de Bruijn factor becomes around four.
I did a poster session about the de Bruijn factor at the TPHOLs 2000 conference (here are the ps.gz, dvi and tex versions of the text that I submitted there.)
(last modification 2012-03-01)
|