Date: 2011-12-01
Categories: software; parsing

Verified parsing

The following post describes work in 2010/2011 on verified parsing. It includes a verification (in HOL4) of a sound and complete combinator parsing library.

Verified parsing

During 2010/2011 I developed a verified parser for all context free languages. The main features of my approach are:

The other thing you should consider is how fast you need your parser to be. For highly ambiguous grammars, this approach seems competitive with Happy etc., but for very simple grammars, more specialized technology will probably perform better (as you should expect).

Cool example

A cool-but-stupid example that I quite like is the following:

{{

(* A very stupid way to count the length of the input. *)

}}

START -> E ?ws? ?EOF? {{ fun (i,_) -> print_string ((string_of_int i)^"\n"); i }}

E -> E E E   {{ fun (x,(y,z)) -> x+y+z }}
  | "a"      {{ fun _ -> 1 }}
  | ""       {{ fun _ -> 0 }}

This grammar is examples/actions/length.g in the supporting material. You can use the parser generator to produce a parser length.native. You can then run it as follows:

examples/actions$ ./length.native -f length.txt

Here, length.txt is a file containing 30 consecutive 'a' characters. Note that this grammar is extremely ambiguous. Even so, on my relatively modest laptop this returns a single result 30 in half a second. I don't know any other parser generator that can do anything like this!

Paper

The paper on all this is

Tom Ridge. Simple, functional, sound and complete parsing for all context-free grammars. In Jean Pierre Jouannaud and Zhong Shao, editors, Certified Programs and Proofs - First International Conference, CPP 2011, Kenting, Taiwan, December 7-9, 2011. Proceedings, volume 7086 of Lecture Notes in Computer Science, pages 103--118. Springer, 2011.

A pdf version can be found here.

Guide to supporting material

There is quite a lot of supporting material.

The supporting material is https://github.com/tomjridge/2012-11-29_parsing.


Related posts: