Date: 2011-12-01
Categories: software; parsing

Verified parsing

The following post describes work in 2010/2011 on verified parsing. It includes a verification (in HOL4) of a sound and complete combinator parsing library.

Verified parsing

During 2010/2011 I developed a verified parser for all context free languages. The main features of my approach are:

The other thing you should consider is how fast you need your parser to be. For highly ambiguous grammars, this approach seems competitive with Happy etc., but for very simple grammars, more specialized technology will probably perform better (as you should expect).

Cool example

A cool-but-stupid example that I quite like is the following:

{{

(* A very stupid way to count the length of the input. *)

}}

START -> E ?ws? ?EOF? {{ fun (i,_) -> print_string ((string_of_int i)^"\n"); i }}

E -> E E E   {{ fun (x,(y,z)) -> x+y+z }}
  | "a"      {{ fun _ -> 1 }}
  | ""       {{ fun _ -> 0 }}

This grammar is examples/actions/length.g in the supporting material. You can use the parser generator to produce a parser length.native. You can then run it as follows:

examples/actions$ ./length.native -f length.txt

Here, length.txt is a file containing 30 consecutive 'a' characters. Note that this grammar is extremely ambiguous. Even so, on my relatively modest laptop this returns a single result 30 in half a second. I don't know any other parser generator that can do anything like this!

Paper

The paper on all this is

Tom Ridge. Simple, functional, sound and complete parsing for all context-free grammars. In Jean Pierre Jouannaud and Zhong Shao, editors, Certified Programs and Proofs - First International Conference, CPP 2011, Kenting, Taiwan, December 7-9, 2011. Proceedings, volume 7086 of Lecture Notes in Computer Science, pages 103--118. Springer, 2011.

A pdf version can be found here.

Guide to supporting material

There is quite a lot of supporting material.

The supporting material is https://github.com/tomjridge/2012-11-29_parsing.



Related articles:

  • 2017-11-14 New OCaml parsing algorithm: tjr_simple_earley
  • 2017-09-17 Two new OCaml libraries: P0 and tjr-csv
  • 2017-03-16 tjr-btree: a CoW B-tree library in OCaml
  • 2016-11-17 OCaml string functions
  • 2016-02-19 Tree-structured text
  • 2016-02-09 Simple implementation of an Earley-like parsing algorithm
  • 2015-06-26 P5 scala parsing library
  • 2014-12-19 Parsing the IMAP protocol
  • 2014-12-04 Parsing examples
  • 2014-11-21 Talk on parsing at the University of Sussex
  • 2014-09-26 P1 combinator parsing library for OCaml
  • 2014-09-26 E3 earley parser library for OCaml
  • 2014-09-18 SLE 2014 conference, and Parsing at SLE workshop, slides
  • 2014-09-07 ICFP 2014, OCaml workshop, slides and video
  • 2014-07-11 P3 paper accepted for SLE 2014
  • 2014-04-15 New release of P3 code on github
  • 2014-03-02 New release of P3 code on github
  • 2013-12-16 New release of P3 code on github
  • 2013-12-03 Implementing algorithms efficiently
  • 2013-11-24 Experience of using Lem
  • 2013-11-08 Talk on parsing and P3 given at Cambridge
  • 2011-12-01 Verified parsing