Date: 2014-12-19
Categories: parsing

Parsing the IMAP protocol

There was a recent discussion on the OCaml mailing list about parsing IMAP (http://article.gmane.org/gmane.comp.lang.caml.inria/61731). The apparent requirement is a scannerless, incremental parser.

I'm not entirely clear on what people mean by "incremental parsing". There are probably several different things that should be distinguished:

Anyway, Earley parsing can probably support all these use cases quite nicely, and E3 is already scannerless, so I thought I would try my hand at producing an IMAP protocol parser.
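For readers unfamiliar with Earley parsing, here is a minimal sketch of an Earley recognizer in OCaml where terminals are single characters, so no separate lexer (scanner) is needed. This is not E3 and does not use E3's API: the types `symbol`, `grammar`, `item` and the function `recognize` are hypothetical, and the toy grammar is only loosely shaped like an IMAP command rather than taken from RFC 3501. It answers yes or no only; a real parser would also build parse trees.

```ocaml
(* Minimal Earley recognizer over characters (scannerless sketch).
   Hypothetical names; this is not E3's API. *)

type symbol = NT of string | T of char

(* A grammar maps each nonterminal to its list of alternatives. *)
type grammar = (string * symbol list list) list

(* Earley item: a rule, how far through its right-hand side we are (dot),
   and the input position where the rule started (origin). *)
type item = { lhs : string; rhs : symbol array; dot : int; origin : int }

let alternatives (g : grammar) nt =
  try List.assoc nt g with Not_found -> []

let recognize (g : grammar) (start : string) (input : string) : bool =
  let n = String.length input in
  (* sets.(k) holds the items valid after consuming k characters *)
  let sets = Array.make (n + 1) [] in
  let add k it = if not (List.mem it sets.(k)) then sets.(k) <- sets.(k) @ [it] in
  let predict k nt =
    List.iter
      (fun rhs -> add k { lhs = nt; rhs = Array.of_list rhs; dot = 0; origin = k })
      (alternatives g nt)
  in
  let is_complete it = it.dot = Array.length it.rhs in
  predict 0 start;
  for k = 0 to n do
    (* items appended to sets.(k) while we iterate are processed too *)
    let i = ref 0 in
    while !i < List.length sets.(k) do
      let it = List.nth sets.(k) !i in
      incr i;
      if is_complete it then
        (* complete: advance every item in the origin set waiting for it.lhs *)
        List.iter
          (fun p ->
            if not (is_complete p) && p.rhs.(p.dot) = NT it.lhs then
              add k { p with dot = p.dot + 1 })
          sets.(it.origin)
      else
        match it.rhs.(it.dot) with
        | NT b ->
          (* predict b; if b already completed here (nullable), advance now *)
          predict k b;
          if List.exists (fun x -> x.lhs = b && x.origin = k && is_complete x) sets.(k)
          then add k { it with dot = it.dot + 1 }
        | T c ->
          (* scan: consume one input character *)
          if k < n && input.[k] = c then add (k + 1) { it with dot = it.dot + 1 }
    done
  done;
  List.exists (fun it -> it.lhs = start && it.origin = 0 && is_complete it) sets.(n)

(* Helpers: a literal as a sequence of terminals, and a choice of characters. *)
let lit s = List.init (String.length s) (fun i -> T s.[i])
let one_of s = List.init (String.length s) (fun i -> [T s.[i]])

(* Toy grammar loosely shaped like an IMAP "tag SP NOOP" command. *)
let toy : grammar = [
  "command", [ [NT "tag"; T ' '] @ lit "NOOP" ];
  "tag", [ [NT "tchar"]; [NT "tchar"; NT "tag"] ];
  "tchar", one_of "abcdefghijklmnopqrstuvwxyz0123456789";
]

let () =
  Printf.printf "%b %b\n"
    (recognize toy "command" "a1 NOOP")
    (recognize toy "command" "a1NOOP")
(* prints: true false *)
```

Because the item sets are filled strictly left to right, one character position at a time, this structure could in principle be extended as more input arrives, which is one reason Earley parsing is a plausible fit for incremental use. The sketch above simply processes a complete string.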

The IMAP protocol is defined in RFC 3501 using Augmented BNF (ABNF), a variant of BNF that is itself defined in RFC 2234. I first defined a parser for ABNF (https://github.com/tomjridge/example_grammars/blob/master/src/abnf_rfc2234/abnf_grammar.p1x). With this I was able to parse the IMAP grammar directly from RFC 3501. The grammar is here, and the result of parsing it is an s-expression here.
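To make the ABNF notation concrete, the sketch below represents a small subset of ABNF as an OCaml datatype and interprets it directly, computing every input position at which an expression can finish matching. The constructors and the `ends`/`matches` functions are hypothetical and are not the types used by the author's ABNF grammar. Prose values (`<...>`), incremental alternatives (`=/`) and left-recursive rules are not handled (left recursion would loop forever). The rule bodies for `digit-nz`, `number`, `nz-number` and `command-any` (with `x-command` omitted) follow RFC 3501, and `DIGIT` is the RFC 2234 core rule.

```ocaml
(* Hypothetical AST for a subset of ABNF (RFC 2234). *)
type abnf =
  | Rule of string                  (* reference to another rule *)
  | Str of string                   (* "quoted" literal; case-insensitive in ABNF *)
  | Range of int * int              (* numeric value range, e.g. %x30-39 *)
  | Seq of abnf list                (* concatenation *)
  | Alt of abnf list                (* alternation: a / b *)
  | Rep of int * int option * abnf  (* n*m repetition; None means no upper bound *)

let dedup l = List.sort_uniq compare l

(* [ends rules e s pos]: every position in [s] where [e] can stop matching,
   starting from [pos]. Left-recursive rules would loop forever. *)
let rec ends rules e s pos =
  let n = String.length s in
  match e with
  | Rule r -> ends rules (List.assoc r rules) s pos
  | Str lit ->
    let l = String.length lit in
    if pos + l <= n
       && String.lowercase_ascii (String.sub s pos l) = String.lowercase_ascii lit
    then [pos + l] else []
  | Range (lo, hi) ->
    if pos < n && Char.code s.[pos] >= lo && Char.code s.[pos] <= hi
    then [pos + 1] else []
  | Seq es ->
    List.fold_left (fun ps e' -> dedup (List.concat_map (ends rules e' s) ps)) [pos] es
  | Alt es -> dedup (List.concat_map (fun e' -> ends rules e' s pos) es)
  | Rep (lo, hi, e') ->
    (* ps: positions after exactly k copies; acc: accepted end positions *)
    let rec go k ps acc =
      if ps = [] then acc
      else
        let acc = if k >= lo then dedup (ps @ acc) else acc in
        match hi with
        | Some h when k >= h -> acc
        | _ ->
          let next = dedup (List.concat_map (ends rules e' s) ps) in
          (* with no upper bound, revisiting an accepted position adds nothing,
             so drop it; this keeps the loop finite *)
          let next =
            if k >= lo && hi = None then List.filter (fun p -> not (List.mem p acc)) next
            else next
          in
          go (k + 1) next acc
    in
    go 0 [pos] []

let rules = [
  "DIGIT", Range (0x30, 0x39);                                      (* RFC 2234 core rule *)
  "digit-nz", Range (0x31, 0x39);                                   (* %x31-39 ; 1-9 *)
  "number", Rep (1, None, Rule "DIGIT");                            (* 1*DIGIT *)
  "nz-number", Seq [Rule "digit-nz"; Rep (0, None, Rule "DIGIT")];  (* digit-nz *DIGIT *)
  "command-any", Alt [Str "CAPABILITY"; Str "LOGOUT"; Str "NOOP"];  (* x-command omitted *)
]

(* Does the whole of [s] match rule [name]? *)
let matches name s = List.mem (String.length s) (ends rules (Rule name) s 0)

let () =
  Printf.printf "%b %b %b %b\n"
    (matches "number" "0042") (matches "nz-number" "0042")
    (matches "nz-number" "42") (matches "command-any" "noop")
(* prints: true false true true *)
```

Returning a set of end positions, rather than committing to the first match, keeps alternation and repetition correct without a separate backtracking mechanism, at the cost of efficiency on large inputs.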

The next step is to generate, from this grammar, parsing code that parses IMAP protocol messages and produces abstract syntax trees for them.
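One possible shape for such abstract syntax trees is a generic tree whose nodes are labeled with the ABNF rule that produced them, printed as an s-expression in the same spirit as the parsed grammar. The sketch below assumes that representation. The type `tree`, the function `to_sexp` and the hand-built example (a command `a1 NOOP` following RFC 3501's `command = tag SP (command-any / ...) CRLF`) are hypothetical and are not output from the author's generator.

```ocaml
(* Hypothetical generic parse tree: each node is labeled with the ABNF
   rule name that produced it; leaves hold the matched text. *)
type tree =
  | Leaf of string
  | Node of string * tree list

(* Render a tree as an s-expression; %S quotes leaf text with OCaml escapes. *)
let rec to_sexp = function
  | Leaf s -> Printf.sprintf "%S" s
  | Node (name, kids) ->
    "(" ^ String.concat " " (name :: List.map to_sexp kids) ^ ")"

(* A hand-built tree for the command "a1 NOOP\r\n". *)
let example =
  Node ("command",
        [ Node ("tag", [Leaf "a1"]);
          Node ("SP", [Leaf " "]);
          Node ("command-any", [Leaf "NOOP"]);
          Node ("CRLF", [Leaf "\r\n"]) ])

let () = print_endline (to_sexp example)
(* prints: (command (tag "a1") (SP " ") (command-any "NOOP") (CRLF "\r\n")) *)
```

A generator could instead emit a dedicated OCaml type per rule, which is more precise but requires deciding how each alternation and repetition maps onto variants and lists.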

