Date: 2018-05-22
Categories: research; programming; parsing

First Python program: an Earley parser!

Next semester we are introducing Python as the first language we teach to our students. To learn Python I read a few books, then started to write some code. My first piece of code involved translating a piece of OCaml (https://github.com/tomjridge/tjr_simple_earley) to Python.

The result is https://github.com/tomjridge/tjr_python_earley_parser

From the README.md:

This code is based on the OCaml implementation https://github.com/tomjridge/tjr_simple_earley but adapted to Python. Documentation for the algorithm itself (not this implementation in Python) can be found at that link.

The OCaml code is much faster (as should be expected).

Even so, the Python code is not slow. For example, for an input "1"*100, and a grammar E -> E E E | "1" | eps, parsing takes about .75 of a second. This is notable because most other Earley parsers tend to fall over when given a grammar like this.

This is my first piece of Python code, and I was pleasantly surprised:

pycharm is a nice environment for programming Python
the checks that pycharm does ("inspect code") are useful
I didn't miss types too much (of course, I am only translating from OCaml, rather than developing the algorithm from scratch... types would likely be much more useful when developing the algorithm).
Python has nice support for sets, maps, and algebraic datatypes (namedtuples).
Imperative/mutable programming actually makes the code nicer compared to the state-passing style used for the OCaml implementation

TODO:

compare performance with other Python Earley parsers, such as https://github.com/lark-parser/lark

UPDATE

Inspired by imperative programming (!) I attempted to replicate the Python experience in OCaml, using a monad. The result is:

This actually is pretty readable, and close to the concision that Python allows.