Date: 2018-05-22
Categories: research; programming; parsing
First Python program: an Earley parser!
Next semester we are introducing Python as the first language we teach to our students. To learn Python I read a few books, then started to write some code. My first piece of code involved translating a piece of OCaml (https://github.com/tomjridge/tjr_simple_earley) to Python.
The result is https://github.com/tomjridge/tjr_python_earley_parser
From the README.md:
This code is based on the OCaml implementation https://github.com/tomjridge/tjr_simple_earley but adapted to Python. Documentation for the algorithm itself (not this implementation in Python) can be found at that link.
The OCaml code is much faster (as should be expected).
Even so, the Python code is not slow. For example, for an input
"1"*100, and a grammar E -> E E E | "1" | eps
, parsing takes about
.75 of a second. This is notable because most other Earley parsers
tend to fall over when given a grammar like this.
This is my first piece of Python code, and I was pleasantly surprised:
- pycharm is a nice environment for programming Python
- the checks that pycharm does ("inspect code") are useful
- I didn't miss types too much (of course, I am only translating from OCaml, rather than developing the algorithm from scratch... types would likely be much more useful when developing the algorithm).
- Python has nice support for sets, maps, and algebraic datatypes (namedtuples).
- Imperative/mutable programming actually makes the code nicer compared to the state-passing style used for the OCaml implementation
TODO:
- compare performance with other Python Earley parsers, such as https://github.com/lark-parser/lark
UPDATE
Inspired by imperative programming (!) I attempted to replicate the Python experience in OCaml, using a monad. The result is:
This actually is pretty readable, and close to the concision that Python allows.
Related posts:
- 2020-05-01 John Whitington PhD viva
- 2020-02-05 On the need for PhD viva chairs
- 2020-01-20 SQLite assumptions, or how to corrupt an SQLite database
- 2019-12-18 VeTSS annual summary
- 2019-08-30 B-tree random write performance
- 2019-08-21 ML'19 Workshop at ICFP: A key-value store for OCaml
- 2018-07-02 Funded PhD places
- 2018-06-14 A typed DSL for parsing
- 2018-05-30 Potential improvements in filesystem performance
- 2018-05-22 First Python program: an Earley parser!
- 2018-02-01 New OCaml library: path resolution
- 2017-11-14 New OCaml parsing algorithm: tjr_simple_earley
- 2017-09-17 Two new OCaml libraries: P0 and tjr-csv
- 2017-09-06 ICFP most influential paper from 10 years ago
- 2017-03-16 tjr-btree: a CoW B-tree library in OCaml
- 2016-02-19 Tree-structured text
- 2016-02-09 Simple implementation of an Earley-like parsing algorithm
- 2015-06-26 P5 scala parsing library
- 2015-04-27 Why operational models?
- 2014-12-19 Parsing the IMAP protocol
- 2014-12-04 Parsing examples
- 2014-11-26 Isabelle on 64bit ubuntu with 32bit libraries
- 2014-11-21 Talk on parsing at the University of Sussex
- 2014-09-26 P1 combinator parsing library for OCaml
- 2014-09-26 E3 earley parser library for OCaml
- 2014-09-18 SLE 2014 conference, and Parsing at SLE workshop, slides
- 2014-09-07 ICFP 2014, OCaml workshop, slides and video
- 2014-07-11 P3 paper accepted for SLE 2014
- 2014-04-15 New release of P3 code on github
- 2014-03-02 New release of P3 code on github
- 2013-12-16 New release of P3 code on github
- 2013-12-03 Implementing algorithms efficiently
- 2013-11-08 Talk on parsing and P3 given at Cambridge
- 2011-12-01 Verified parsing