September 23rd, 2009
A new parser for Ioke
Last week I finally bit the bullet and rewrote the Ioke parser. I’m pretty happy about the end result actually, but it does involve moving away from Antlr’s as a parser generator. In fact, the new parser is handwritten – and as such goes against my general opinion to generate everything possible. I would like to quickly take a look at the reasons for doing this and also what the new parser will give Ioke.
For reference, the way the parser used to work was that the Antlr generated lexer and parser gave the Ioke runtime an Antlr Tree structure. This tree structure was then walked and transformed into chained Message’s, which is the AST that Ioke uses internally. Several other things were also done at this stage, including separating message chains on comma-borders. Most significantly the processing to put together interpolated strings and regular expressions happened at this stage. Sadly, the code to handle all that was complex, ugly, slow and frail. After this stage, operator shuffling happened. That part is still the same.
There were several problems I wanted to solve, but the main one was the ugliness of the algorithm. It wasn’t clear from the parser how an interpolated expression mapped into the AST, and the generated code added several complications that frankly weren’t necessary.
Ioke is a language with an extremely simple base syntax. It is only slightly more complicated than the typical Lisp parser, and there is almost no parser-level productions needed. So the new parser does away with the lexer/parser distinction and does everything in one pass. There is no need for lookahead at the token level, so this turns out to be a clear win. The code is actually much simpler now, and the Message AST is created inline in the new parser. When it comes to interpolation, instead of the semantic predicates and global stacks I had to use in the Antlr parser, I just do the obvious recursive interpolation. The code is simple to understand and quite efficient too.
At the end of the day, I did expect to see some performance improvements too. They turned out to be substantial. Parsing is about 2.5 times faster, and startup speed has improved by about 30%. The distribution size will be substantially smaller since I don’t need to ship the Antlr runtime libraries. And building the project is also much faster.
But the real gain is actually in maintainability of the code. It will be much easier for me to extend the parser now. I can do nice things to make the syntax more open ended and more powerful in ways that would be very inconvenient in Antlr. The error messages are much better since I have control over all the error states. In fact, there are only 13 distinct error messages in the new parser, and they are all very clear on what has gone wrong – I never did the work in the old parser to support that, but I get that almost for free in the new one.
Another thing I’ve been considering is to add reader macros to Ioke – and that would also have been quite painful with the Antlr parser generator. So all in all I’m very happy about the new parser, and I think it will definitely make it easier for the project going forward.
This blog post is in no way saying that Antlr is bad in any way. I like Antlr a lot – it’s a great tool. But it just wasn’t the right tool for Ioke’s syntax.