ThoughtWorks Seminar and Tutorial in Stockholm

On September 29th, ThoughtWorks will hold a day of seminars and a tutorial in Stockholm, Sweden. The seminars are free. I will talk about alternative languages, Martin Fowler will talk about software design in the 21st century, and another ThoughtWorks speaker will talk about DSLs for functional testing.

The tutorial is a half-day tutorial given by Martin Fowler and me. We will talk about domain specific languages.

If this sounds interesting, go in and find more information and register here. Hurry, though – places are limited!

Hijacking Ioke syntax

One of the nicer things about Ioke is that the syntax is highly flexible. This means that features that are generally considered part of the parsing step are not so in Ioke. There are still limitations on what you can take over, of course.

Another reason you can do much with Ioke’s syntax is that all operators are just method calls. This means you can override them in subclasses, which means these syntax changes can be totally localized.

A third way is to change things at a global level, but only temporarily. This is possible since any cell in Ioke can act as a dynamic cell (or special) by using the “let” method. This means you can make some very interesting changes to the syntax.

Understanding these three techniques makes it possible to very easily create internal DSLs in Ioke that feel like they are external. A fourth way of achieving this is to massage message chains after the fact, transforming them into a different structure. You can do this directly, by working with the first class message structures, or, if you don’t have too extravagant needs, you can interject specific behavior into the operator tables.

This post will give a quick introduction to these techniques, but it can’t really cover them all in full.

Flexible and polymorphic syntax elements

Many things in Ioke that look like syntax elements are really just message sends, which allows you to change their behavior locally. Examples include the creation of literals (such as numbers, texts and regexps), regular cell assignment (with =), and the creation of literal lists and dicts.

OK, to make this concrete, let me show a few examples. To start with, take the literals. Literals are actually translated into message sends. So when the parser sees a number, it generates a message send to “internal:createNumber” and inserts that into the message chain. This means you can override and change this behavior, which is something I do in my parser combinators example. As an extremely small example, take this code:

  "foo" | "bar" | 42

This example creates a new parser that will parse either the literal string foo, the literal string bar, or the number 42. But how can we implement this? The method “|” is not even implemented for Texts in Ioke, and it definitely doesn’t return anything useful for the parser. We don’t want to override it to return the right thing either: it wouldn’t be a general solution, and what if someone wanted a parser that only matched one literal string? It’s clear that we need to hook into the handling of literals. (There is an alternative, which we will talk about in the final section.)

At this stage it might help to take a look at the canonical AST structure of the above code. It would look like this: internal:createText("foo") |(internal:createText("bar")) |(internal:createNumber("42")). With this structure, it should be more obvious how we can implement everything. The code for the above would look like this:

BaseParser = Origin mimic do(
  ; combining two parsers with | yields an OrParser
  | = method(other,
    OrParser with(context: context, first: self, second: other))
)

TextParser = BaseParser mimic
NumberParser = BaseParser mimic

ParserContext = Origin mimic do(
  ; literals evaluated in this context become parsers
  internal:createText   = method(raw,
    TextParser with(context: self, text:   super(raw)))
  internal:createNumber = method(raw,
    NumberParser with(context: self, number: super(raw)))
)

Parser = dmacro(
  [code]
  context = ParserContext mimic
  code evaluateOn(context, context)
)
The interesting pieces are in ParserContext. Inside it we override internal:createText and internal:createNumber to return parsers for their corresponding type. Notice how we call out to "super" to get the actual literal result. We then evaluate the argument to the Parser-method in the context of a newly created parser context. The only thing missing in the above code is the OrParser, and the actual matching pieces.
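
For readers who want to see what the missing OrParser and matching pieces might look like, here is a rough sketch. It is written in Python rather than Ioke, purely as an analogy: Python cannot hook literal creation, so the literals are wrapped by hand, and "|" is overloaded through __or__. The match method and its return convention (the position after the match, or None) are assumptions made for illustration and are not part of the Ioke example.

```python
class BaseParser:
    def __or__(self, other):
        # "a | b" builds a parser that tries a first, then b.
        return OrParser(self, other)


class TextParser(BaseParser):
    def __init__(self, text):
        self.text = text

    def match(self, source, pos=0):
        """Return the position after the match, or None on failure."""
        if source.startswith(self.text, pos):
            return pos + len(self.text)
        return None


class NumberParser(BaseParser):
    def __init__(self, number):
        self.digits = str(number)

    def match(self, source, pos=0):
        end = pos + len(self.digits)
        # Require the exact digits, not followed by a further digit.
        if source.startswith(self.digits, pos) and not source[end:end + 1].isdigit():
            return end
        return None


class OrParser(BaseParser):
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def match(self, source, pos=0):
        result = self.first.match(source, pos)
        if result is None:
            result = self.second.match(source, pos)
        return result


# Hand-wrapped equivalent of: "foo" | "bar" | 42
parser = TextParser("foo") | TextParser("bar") | NumberParser(42)
```

Because OrParser is itself a BaseParser, chains of "|" nest naturally, which is the same property the Ioke version relies on.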

The other ways of hijacking syntax generally depend on executing code in a specific context, like the above.

I mentioned that "=", [] and {} are overridable. Say, for example, that you like the syntax of blocks in Smalltalk and want to use that within an internal DSL. That is actually extremely easy:

Smalltalk = Origin mimic do(
  ; inside this object, [...] creates a block instead of a list
  [] = cell(:fn)
)

Smalltalk do(
  x = ["hello world" println]
  x call
  x call
)

Here we just assign [] to be the same as the method "fn" within the Smalltalk object. The same thing can be done for other operators, if wanted.

Using let to override syntax

As mentioned above, you can use the let method to override syntax (or any method really) for a specific bounded time. Let's say we want to do the above operation (Smalltalk blocks) for the duration of some execution. We can do it like this:

let(DefaultBehavior Literals cell("[]"), cell(:fn),
  x = ["hello world" println]
  x call
  x call
)

This will override the cell specified in the first argument with the value in the second argument - and then restore it at the end of the let-method. This is really not recommended for something like the [] method - as it will cause all kinds of problems in the internal implementations of methods. But you can definitely do it.
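
The override-then-restore behavior described above can be sketched outside Ioke as well. The following is a Python analogy, not Ioke's implementation: the let function, the literals object and its make_list attribute are all hypothetical names invented for this illustration.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def let(obj, name, value):
    """Temporarily bind obj.<name> to value; restore the old binding on exit."""
    missing = object()
    old = getattr(obj, name, missing)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Runs even if the body raises, so the override never leaks out.
        if old is missing:
            delattr(obj, name)
        else:
            setattr(obj, name, old)


# Hypothetical stand-in for the place where literal creation lives.
literals = SimpleNamespace(make_list=lambda *items: list(items))

with let(literals, "make_list", lambda *items: tuple(items)):
    inside = literals.make_list(1, 2)   # uses the temporary override

outside = literals.make_list(1, 2)      # original behavior is back
```

The important part is the finally clause: the old value comes back whether the body finishes normally or not, which is what keeps the change bounded.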

Transforming message chains

There are basically two techniques here. The first one is simply to add, remove or update entries in the operator table, so that you can add operators that weren't there before or change the way they behave. This is not restricted to things with weird characters in them - indeed, in Ioke anything that appears in the operator tables counts as an operator. A specific example of this is "return". It can act as a unary operator, meaning it is possible to give return an argument without having parentheses surrounding that argument.

To find out more, take a look at the documentation for Message OperatorTable.

The other way of transforming message chains is to actually take them apart and put them together again. I am planning a blog post dedicated to how to work with this, but I'll take a quick peek at how to do it now. Let's take the earlier Parser example and see an alternative way of creating the text and number parsers without actually overriding the literals creation.

reformatCodeForParsing = method(code,
  ourCode = 'nil
  head = ourCode            ; remember the start of the new chain
  code each(msg,
    case(msg name,
      :internal:createText, ourCode -> ''createTextParser('msg),
      :internal:createNumber, ourCode -> ''createNumberParser('msg),
      :"|", ourCode -> ''withOrParser('(reformatCodeForParsing(msg arguments[0]))),
      else, ourCode -> msg mimic)
    ourCode = ourCode last)
  head                      ; return the rebuilt chain
)

Parser = dmacro(
  [code]
  reformatCodeForParsing(code) evaluateOn(Ground, Ground)
)

This code is actually a bit annoying, but what it does is quite useful. It takes the code above ("foo" | "bar" | 42) and restructures it into (createTextParser("foo") withOrParser(createTextParser("bar")) withOrParser(createNumberParser(42))).
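
The same kind of restructuring can be sketched in a language with a more conventional AST. The following Python analogy (Python 3.9 or later, for ast.unparse) rewrites the expression with the standard ast module. The names create_text_parser, create_number_parser and with_or_parser are made up to mirror the Ioke ones, and the rewriter only produces source text; it does not define those functions.

```python
import ast


class ParserRewriter(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Wrap string and number literals in parser-creating calls.
        if isinstance(node.value, str):
            name = "create_text_parser"
        elif isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            name = "create_number_parser"
        else:
            return node
        return ast.Call(func=ast.Name(id=name, ctx=ast.Load()),
                        args=[node], keywords=[])

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite both operands first
        if isinstance(node.op, ast.BitOr):
            # left | right  becomes  left.with_or_parser(right)
            return ast.Call(
                func=ast.Attribute(value=node.left, attr="with_or_parser",
                                   ctx=ast.Load()),
                args=[node.right], keywords=[])
        return node


def reformat_code_for_parsing(source):
    """Return the rewritten expression as source text."""
    tree = ast.parse(source, mode="eval")
    return ast.unparse(ParserRewriter().visit(tree))


rewritten = reformat_code_for_parsing('"foo" | "bar" | 42')
```

Here rewritten ends up as create_text_parser('foo').with_or_parser(create_text_parser('bar')).with_or_parser(create_number_parser(42)), which has the same shape as the restructured Ioke chain.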

The only thing that is a bit inscrutable is the way new message chains are put together using quoting of different kinds. I'm going to go into this in more depth in the next blog post. I am also working on a tree rewriting approach for making these kinds of transformations much more idiomatic and readable.

So, this post has been a small introduction to several things you can do with Ioke to tweak its syntax. There is much more behind all these features, of course, and they all come from the fact that Ioke tries to unify and simplify all concepts as much as possible.