Preannouncing Seph


I’ve been dropping a few hints and mentions the last few weeks, and I thought it was about time that I took some time to preannounce a new project I’m working on. It’s going to be much easier writing my next few blog posts if people already know about the project, and my reasons for keeping quiet about it have mostly disappeared. It’s also a moot point since I talked about it at the Emerging Languages camp last week, and the video will be up fairly soon. I have also already put the slides online, so you might have seen some of this already.

So without further ado, the big announcement is that I’m working on a new language called Seph. Big whoop.

Why?

I already have Ioke and JRuby to care for, so it’s a very valid question to ask why I would want to take on another language project – outside my day job of course. The answer is a bit complicated. I always knew and communicated clearly that Ioke was an experiment in all senses of the word. This means my hope was that some of the quirky features of Ioke would influence other languages. But the other side of it is that if Ioke seems good enough as an idea, there might be value in expanding and refining the concept to make something that can be used in the real world. And that is what Seph is really about. That blog post I wrote a few weeks ago with the Ioke retrospective – that was really a partial design document for Seph.

So the purpose of Seph is to take Ioke into the real world while retaining enough of what made Ioke a very nice language to work with. Of course, being the person I am, I can’t actually avoid doing some new experimentation in Seph, but the experiments will mostly be a bit safer than the ones in Ioke, and some of the craziest Ioke features have been scaled back a bit.

Some features

So what’s the difference? Seph will still be prototype based object oriented, in the same way as Ioke. It will definitely consider the JVM its home. It will be homoiconic, and allow AST manipulation as a first class concept – including working with message chains as a way of replacing blocks. It will still have a numerical tower. It will use almost exactly the same syntax as Ioke. It will still allow you to customize operators and precedence rules.

The big difference, the one that basically makes almost all the other design changes design themselves, is small but very important: objects are immutable. The only way to get a changed object is to create a new object that specifies the difference. This can be done either by creating a new child of the existing object, or by creating a new sibling with the specified attributes changed. In most cases, the difference between the strategies isn’t actually visible, so the choice might just be an implementation strategy.
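To make the two strategies concrete, here is a minimal sketch in Python. It is not Seph code and not how Seph is implemented; the class `Proto` and its methods `get`, `child` and `sibling` are hypothetical names chosen only for illustration.

```python
from types import MappingProxyType


class Proto:
    """A tiny immutable prototype object (hypothetical, not Seph's real model)."""

    __slots__ = ("_parent", "_cells")

    def __init__(self, parent=None, **cells):
        # Bypass our own __setattr__ guard once, during construction.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_cells", MappingProxyType(dict(cells)))

    def __setattr__(self, name, value):
        raise AttributeError("objects are immutable; derive a new one instead")

    def get(self, name):
        # Look in our own cells first, then walk up the parent chain.
        obj = self
        while obj is not None:
            if name in obj._cells:
                return obj._cells[name]
            obj = obj._parent
        raise KeyError(name)

    def child(self, **changes):
        # Strategy 1: a new child that stores only the difference.
        return Proto(self, **changes)

    def sibling(self, **changes):
        # Strategy 2: a new object with the same parent and merged cells.
        return Proto(self._parent, **{**self._cells, **changes})


base = Proto(x=1, y=2)
via_child = base.child(y=20)
via_sibling = base.sibling(y=20)
```

Both derived objects answer `get("x")` and `get("y")` identically, and `base` is untouched, which is why the choice between the two can stay hidden from user code.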

Now once you have immutable objects but still focus on polymorphic dispatch, that changes quite a lot of things. It changes the core data structures, it changes the way macros work, it changes the flow of data in general. It also changes what kinds of optimizations will be possible.

Another side effect of immutability is that it becomes much more important to have a good module story. Seph will have first class modules that are still simple Seph objects at the same time. It’s really a quite beautiful and simple scheme, and it makes total sense.

If you’re creating a new Object Oriented language, it turns out that proper tail calls are a good idea if you can do it (refer to Steele for more arguments). Seph will include proper TCO for all Seph code and all participating Java code – which means you’ll only really grow the stack when passing Java boundaries. This will currently be done with trampolining, but I deem the cost worth the benefit of a tail recursive programming style.
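For readers who have not met trampolining before, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Seph's implementation; `trampoline` and `count_down` are made-up names.

```python
def trampoline(fn, *args):
    """Run fn, bouncing on returned thunks until a real value comes back."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def count_down(n, acc=0):
    # The tail call is returned as a thunk instead of being made directly,
    # so the Python stack never grows deeper than one frame per bounce.
    if n == 0:
        return acc
    return lambda: count_down(n - 1, acc + 1)


deep = trampoline(count_down, 100_000)
```

A direct recursive version of `count_down` would exceed Python's default recursion limit at this depth. The cost is one extra allocation and call per bounce, and this naive version cannot return a callable as a final value.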

I mentioned above that objects are immutable. However, local variables will be mutable. It will also be possible to create lexical closures. I’m still undecided whether it’s a good idea to leave a big mutable hole in the type system, or whether I should make it impossible for lexical closures to mutate their captured environment. Time will tell what I decide.

Stealing is good

Seph believes in reusing concepts other people have already made a great job with. As such, many pieces of the language implementation will be stolen from other places.

Just like in Ioke, the core numbers will come from gnu.math. This library has served me well, and I’ll definitely continue to use it. The big difference compared to Ioke is that the gnu.math values will be first class Seph objects, and won’t have to be wrapped. Seph will also have real floats instead of bigdecimals. This is a concession to reality (which I don’t much like, btw).

Seph will incorporate Erlang style light weight threads with an implementation based on Kilim (just like Erjang).

As mentioned above, the core data structures will have to change. And the direction of change will be towards Clojure. Specifically, Seph will steal/has stolen Clojure’s persistent data structures, all the concurrency primitives and the STM. I don’t see any reason to not incorporate fantastic prior art into Seph.
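As a reminder of what "persistent" means here, the following Python sketch shows the simplest possible case: a singly linked list where new versions share structure with old ones. Clojure's vectors and maps use more elaborate trees, so this is only an analogy; `Cons`, `push` and `to_list` are illustrative names.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    head: object
    tail: Optional["Cons"]


def push(lst, value):
    # "Adding" never modifies lst; it returns a new cell pointing at it.
    return Cons(value, lst)


def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


shared = push(push(None, 3), 2)
a = push(shared, 1)   # 1 -> 2 -> 3
b = push(shared, 99)  # 99 -> 2 -> 3, reusing the same 2 -> 3 cells
```

Because nothing is ever mutated, `a` and `b` can safely share the tail `shared`, which is the property that makes such structures a good fit for immutable objects and concurrency.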

As mentioned above, the module system is also not new – it’s in fact heavily inspired by Newspeak. Having no globals forces this kind of thinking, but I can’t say I would have been clever enough to think of it without Gilad’s writings.

Basically everything else is copied from or inspired by Ioke.

Isn’t mutability the essence of Ioke?

If you have worked with Ioke, or even heard me talk about it, you might have gotten the impression that mutability is one of the core tenets of Ioke. And your impression would be correct. It wasn’t until I started thinking about what a functional object hybrid version of Ioke would look like, that I realized most of the things I like in Ioke could be preserved without mutability. Most of the macros, the core evaluation model and many other pieces will be extremely similar to Ioke. And this is a good thing. I think Ioke has real benefits in terms of both power and readability – something that is not easy to combine. I want Seph to bring the same advantages.

Will you abandon Ioke now?

In one word: no. Ioke is still an experiment and there are still many things that I want to explore with Ioke. Seph will not fill the same niche, it won’t be possible for me to do the same experimentation, and fundamentally they are still quite different languages. In fact, you should expect an Ioke release hopefully within a few weeks.

So will it be useful?

Yes. That’s the whole goal. Seph will have an explicit focus on two areas that Ioke totally ignored. These areas are concurrency and performance. As seen from the features above, Seph will include several powerful concurrency features. And from a performance standpoint, Ioke was a tarpit – even if you wanted to make it run faster, there wasn’t really anything to get a handle on. Seph will be much easier to optimize, it’s got a structure that lends itself better to compilation and I expect it to be possible to get it to real world language performance. My current idea is that I should be able to get it to JRuby performance for roughly the same operations – but that might be a bit optimistic. I think it’s in the right ballpark though. Which means you should be able to use it to implement programs that actually do useful things in the Real World ™.

Is it available?

No. At the current point, Seph is still so young I’m going through lots of rewrites. I would like the core to settle down a little bit before exposing everything. (Don’t worry, everything is done in git, and the history will be available, so anyone who wants to see all my gory mistakes will have no trouble doing that). But in a nutshell, this is why this is a preannouncement. I want to get the implementation to the stage where it has some of the interesting features I’ve been talking about before making it public and releasing a first version.

Don’t worry though, it should be public quite soon. And if I’m right and this is a great language to work in – then how big of a deal is another month of waiting?

I’m very excited about this, and I hope you will be too! This is an adventure.



Emerging Languages camp – day 1


Yesterday was the first day of the Emerging Languages Camp, a part of OSCON specifically organized for language creators and designers. You can read more about it at www.emerginglangs.com. The first day was fantastic, lots of very interesting talks and great conversations. The amount of brain power in this room is really humbling.

The format of the camp is that there are about 20 speakers and each speaker gets 20 minutes. This is a fairly limiting format and means the speakers will have to focus their talks quite substantially. I expected a few talks (including my own) to bomb completely because of this, but it didn’t happen during the whole day. All of the talks were very different but good in many ways.

All of the presentations are filmed by Confreaks and will be available within a few weeks.

I’ll try to write a few sentences about each presentation, with thoughts and impressions baked in.

Go

Rob Pike started out the day by talking about the history of CSP (communicating sequential processes) and the lineage of languages that led to Go. Most of the talk was based on using channels/goroutines to handle concurrency. It was definitely a good talk, but it didn’t get me more interested in using Go for anything.

Ioke/Seph

I had the second slot. I had twenty minutes to cover both Ioke and a new language I’m working on, called Seph. Against all odds, my talk went quite well and I managed to communicate the things I wanted to get said. Hopefully the audience wasn’t too bored.

Thyrd

Thyrd is a proof of concept visual language, focused on using tablets for programming – so it’s distinctly non-textual. In many cases you drag and drop operations instead of typing them. The actual development happens in a recursive grid of cells. I’m wondering what the audience for this language would be – it definitely looks intriguing though, and I like how some algorithms ended up being very easily readable and understandable.

Parrot

Allison Randall gave a talk about what’s currently happening with Parrot. It seems they are going for a new rewrite of most of the subsystems. One of the changes is going from a CISC style op code system to a RISC style. Parrot apparently has over 1200 op codes at this point, and they want to scale back everything to about 20-30 bytecodes instead. As a preparation for this, they have ripped out the JIT and will revisit most of the subsystems in Parrot to see what can be done. Allison also gave the audience the distinct impression that Parrot is still quite slow for user programs.

Ur

Of all the talks during the day, I think I understood the least of the Ur/Web talk. Ur is a functional limited programming language focused specifically on building web applications. It’s got dependent types inspired by Agda and allows you to statically check your whole program. The example shown was a simple CRUD app, and I didn’t get any impression of how complicated it would be to actually use it for a real world application. The speaker said the only real world web app he knows about is a hosting application for Ur applications that he is building himself.

Frink

I don’t think I can do this presentation justice. Frink is just incredibly cool and you should check it out. It’s a general purpose programming language, but it’s got units of measure and several other features built in that make it very easy to calculate all kinds of interesting facts. As an example, he showed that if all people in China jumped at the same time, that would be equivalent to 4.7 on the Richter scale.

Newspeak

Gilad Bracha talked a bit about the basic ideas and principles behind Newspeak and what the current status is. Gilad focused on no global state, and all names being late bound (including class names). The first feature falls quite naturally out of prototype based OO, so it’s something Io, Ioke and Seph all have (and it’s really nice). The second feature is a bit more obscure, and I’m not sure if it gives as many benefits as the first one.

F#

Joe Pamer talked about what they had to do to take F# from a research language to something Microsoft could ship in Visual Studio 2010. Not something most of us really think about, but there are lots of challenges in doing that kind of transition. Joe covered this quite well and also gave us an insight into the current state of F#.

CoffeeScript

CoffeeScript is a language that compiles down to JavaScript. In comparison with GWT for example, it’s pretty close in semantics to JavaScript, and the generated code can be debugged and looked at without wanting to stab out your eyes. The syntax of CoffeeScript is very pleasant and looks very nice to work with (it’s indentation based, and focuses on getting lambdas to be as small as possible). Next time I’m reaching for JavaScript, I think I might just go for CoffeeScript instead. Good stuff.

Mirah

Charles Nutter covered Mirah (the language formerly known as Duby). It looks more and more complete and useful, and sooner or later I’m going to try switching most of my Java development to Mirah. The extensibility features make it possible to do metaprogramming tricks in Mirah that you wouldn’t even try in Ruby.

Io

Steve jumped in last minute to cover for the Objective-J guy who couldn’t be here. Steve covered the basics of Io, talking about concurrency and the other basic features.

It’s been a great first day, and now day two begins – so I’ll have to focus on that.



Some results from the Ioke experiment


It’s been a bit over 18 months since I first released Ioke in the wild. During this time I’ve always been specific about Ioke first and foremost being a language experiment. I changed many things to see what would work and what would not. I thought I’d take stock and take a look at a few of these decisions and how I feel about them now. Ioke never got a huge user base, of course, so most of these impressions are based on my continued working on the language, and also experiences trying to explain features of the language.

This does not mean in any way that the Ioke experiment is over. I will continue working on Ioke and see what else interesting will come out of it.

White space separation for method calls

I adapted Io’s, Self’s and Smalltalk’s syntax for Ioke. This meant I could use periods to end expressions instead (taking the role of semicolons in most other languages). Personally I like this a lot. Readability really improves substantially by using white space for method calls. The only thing that makes it a bit tricky is the interaction with regular expression syntax. I ended up adding an initial character to regular expressions to make them easily distinguishable. I thought I would dislike that more than I do. So white space is definitely a win.

Keyword syntax in method calls

Having methods take keyword arguments and positional arguments baked in to the language was also a big win. It’s really a huge difference between this approach and something like Ruby – having it first class means it is easy to do things like collecting all keyword arguments, providing default values and so on. It also makes introspection and documentation much better. Finally, the duality of dictionary creation with keywords and regular method invocation ended up being very pleasing. Another clear win. Languages should have keyword arguments.

Nontraditional naming

I wanted to see what would happen if I stayed away from the traditional names in the object oriented languages of today. So I didn’t use Object, String, prototype, slot, clone or property. The most obvious place for this is in the core concepts of the language. The place where user code starts is called Origin. I don’t miss Object as such, but I’m not sure Origin is the clearest way of talking about this object. My current thoughts are going in the direction of something like “Vanilla” (from Flavors), or “Something”. Another problematic renaming was to talk about the act of creating a new object as “mimicking”, and call the parents of an object “mimics”. It ended up being very confusing, both from a verb/noun standpoint, but also just from simply being too opaque. So that’s a definite failure. I’m still comfortable with “cell” instead of “slot” or “property”. I’m also happy with “Ground”, “Base” and “DefaultBehavior”. All of these communicate clearly what they should. I’m also happy about the renaming of “String” to “Text”. I don’t use the type name much in Ioke code, but when I do “Text” feels much better.

Numerical tower, and no real numbers

I’ve always liked numerical towers in programming languages, and it feels good to have it in Ioke. Ratios are also necessary as first class concepts. I also decided to not have real numbers, only the equivalent of BigDecimals. That was probably a good decision for Ioke, and I still feel real numbers are problematic. I don’t think removing them from the language is the right solution though.

Condition system instead of exceptions

The decision to adapt and include a condition system based on Common Lisp was definitely a success. I like the programming model and it makes code much more flexible and expressive. Clear win.

No global scope

This is also a clear win. It’s a tricky thing to do in many languages: you have to unify things to a high degree to make it possible to get away from global state. But I think the benefits way outweigh the cost of this.

Specialized forms of code

Ioke has quite a few variations in runnable code. The main distinction is between things that are lexically scoped and things that are object scoped. Methods, Macros and Syntax are object scoped, while Blocks and Lecros are lexically scoped. This seemed like a good idea at the time, but if I were to do it again, I would try to unify several of those – at the cost of making the evaluation rules slightly more complicated. Especially having methods that aren’t lexical closures still surprises me regularly. I wonder if I was influenced by the way Ruby works when designing these parts.

Prototype OO

I’ve always maintained that properly implemented prototype based object orientation is both conceptually simpler and more powerful than class based object orientation. I still believe this to be true, and I still think prototype based is better than class based. However, there are some places where the model breaks down. There are some situations where it just makes sense to have a class that describes objects. Take numbers for example. In a prototype based scheme, what is the parent of the number 2? Is the parent the number 1? Not really. In Ioke I added a singleton object Number that is the parent of all numbers. But that still becomes weird since you could write code like “Number + 9”, and expect that to work. I’m not sure how to solve this problem. Of course, prototypes can represent classes without problem, but my problem is mostly what makes sense intuitively.

Ruby-like load/require system

For a language with a global scope and/or total mutability, it works quite well to just have things represented as scripts that will modify a shared environment when loaded. However, there are things that become cumbersome – parameterization of modules/files becomes very ad hoc, and there is a real risk of conflicting names. If I were to redo this part I would probably opt for something slightly less convenient but more powerful, that allows you to work with software modules in a better way: exporting some parts and keeping other parts private, parameterizing modules, binding modules under different names, and so on.



Ioke at Chicago and Iowa Code Camp


This Saturday, May 1st, I will be talking at both the Chicago Code Camp and the Iowa Code Camp. I will be giving an introduction to Ioke at both code camps, and I will actually have slightly more time than I usually do – so hopefully this introduction will be the best intro to Ioke ever. I’ve also hacked on some fun stuff for Ioke lately that hopefully will be done by Saturday, so I might show some of that.

Hope to see many of you there!



Destructuring extravaganza


A few months back I added support for destructuring assignment and tuples to Ioke. Since Ioke’s assignment is just a regular method call, this was actually fairly easy to do. The end result is that you can do things like (x, y) = (13, 14). You can also do more interesting things, such as ((x, y), (x2, y2)) = [[1,2],[3,4]]. Notice that the right hand side is not a tuple anymore, but a list. Anything that can be turned into a tuple using the asTuple method can be on the right hand side, or an item in a recursive destructuring.

All this functionality makes code slightly more readable. But last week I decided to add support for eachCons and eachSlice, and suddenly I realized that destructuring would be very nice to have not only in the explicit assignment case, but also in cases where you want to pick apart the arguments to an enumerable or sequence method. So I added that, which means that suddenly lots of code becomes much simpler.

Short story, in all Sequence and Enumerable methods, at every place where you could put an argument name, you can now put a destructuring statement instead. Let’s take a look at an example:

Point = Origin with(asTuple: method((x, y, z)))

points = [
  Point with(x: 42, y: 14, z: -1),
  Point with(x: 20, y: 0, z: 444),
  Point with(x: 31, y: 646, z: 3),
  Point with(x: 456, y: 14, z: 12)
  ]

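distances1 = points consed map(obj,
  (((obj[0] x) - (obj[1] x)) * ((obj[0] x) - (obj[1] x)) +
    ((obj[0] y) - (obj[1] y)) * ((obj[0] y) - (obj[1] y)) +
    ((obj[0] z) - (obj[1] z)) * ((obj[0] z) - (obj[1] z))) sqrt)

distances2 = points consed map(
  ((x1,y1,z1), (x2,y2,z2)),
  ((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2) + (z1 - z2) * (z1 - z2)) sqrt)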

distances1 inspect println
distances2 inspect println

This code first creates a Point that can be coerced into a tuple of x, y and z coordinates. We then create a list of Points with different coordinates. We then want to calculate the three distances between consecutive pairs of the four points. We do this in two ways, first using the old method and then using destructuring. The method consed is a sequence version of eachCons. The default cons length is 2, so this will yield three entries with two points in each. We then call map on the sequence. In the first version, each call gets a List of two entries, where each entry is a point. Finally we use Pythagoras to calculate the distance.

The second version is very similar – the only difference is that instead of using the square brackets to index into the lists, we instead give a pattern. This pattern contains two patterns, and the variable names inside of it will be bound to the right parts of each point.

At least in my mind, the destructured syntax is much more readable than the original one. And remember, this works for anything that can be turned into a tuple, which means you can use it on any Enumerable – you can use it on a Pair (such as what a Dict will yield) or anything you would want to add asTuple to on your own.
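For comparison, the same indexing-versus-destructuring contrast can be sketched in Python, which also allows patterns in the binding position of a loop. This is an analogy rather than a translation of the Ioke code; `consed` here is a hypothetical helper that yields runs of consecutive items, and the example computes the Euclidean distance between consecutive points.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float


def consed(items, n=2):
    """Yield each run of n consecutive items, in the spirit of eachCons."""
    for i in range(len(items) - n + 1):
        yield tuple(items[i:i + n])


points = [Point(42, 14, -1), Point(20, 0, 444),
          Point(31, 646, 3), Point(456, 14, 12)]

# Indexing into each pair by hand.
indexed = [
    math.sqrt((pair[0].x - pair[1].x) ** 2 +
              (pair[0].y - pair[1].y) ** 2 +
              (pair[0].z - pair[1].z) ** 2)
    for pair in consed(points)
]

# Destructuring both points directly in the loop target.
destructured = [
    math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    for (x1, y1, z1), (x2, y2, z2) in consed(points)
]
```

Both lists hold three distances, one per consecutive pair; the second form names the parts once and keeps the formula free of indexing noise.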



Ioke P released


I am very happy to announce that Ioke P has finally been released!

Ioke is a language that is designed to be as expressive as possible. It is a dynamic language targeted at the Java Virtual Machine. There also exists a version for the CLR. It’s been designed from scratch to be a highly flexible general purpose language. It is a prototype-based programming language that is inspired by Io, Smalltalk, Lisp and Ruby.

Homepage: http://ioke.org
Download: http://ioke.org/download.html
Programming guide: http://ioke.org/wiki/index.php/Guide
Wiki: http://ioke.org/wiki

The two specific releases that encompass Ioke P are ikj 0.4.0 and ikc 0.4.0.

Ioke P is the fourth release of Ioke. It includes many new features compared to Ioke E:

  • Number Infinity
  • eval
  • Reflector
  • Hooks
  • First class Runtime
  • New parser
  • Tuples
  • Structs
  • Destructuring assignment
  • Message rewriting
  • Functional composition
  • Sequences
  • Dictionary and Set versions of Enumerable methods
  • Enumerable group, Enumerable groupBy
  • Set operations for union, intersection, membership, subset and superset testing
  • ISpec stubbing and mocking
  • IIk history
  • DokGen on separate projects

Ioke P also includes a large amount of bug fixes.

Features:

  • Expressiveness first
  • Strong, dynamic typing
  • Prototype based object orientation
  • Homoiconic language
  • Simple syntax
  • Powerful macro facilities
  • Condition system
  • Aspects
  • Java integration
  • Developed using TDD
  • Documentation system that combines documentation with specs
  • Runs on both the JVM and the CLR

The many things added in Ioke P could not have been done without the support of all the Ioke contributors. Thank you!

Regards
Ola Bini    – ola.bini@gmail.com



Should languages be multi-lingual?


I’m currently sitting in the Beijing ThoughtWorks office, and for some reason language is on my mind… =)

One of the discussions related to DDD that have turned up several times the last few months at conferences is how you handle ubiquitous language when your domain is not in English. Since most programming languages are based on English, you end up mixing English and Swedish for example, if you are working with a Swedish domain. Of course, the benefits of working with these concepts in Swedish are very hard to argue against. But the dichotomy between the programming language and the domain language is definitely something that hurts my eyes, so I’m generally not very fond of that approach.

In fact, I haven’t heard anyone come up with a good solution to this problem, and this post is not really a solution either.

One of the things I’ve proposed to make this situation better is to create an external DSL that is fully in the domain language. The implementation of that DSL can then be written in English. The main benefit is that there is a clear separation between the domain language and the programming language. On the other hand, the overhead of creating the DSL and also the complexities involved in translating the domain concepts into programming language concepts can become problematic too.

One interesting idea in Cucumber is the idea that you can easily add new natural languages to write the features in. When it comes to user stories at the level of testing that Cucumber provides, it’s really important to use the right language. So it got me thinking, could you use the same kind of approach in a general programming language too?

As an experiment I took a small example program for Ioke, and translated it into Mandarin, with simplified Chinese characters. Of course I used Google Translate for this, so the translation is probably not very good, but the end result is still interesting. I’m not going to try to get this into my blog, so take a look at the file at github instead: http://github.com/olabini/ioke/blob/master/examples/chinese/account.ik. As you can see there is nothing in there that even reeks of English. If you don’t understand Chinese characters it is probably hard to see what’s happening here. Basically an Account object is created, with a “transfer” method and a “print” method. Further down, two instances of this Account object are created, some transfers are made, and then the objects are printed. But provided my translation is not too crappy, this code should make sense to someone reading Chinese.

Now, this is actually extremely simple to implement in Ioke, since it relies on several of the features Ioke handles very easily. That everything is a message really helps, and having everything be first class means I can alias methods and things like that without any worry. Obviously your language also needs to handle non-ASCII identifiers correctly, but that should be standard in this day and age.
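The aliasing half of this can be tried in any language with first class methods and Unicode identifiers. The sketch below uses Python with Swedish names; it is not the Ioke example from the repository, and `Account`, `Konto`, `överför` and `beskriv` are names made up for the illustration.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def transfer(self, amount, to):
        self.balance -= amount
        to.balance += amount

    def describe(self):
        return f"<Account balance: {self.balance}>"


# Swedish aliases for the class and its methods.
Konto = Account
Konto.överför = Account.transfer   # "transfer"
Konto.beskriv = Account.describe   # "describe"

a = Konto(100)
b = Konto(50)
a.överför(30, b)
```

The calling code at the bottom is free of English names, but `class`, `def` and `return` cannot be aliased this way, which is exactly where a language with keywords hits a wall.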

When thinking about it, something similar can be created in languages like Lisp, Smalltalk, Factor, Io and Haskell – but most other languages would struggle. If you have keywords in your language, it’s really a killer – you would need to branch your parser to make it happen.

Of course, this approach only works when you can simply translate from one word to another. If the writing system is right to left, or top to bottom, it’s much more tricky to create a good translation.

I’m also not sure if this is actually a really good idea or not. It might be. The other thing I’ve been thinking about is how to handle multilingual editing. What if you want to be able to switch back and forth between languages? How can you handle identifiers with more than one name? Would you want to?

Lots of unanswered questions here. But it’s still funny to think about. Communication is the main goal, as usual.



Ioke sequence support


The last two weeks I’ve been working on adding external iterators to Ioke. This work is now done and merged, so I thought I’d just describe it a bit.

But first, why do I need explicit iterators in Ioke? Ruby has gotten by without them for a long time, only implementing a Generator library using continuations, in the standard library. It’s pretty nice, since you don’t really need to do anything explicit to get external iterators from internal ones. Of course, the problem is that it’s very inefficient to implement them like this. So I decided that Ioke should have an explicit protocol for external iterators. You can implement internal iterators using external ones efficiently, but not the other way around.

The two major objects for this in Ioke are called Sequence and Mixins Sequenced. Sequenced is the mixin that gives you access to several helper methods if you implement the “seq” method. If you implement “seq” and mix in Sequenced you will also get an “each” method and Enumerable. The “seq” method is expected to return something that mimics Sequence and has one “next” method, and one “next?” method. That’s all. The “next?” method returns true if there is another element in the sequence, and “next” returns the next one. The protocol is undefined if you call “next” when “next?” would have returned false.

Sequenced gives you an “each” method that in addition to the regular each-protocol will also return the result of calling “seq” if you don’t give any arguments to “each”.
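The shape of this protocol can be sketched in Python as follows. The sketch is an analogy, not Ioke's implementation: `has_next` stands in for “next?”, and `ListSequence`, `Sequenced` and `Playlist` are hypothetical classes written only for this illustration.

```python
class ListSequence:
    """External iterator: has_next() mirrors "next?", next() mirrors "next"."""

    def __init__(self, items):
        self._items = items
        self._index = 0

    def has_next(self):
        return self._index < len(self._items)

    def next(self):
        value = self._items[self._index]
        self._index += 1
        return value


class Sequenced:
    """Mixin: anything that defines seq() gets each() built on top of it."""

    def each(self, fn=None):
        seq = self.seq()
        if fn is None:
            return seq  # no arguments: hand back the external iterator
        while seq.has_next():
            fn(seq.next())
        return self


class Playlist(Sequenced):
    def __init__(self, *tracks):
        self._tracks = list(tracks)

    def seq(self):
        return ListSequence(self._tracks)


seen = []
Playlist("a", "b", "c").each(seen.append)
```

The internal iterator `each` is a short loop over the external one, which illustrates why the external protocol is the more fundamental of the two.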

Except for that, you will get several methods that just call “seq” and then call the same method on the result. These methods are: “mapped”, “collected”, “filtered”, “selected”, “grepped”, “zipped”, “dropped”, “droppedWhile” and “rejected”. The same methods also exist on Sequence. They return new sequences that implement the same behavior as the methods with similar names on Enumerable.

Finally, Sequence also mimics Mixins Enumerable. Once you call one of the Enumerable-methods, the whole sequence will be realized, or as much as is necessary to give an answer. A small example of how you could use it:

(1..100000000) mapped(x, x*x) filtered(x, x % 3 == 0) takeWhile( < 10000 )

This example creates a range from 1 to 100,000,000 and finds all the squares that are less than 10,000 and that are evenly divisible by 3.
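The same pipeline can be written with Python generators, which are also lazy. This is an equivalent sketch for comparison, not a description of how Ioke evaluates the expression.

```python
from itertools import takewhile

# Nothing is computed yet: each stage pulls from the previous one on demand.
squares = (x * x for x in range(1, 100_000_001))
divisible = (sq for sq in squares if sq % 3 == 0)

# takewhile stops at the first square that is not below 10,000,
# so only about a hundred elements of the range are ever visited.
result = list(takewhile(lambda sq: sq < 10_000, divisible))
```

The result contains the squares of the multiples of 3 from 3 to 99, and the hundred million element range is never realized.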



A new parser for Ioke


Last week I finally bit the bullet and rewrote the Ioke parser. I’m pretty happy about the end result actually, but it does involve moving away from Antlr as a parser generator. In fact, the new parser is handwritten – and as such goes against my general preference for generating everything possible. I would like to quickly take a look at the reasons for doing this and also what the new parser will give Ioke.

For reference, the way the parser used to work was that the Antlr generated lexer and parser gave the Ioke runtime an Antlr Tree structure. This tree structure was then walked and transformed into chained Messages, which is the AST that Ioke uses internally. Several other things were also done at this stage, including separating message chains on comma-borders. Most significantly the processing to put together interpolated strings and regular expressions happened at this stage. Sadly, the code to handle all that was complex, ugly, slow and frail. After this stage, operator shuffling happened. That part is still the same.

There were several problems I wanted to solve, but the main one was the ugliness of the algorithm. It wasn’t clear from the parser how an interpolated expression mapped into the AST, and the generated code added several complications that frankly weren’t necessary.

Ioke is a language with an extremely simple base syntax. It is only slightly more complicated than the typical Lisp parser, and there are almost no parser-level productions needed. So the new parser does away with the lexer/parser distinction and does everything in one pass. There is no need for lookahead at the token level, so this turns out to be a clear win. The code is actually much simpler now, and the Message AST is created inline in the new parser. When it comes to interpolation, instead of the semantic predicates and global stacks I had to use in the Antlr parser, I just do the obvious recursive interpolation. The code is simple to understand and quite efficient too.
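To show what recursive interpolation in a one-pass parser can look like, here is a heavily simplified Python sketch. It is not the Ioke parser: it assumes Ruby-style `#{...}` holes, treats the contents of a hole as more text rather than as a full expression, and the function name `parse_text` and the `("interpolate", ...)` node shape are invented for the example.

```python
def parse_text(src, pos=0, closer=None):
    """Parse literal text with #{...} holes; return (parts, next position)."""
    parts, buf = [], []
    while pos < len(src):
        if closer and src[pos] == closer:
            break  # the caller consumes the closing character
        if src.startswith("#{", pos):
            if buf:
                parts.append("".join(buf))
                buf = []
            # Recurse for the hole; nesting needs no global stack.
            inner, pos = parse_text(src, pos + 2, closer="}")
            if pos >= len(src):
                raise SyntaxError("unterminated #{ interpolation")
            parts.append(("interpolate", inner))
            pos += 1  # skip the closing brace
        else:
            buf.append(src[pos])
            pos += 1
    if buf:
        parts.append("".join(buf))
    return parts, pos


tree, end = parse_text("a #{b #{c}} d")
```

The call stack of the parser itself tracks the nesting depth, and because the parser owns every error state it can report problems such as an unterminated hole with a precise message.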

At the end of the day, I did expect to see some performance improvements too. They turned out to be substantial. Parsing is about 2.5 times faster, and startup speed has improved by about 30%. The distribution size will be substantially smaller since I don’t need to ship the Antlr runtime libraries. And building the project is also much faster.

But the real gain is actually in maintainability of the code. It will be much easier for me to extend the parser now. I can do nice things to make the syntax more open ended and more powerful in ways that would be very inconvenient in Antlr. The error messages are much better since I have control over all the error states. In fact, there are only 13 distinct error messages in the new parser, and they are all very clear on what has gone wrong – I never did the work in the old parser to support that, but I get that almost for free in the new one.

Another thing I’ve been considering is to add reader macros to Ioke – and that would also have been quite painful with the Antlr parser generator. So all in all I’m very happy about the new parser, and I think it will definitely make it easier for the project going forward.

This blog post is in no way saying that Antlr is bad in any way. I like Antlr a lot – it’s a great tool. But it just wasn’t the right tool for Ioke’s syntax.



Continuous Integration for Ioke with Cruise


I’ve felt the need for this since I put out the CLR version of Ioke, and now I’ve finally managed to make it happen. Even though I’m the only person with commit rights to Ioke so far, it is still good to have continuous integration running, especially since there are at least seven different builds I want to test, three on Linux and four on Windows.

I now have two servers running this. They are not public right now – I will post something when the dashboard is up – but the CI server will send notification emails to the ioke-language Google Group with status.

The current setup tests Java 1.5 and Java 1.6 on both Linux and Windows. It also tests Mono on Linux and Windows, and .NET on Windows.

As a CI server I’m using Cruise, ThoughtWorks’ own Continuous Integration server. Cruise is a commercial product, but open source projects can use it for free. I’m very happy with it from earlier projects, which is why I decided to use it for Ioke.

ThoughtWorks also gave me two virtual machines to run this CI server – which I’m very grateful for.