Destructuring extravaganza


A few months back I added support for destructuring assignment and tuples to Ioke. Since Ioke’s assignment is just a regular method call, this was actually fairly easy to do. The end result is that you can do things like (x, y) = (13, 14). You can also do more interesting things, such as ((x, y), (x2, y2)) = [[1,2],[3,4]]. Notice that the right hand side is not a tuple anymore, but a list. Anything that can be turned into a tuple using the asTuple method can be on the right hand side, or an item in a recursive destructuring.
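For readers without Ioke at hand, the same idea can be sketched in Python, which has nested destructuring built in. This is only an analogy and not Ioke code: Python relies on its iteration protocol where Ioke uses asTuple, and the Pair class is a made-up name for illustration.

```python
# Python analogy for Ioke's destructuring assignment; not Ioke code.
x, y = 13, 14

# Nested patterns work against lists as well as tuples.
(x1, y1), (x2, y2) = [[1, 2], [3, 4]]


class Pair:
    """Hypothetical class; __iter__ plays the role asTuple plays in Ioke."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __iter__(self):
        return iter((self.first, self.second))


# Any iterable can sit on the right hand side, also inside a nested pattern.
(a, b), c = [Pair(1, 2), 3]
```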

All this functionality makes code slightly more readable. But last week I decided to add support for eachCons and eachSlice, and suddenly I realized that destructuring would be very nice to have not only in the explicit assignment case, but also in cases where you want to pick apart the arguments to an enumerable or sequence method. So I added that too, which means that lots of code suddenly becomes much simpler.

In short: in all Sequence and Enumerable methods, every place where you could put an argument name, you can now put a destructuring pattern instead. Let’s take a look at an example:

Point = Origin with(asTuple: method((x, y, z)))

points = [
  Point with(x: 42, y: 14, z: -44),
  Point with(x: 20, y: 0, z: 444),
  Point with(x: 31, y: 646, z: 3),
  Point with(x: 456, y: 14, z: 12)
  ]

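distances1 = points consed map(obj,
  (((obj[0] x) - (obj[1] x)) * ((obj[0] x) - (obj[1] x)) +
    ((obj[0] y) - (obj[1] y)) * ((obj[0] y) - (obj[1] y)) +
    ((obj[0] z) - (obj[1] z)) * ((obj[0] z) - (obj[1] z))) sqrt)

distances2 = points consed map(
  ((x1,y1,z1), (x2,y2,z2)),
  ((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2) + (z1 - z2) * (z1 - z2)) sqrt)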

distances1 inspect println
distances2 inspect println

This code first creates a Point that can be coerced into a tuple of x, y and z coordinates. We then create a list of Points with different coordinates, and want to calculate the three distances between consecutive points. We do this in two ways, first using the old method and then using destructuring. The method consed is a sequence version of eachCons. The default cons length is 2, so this will yield three entries with two points in each. We then call map on the sequence. Each entry handed to map is a List of two elements, where each element is a point. Finally we use Pythagoras to calculate the distance.

The second version is very similar - the only difference is that instead of using the square brackets to index into the lists, we instead give a pattern. This pattern contains two patterns, and the variable names inside of it will be bound to the right parts of each point.

At least in my mind, the destructured syntax is much more readable than the original one. And remember, this works for anything that can be turned into a tuple, which means you can use it on any Enumerable - you can use it on a Pair (such as what a Dict will yield) or anything you would want to add asTuple to on your own.
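The contrast between indexing and destructuring can also be sketched in Python. This is an analogy under assumptions, not Ioke code: the consed helper below is a hypothetical stand-in for Ioke's consed with the default length of 2, and __iter__ stands in for asTuple. Both versions compute the Euclidean distance between neighbouring points.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float

    def __iter__(self):
        # Stand-in for Ioke's asTuple: lets a Point be unpacked as (x, y, z).
        return iter((self.x, self.y, self.z))


points = [Point(42, 14, -44), Point(20, 0, 444),
          Point(31, 646, 3), Point(456, 14, 12)]


def consed(items):
    """Overlapping pairs of neighbours (hypothetical helper)."""
    return zip(items, items[1:])


# Indexing version: each pair is picked apart by hand.
distances1 = [math.sqrt((p[0].x - p[1].x) ** 2 +
                        (p[0].y - p[1].y) ** 2 +
                        (p[0].z - p[1].z) ** 2)
              for p in consed(points)]

# Destructuring version: the pattern names the coordinates directly.
distances2 = [math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
              for (x1, y1, z1), (x2, y2, z2) in consed(points)]
```

Four points give three neighbouring pairs, so both lists hold three distances.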



Ioke P released


I am very happy to announce that Ioke P has finally been released!

Ioke is a language that is designed to be as expressive as possible. It is a dynamic language targeted at the Java Virtual Machine. There also exists a version for the CLR. It’s been designed from scratch to be a highly flexible general purpose language. It is a prototype-based programming language that is inspired by Io, Smalltalk, Lisp and Ruby.

Homepage: http://ioke.org
Download: http://ioke.org/download.html
Programming guide: http://ioke.org/wiki/index.php/Guide
Wiki: http://ioke.org/wiki

The two specific releases that encompass Ioke P are ikj 0.4.0 and ikc 0.4.0.

Ioke P is the fourth release of Ioke. It includes many new features compared to Ioke E:

  • Number Infinity
  • eval
  • Reflector
  • Hooks
  • First class Runtime
  • New parser
  • Tuples
  • Structs
  • Destructuring assignment
  • Message rewriting
  • Functional composition
  • Sequences
  • Dictionary and Set versions of Enumerable methods
  • Enumerable group, Enumerable groupBy
  • Set operations for union, intersection, membership, subset and superset testing
  • ISpec stubbing and mocking
  • IIk history
  • DokGen on separate projects

Ioke P also includes a large number of bug fixes.

Features:

  • Expressiveness first
  • Strong, dynamic typing
  • Prototype based object orientation
  • Homoiconic language
  • Simple syntax
  • Powerful macro facilities
  • Condition system
  • Aspects
  • Java integration
  • Developed using TDD
  • Documentation system that combines documentation with specs
  • Runs on both the JVM and the CLR

The many things added in Ioke P could not have been done without the support of all the Ioke contributors. Thank you!

Regards
Ola Bini    - ola.bini@gmail.com



Should languages be multi-lingual?


I’m currently sitting in the Beijing ThoughtWorks office, and for some reason language is on my mind… =)

One of the discussions related to DDD that have turned up several times the last few months at conferences is how you handle ubiquitous language when your domain is not in English. Since most programming languages are based on English, you end up mixing English and Swedish for example, if you are working with a Swedish domain. Of course, the benefits of working with these concepts in Swedish are very hard to argue against. But the dichotomy between the programming language and the domain language is definitely something that hurts my eyes, so I’m generally not very fond of that approach.

In fact, I haven’t heard anyone come up with a good solution to this problem, and this post is not really a solution either.

One of the things I’ve proposed to make this situation better is to create an external DSL that is fully in the domain language. That DSL can then be implemented in English. The main benefit is that there is a clear separation between the domain language and the programming language. On the other hand, the overhead of creating the DSL and also the complexities involved in translating the domain concepts into programming language concepts can become problematic too.

One interesting idea in Cucumber is the idea that you can easily add new natural languages to write the features in. When it comes to user stories at the level of testing that Cucumber provides, it’s really important to use the right language. So it got me thinking, could you use the same kind of approach in a general programming language too?

As an experiment I took a small example program for Ioke, and translated it into Mandarin, with simplified Chinese characters. Of course I used Google Translate for this, so the translation is probably not very good, but the end result is still interesting. I’m not going to try to get this into my blog, so take a look at the file on GitHub instead: http://github.com/olabini/ioke/blob/master/examples/chinese/account.ik. As you can see there is nothing in there that even reeks of English. If you don’t understand Chinese characters it is probably hard to see what’s happening here. Basically an Account object is created, with a “transfer” method and a “print” method. Further down, two instances of this Account object are created, some transfers are made, and then the objects are printed. But provided my translation is not too crappy, this code should make sense to someone reading Chinese.

Now, this is actually extremely simple to implement in Ioke, since it relies on several of the features Ioke handles very easily. That everything is a message really helps, and having everything be first class means I can alias methods and things like that without any worry. Obviously your language also needs to handle non-ASCII identifiers correctly, but that should be standard in this day and age.

When thinking about it, something similar can be created in languages like Lisp, Smalltalk, Factor, Io and Haskell - but most other languages would struggle. If you have keywords in your language, it’s really a killer - you would need to branch your parser to make it happen.
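The aliasing approach described above can be sketched in Python, which accepts non-ASCII identifiers. This is an illustration under assumptions: the Chinese names are picked for this sketch and are not taken from the linked account.ik file, and the show method is a hypothetical stand-in for the “print” method.

```python
class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def transfer(self, amount, to):
        """Move amount from this account to another one."""
        self.balance -= amount
        to.balance += amount

    def show(self):
        return f"{self.owner}: {self.balance}"

    # Aliases: the same function objects under Chinese names (illustrative).
    转账 = transfer
    打印 = show


# The class itself can be aliased in the same way.
账户 = Account

甲 = 账户("甲", 100)
乙 = 账户("乙", 50)
甲.转账(30, 乙)
```

Note that class, def and return stay in English here. That is the keyword problem in miniature: names can be aliased, but keywords are fixed by the parser.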

Of course, this approach only works when you can simply translate from one word to another. If the writing system is right to left, or top to bottom, it’s much more tricky to create a good translation.

I’m also not sure if this is actually a really good idea or not. It might be. The other thing I’ve been thinking about is how to handle multilingual editing. What if you want to be able to switch back and forth between languages? How can you handle identifiers with more than one name? Would you want to?

Lots of unanswered questions here. But it’s still fun to think about. Communication is the main goal, as usual.



Ioke sequence support


The last two weeks I’ve been working on adding external iterators to Ioke. This work is now done and merged, so I thought I’d just describe it a bit.

But first, why do I need explicit iterators in Ioke? Ruby has gotten by without them for a long time, only implementing a Generator library using continuations, in the standard library. It’s pretty nice, since you don’t really need to do anything explicit to get external iterators from internal ones. Of course, the problem is that it’s very inefficient to implement them like this. So I decided that Ioke should have an explicit protocol for external iterators. You can implement internal iterators using external ones efficiently, but not the other way around.

The two major objects for this in Ioke are called Sequence and Mixins Sequenced. Sequenced is the mixin that gives you access to several helper methods if you implement the “seq” method. If you implement “seq” and mix in Sequenced you will also get an “each” method and Enumerable. The “seq” method is expected to return something that mimics Sequence and has one “next” method, and one “next?” method. That’s all. The “next?” method returns true if there is another element in the sequence, and “next” returns the next one. The behavior is undefined if you call “next” when “next?” would have returned false.

Sequenced gives you an “each” method that in addition to the regular each-protocol will also return the result of calling “seq” if you don’t give any arguments to “each”.

Apart from that, you will get several methods that just call “seq” and then call the same method on the result. These methods are: “mapped”, “collected”, “filtered”, “selected”, “grepped”, “zipped”, “dropped”, “droppedWhile” and “rejected”. The same methods exist on Sequence. They return new sequences that implement the same behavior as the methods with similar names on Enumerable.

Finally, Sequence also mimics Mixins Enumerable. Once you call one of the Enumerable-methods, the whole sequence will be realized, or as much as is necessary to give an answer. A small example of how you could use it:
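The shape of this protocol can be sketched in Python. This is a rough analogy under assumptions, not Ioke's implementation: has_next stands in for “next?” (Python names cannot contain a question mark), and ListSequence and Playlist are hypothetical names used only for illustration.

```python
class ListSequence:
    """External iterator exposing the two-method protocol."""

    def __init__(self, items):
        self._items = items
        self._index = 0

    def has_next(self):
        return self._index < len(self._items)

    def next(self):
        item = self._items[self._index]
        self._index += 1
        return item


class Sequenced:
    """Mixin: anything that defines seq() gets each() for free."""

    def each(self, block=None):
        sequence = self.seq()
        if block is None:
            return sequence           # no arguments: hand back the sequence
        while sequence.has_next():    # internal iteration built on external
            block(sequence.next())
        return self


class Playlist(Sequenced):
    """Hypothetical collection used only to exercise the mixin."""

    def __init__(self, *songs):
        self.songs = list(songs)

    def seq(self):
        return ListSequence(self.songs)


seen = []
Playlist("a", "b", "c").each(seen.append)
```

The each method is a plain loop over the external iterator, which is the cheap direction: internal iteration on top of external iteration.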

(1..100000000) mapped(x, x*x) filtered(x, x % 3 == 0) takeWhile( < 10000 )

This example creates a range from 1 to 100,000,000 and finds all the squares that are less than 10,000 and that are evenly divisible by 3.
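The laziness is what makes this practical. A comparable pipeline can be sketched in Python with generators. This is an analogy, not Ioke code: mapped and filtered are hand-written helpers named after the Ioke methods, and takewhile comes from Python's itertools.

```python
from itertools import takewhile


def mapped(func, seq):
    """Lazy map: yields transformed elements one at a time."""
    for item in seq:
        yield func(item)


def filtered(pred, seq):
    """Lazy filter: yields only the elements the predicate accepts."""
    for item in seq:
        if pred(item):
            yield item


squares = mapped(lambda x: x * x, range(1, 100_000_001))
by_three = filtered(lambda x: x % 3 == 0, squares)

# takewhile stops at the first element that fails the test, so only a
# little over a hundred of the hundred million numbers are ever squared.
result = list(takewhile(lambda x: x < 10_000, by_three))
```

Nothing is computed until list() starts pulling elements, and the pull stops as soon as a filtered square reaches 10,000.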



A new parser for Ioke


Last week I finally bit the bullet and rewrote the Ioke parser. I’m pretty happy about the end result actually, but it does involve moving away from Antlr as a parser generator. In fact, the new parser is handwritten - and as such goes against my general opinion to generate everything possible. I would like to quickly take a look at the reasons for doing this and also what the new parser will give Ioke.

For reference, the way the parser used to work was that the Antlr generated lexer and parser gave the Ioke runtime an Antlr Tree structure. This tree structure was then walked and transformed into chained Messages, which is the AST that Ioke uses internally. Several other things were also done at this stage, including separating message chains on comma-borders. Most significantly the processing to put together interpolated strings and regular expressions happened at this stage. Sadly, the code to handle all that was complex, ugly, slow and frail. After this stage, operator shuffling happened. That part is still the same.

There were several problems I wanted to solve, but the main one was the ugliness of the algorithm. It wasn’t clear from the parser how an interpolated expression mapped into the AST, and the generated code added several complications that frankly weren’t necessary.

Ioke is a language with an extremely simple base syntax. It is only slightly more complicated than the typical Lisp parser, and there are almost no parser-level productions needed. So the new parser does away with the lexer/parser distinction and does everything in one pass. There is no need for lookahead at the token level, so this turns out to be a clear win. The code is actually much simpler now, and the Message AST is created inline in the new parser. When it comes to interpolation, instead of the semantic predicates and global stacks I had to use in the Antlr parser, I just do the obvious recursive interpolation. The code is simple to understand and quite efficient too.
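To make "recursive interpolation in one pass" concrete, here is a minimal Python sketch. It is not Ioke's parser: the node shapes, the function names and the restriction to double-quoted strings with #{...} are assumptions made for illustration. The point is only that two mutually recursive functions handle nesting to any depth without a separate lexer or a global stack.

```python
def parse_string(src, pos):
    """Parse a string literal whose opening quote is at src[pos].

    Returns (("string", parts), next_pos); parts mixes literal text
    with ("code", ...) nodes for each interpolation.
    """
    assert src[pos] == '"'
    pos += 1
    parts, literal = [], []
    while pos < len(src):
        if src[pos] == '"':
            if literal:
                parts.append("".join(literal))
            return ("string", parts), pos + 1
        if src.startswith("#{", pos):
            if literal:
                parts.append("".join(literal))
                literal = []
            node, pos = parse_code(src, pos + 2)   # recurse into the code
            parts.append(node)
            continue
        literal.append(src[pos])
        pos += 1
    raise SyntaxError("unterminated string literal")


def parse_code(src, pos):
    """Parse embedded code up to its closing brace.

    Returns (("code", parts), next_pos); nested strings become nodes.
    """
    parts, code = [], []
    while pos < len(src):
        if src[pos] == "}":
            if code:
                parts.append("".join(code))
            return ("code", parts), pos + 1
        if src[pos] == '"':
            if code:
                parts.append("".join(code))
                code = []
            node, pos = parse_string(src, pos)     # recurse into the string
            parts.append(node)
            continue
        code.append(src[pos])
        pos += 1
    raise SyntaxError("unterminated interpolation")


source = '"a #{x "b #{y}"} c"'
tree, end = parse_string(source, 0)
```

Each function owns exactly one error state, which hints at why a handwritten parser can give precise error messages.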

At the end of the day, I did expect to see some performance improvements too. They turned out to be substantial. Parsing is about 2.5 times faster, and startup speed has improved by about 30%. The distribution size will be substantially smaller since I don’t need to ship the Antlr runtime libraries. And building the project is also much faster.

But the real gain is actually in maintainability of the code. It will be much easier for me to extend the parser now. I can do nice things to make the syntax more open ended and more powerful in ways that would be very inconvenient in Antlr. The error messages are much better since I have control over all the error states. In fact, there are only 13 distinct error messages in the new parser, and they are all very clear on what has gone wrong - I never did the work in the old parser to support that, but I get that almost for free in the new one.

Another thing I’ve been considering is to add reader macros to Ioke - and that would also have been quite painful with the Antlr parser generator. So all in all I’m very happy about the new parser, and I think it will definitely make it easier for the project going forward.

This blog post is in no way saying that Antlr is bad in any way. I like Antlr a lot - it’s a great tool. But it just wasn’t the right tool for Ioke’s syntax.



Continuous Integration for Ioke with Cruise


I’ve felt the need for this since I put out the CLR version of Ioke, and now I’ve finally managed to make it happen. Even though I’m the only person with commit rights to Ioke so far, it is still good to have continuous integration running, especially since there are at least seven different builds I want to test, three on Linux and four on Windows.

I now have two servers running this. They are not public right now - I will post something when the dashboard is up - but the CI server will send notification emails to the ioke-language Google Group with status.

The current setup tests Java 1.5 and Java 1.6 on both Linux and Windows. It also tests Mono on Linux and Windows, and .NET on Windows.

As a CI server I’m using Cruise, ThoughtWorks’ own Continuous Integration server. Cruise is a commercial product, but open source projects can use it for free. I’ve been very happy with it on earlier projects, which is why I decided to use it for Ioke.

ThoughtWorks also gave me two virtual machines to run this CI server - which I’m very grateful for.



What is eval?


The glib answer to this question would be: “evil”. Of course, that doesn’t really tell us anything new. I wanted to explore the question of where in the spectrum eval fits in, in dynamic languages, and why the power of the language is ultimately increased by including eval.

Lately I’ve been saying that having eval is actually a roundabout way of having the interpreter be first class. After some thinking I’ve realized that this isn’t strictly true, which is why I wanted to spend some more time on eval.

The history of eval goes back to McCarthy’s paper on Lisp, long before Lisp was actually implemented. The interesting point is that the eval given in that paper can be used by the language itself, and the language can define its own semantics in terms of itself, so a complete eval can be implemented in the language itself. This property is generally called a metacircular interpreter. Of course, having eval be this easy to implement in the language itself makes it extremely simple to also tweak it a bit and implement subtly different versions of the language. All of these advantages are not really based on eval itself, though, but rather on the fact that Lisp is so easy to define in terms of itself.

Eval shines more in languages where it’s really hard to define the semantics, like in JavaScript, Ruby or Perl. In these languages it is still possible to implement an eval in the language itself, but it’s extremely hard. In these languages, having eval gives you an escape hatch into the already implemented interpreter that is running the host code.

There are two different versions of eval in common use. Which one is mostly used depends on the type of the language. In homoiconic languages you will generally not give strings to eval, since you can just give the code to execute directly to eval. The typical example of this is Lisp, where eval takes an S-expression. Since it is so easy to build S-expressions (and it’s fundamentally more expressive), this means that this version of eval makes many things easy. Languages that are not homoiconic generally take a string that contains the code, and will then parse the code and execute it.

Most versions of eval also take an argument that contains the current binding information, or the current context. In some versions this is implicit and can never be sent in explicitly, while some languages (like Ruby and Lisp) allow you to send in the binding separately. For this to be powerful you obviously need a way to get at the binding in a current context, and then be able to store that somewhere.
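Python is one example of the string-based kind, and its built-in eval accepts the context as explicit arguments. The sketch below is an illustration only. The make_counter function is a made-up name, and the captured context is a copied snapshot rather than a live binding.

```python
# String-based eval with an explicit context.
binding = {"x": 41}
result = eval("x + 1", {}, binding)   # arguments: source, globals, locals


# A context can be captured in one place and used later somewhere else.
def make_counter():
    count = 10
    return dict(locals())             # snapshot of this function's variables


saved = make_counter()
later = eval("count * 2", {}, saved)
```

Because the context is an ordinary dictionary, it can be stored, passed around and handed to eval long after the function that created it has returned.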

So, in summary eval depends on two different capabilities that are more or less orthogonal. The first one is to call out to the interpreter and ask it to execute some code. The second is to be able to manipulate code contexts in a limited manner. Some languages allow you to do whatever you want with contexts, but that is definitely not the norm - since it disallows some very powerful optimization techniques. It is possible to get access to this information without sacrificing performance, though, as Smalltalk shows.

To get back to the question whether eval has anything to do with first class objects, we need to first look at what it actually means to be first class. Of course, the points for being first class depend to a degree on what language we are talking about. The Wikipedia definition is that a first class object is something that can be used inside the programming language without restriction, compared to other entities in the language. In the context of an object oriented language, this would mean that you should be able to create new instances of it, you should be able to store it in variables, you should be able to pass it as arguments to methods and return it from methods. You should also be able to call methods on it, and so on.

Depending on how you see it, the eval function is generally pretty restricted in what you can do with it. Specifically, in Ruby, if you do the refactoring Extract Method on a piece of code that includes eval, eval will actually not work the same. This makes eval a fundamentally different method than all other methods in Ruby.
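A comparable effect can be shown in Python, whose eval also defaults to the context of the code that calls it. This is an analogy and not Ruby code: the function names are made up, and Ruby's exact behavior is not reproduced here.

```python
def direct():
    x = 1
    return eval("x + 1")          # sees the local x of this function


def run(source):
    # Extracted helper: eval now runs in this function's context,
    # where no x exists.
    return eval(source)


def extracted():
    x = 1
    try:
        return run("x + 1")
    except NameError:
        return "NameError"
```

Moving the call one level down changes its meaning, which would not happen with an ordinary function that depends only on its arguments.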

So let’s change the question a bit - how can we make the interpreter first class while still retaining the simplicity of eval? The first step is to actually make the interpreter into a class. This class has one instance that is the currently running runtime. Once you have that object available at runtime, the next step is to be able to create new instances of the interpreter, and finally to be able to ask it to invoke code. The second piece of the puzzle is to make bindings/context first class, so you can create new ones at runtime and manipulate them. Once you have those two things together, eval will actually just be a shortcut for getting the current interpreter and the current binding and asking the interpreter to evaluate some code.

Ioke doesn’t have it right now, but I have made room for it. There is an object called Runtime that reflects the current runtime. The plan is to make it possible to call mimic on it, and by doing so create a new interpreter from the current one. What is interesting is that this makes it possible to have some inherent security too. Since the second runtime mimics the first one, the second one won’t have capabilities that the first one lacks.

In Ioke a binding is just a regular Ioke object - nothing special at all really, and you can just create any kind of object and use that as a binding object. The core of simplicity in Ioke makes these operations that much simpler.
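The plan described above can be sketched in Python. Everything here is hypothetical: the Runtime class, its mimic and evaluate methods and the capability dictionary are invented for illustration and are not Ioke's API. Emptying the builtins is also not a real sandbox in Python. The sketch only shows the shape of the idea, where a child runtime can have fewer capabilities than its parent but never more.

```python
class Runtime:
    """Hypothetical first-class interpreter object."""

    def __init__(self, capabilities, parent=None):
        self.capabilities = dict(capabilities)
        self.parent = parent

    def mimic(self, without=()):
        # The child starts from the parent's capabilities and can only
        # drop some of them; it has no way to add new ones.
        kept = {name: value for name, value in self.capabilities.items()
                if name not in without}
        return Runtime(kept, parent=self)

    def evaluate(self, source, binding):
        # The binding is a plain dict, so it is first class as well.
        env = {"__builtins__": {}}
        env.update(self.capabilities)
        return eval(source, env, binding)


root = Runtime({"len": len, "max": max})
child = root.mimic(without=("max",))

size = child.evaluate("len(xs)", {"xs": [1, 2, 3]})
largest = root.evaluate("max(xs)", {"xs": [1, 2, 3]})
```

With these two pieces in place, a plain eval is little more than a call to evaluate on the current runtime with the current binding.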

Eval is a strange beast, but at the end of the day it is still about accessing the interpreter. Generalizing this makes much more interesting things possible.



Videos from the Chicago ACM Ioke talk


This Wednesday I gave a talk about Ioke at the Chicago ACM. This was actually great fun and I’m fairly happy with the presentation. This is without doubt the best quality Ioke presentation available so far.

You can see it here: http://blip.tv/file/2229441

And here: http://blip.tv/file/2229292



Google I/O


Currently sitting in a session on day two of the Google I/O conference. The morning opened up with the keynote and announcement of Google Wave, which is something that seems very cool and has a lot of potential. Very cool start of the day.

After that I watched Ben and Dion talk about Bespin. I hadn’t seen Bespin before - it was definitely interesting, although I will be hard pressed to give up Emacs any day soon.

During lunch I came up with a fun idea, but it required something extra. I talked to Jon Tirsen, a Swedish friend from his ThoughtWorks days, who is on the Google Wave team - and he managed to get me an early access account for Google Wave. So I spent the next few hours hacking - and was able to unveil an Ioke Wave Robot during my talk. It is basically only a hello world thing, but it is almost certainly the first third-party Google Wave code… You can find it at http://github.com/olabini/iokebot. It is deployed as iokebot@appspot.com so when you have your Wave account you can add it to any waves. Very cool. I do believe there is a real potential for scripting languages to handle these tasks. Since most of it is about gluing services together, dynamic languages should be perfectly suited for it.

Finally I did my talk about JRuby and Ioke - that went quite well too. The video should be up on Google sooner or later.

And that was basically my Google I/O experience. Very nice conference and lots of interesting people.



Communication over Implementation


Last week I wrote a post about some of the statements that percolate in my mind when designing Ioke. What I didn’t mention was that these ideas are things I use to judge other programming languages too. I would say that this philosophy pretty much captures my views on programming languages. (The post in question is here: The Ioke Philosophy). So, this post is of course not complete, and I don’t think I would ever be able to write something that is totally complete.

One of the things missing - and I did allude to it in the post - was a statement that has grown on me a bit. I did a presentation about Ioke last week, and at that point I decided I needed to talk about this some more. The statement in question is what I call Communication over Implementation. This turns out to be pretty important for programming languages in general, at least in my experience.

One of the things I’m fond of saying when talking about programming languages, is that programming languages - just like natural languages - are about communication. And we don’t necessarily always think clearly about who we are communicating with. The immediate and intuitive reaction to programming languages is that they are supposed to communicate with the compiler/interpreter/cpu. That is of course true, but it is also incidental in many cases. There are many ways in which you can communicate with the machine to get it to achieve something. So the question becomes what other parties should you consider when communicating.

The next most obvious party would be yourself. If you ever need to read your code, you need to write code in such a way that you can read it later on. This constrains the way you write code quite severely. There are reasons we don’t write much code in assembly language or JVM bytecodes anymore. Yes, at some level these descriptions are extremely nice communication towards the executing machine, but they are so bad at communicating with human stakeholders that the balance generally ends up in favor of more readable languages.

When communicating with human stakeholders the thing I focus most on is intent. If your code/text communicates the intent of what you are trying to do in a good way, this makes it easier to read. There are many movements that focus on how to do this well, where domain driven design and clean code are the two that immediately come to mind for me.

So coming back to the title. For me, implementation is a special sort of communication - the kind of communication that is supposed to be functional and describe what should actually be done in deterministic instructions to a machine. As long as this communication is functional enough - meaning that the machine does more or less the right thing - there is much leeway in how the code can be written to make other kinds of communication easier. And that is the core of this argument. A language should make it easy to communicate with other stakeholders than the machine, since those other forms of communication with code are actually much more important than only the implementation pieces. Yes, if the implementation works your program might run for a while - but if no one can read the code it can’t be maintained, it can’t be understood except in a black box way, and the utility of the system will be limited.

Go the other way. If you have a program that communicates badly with the machine (but still well enough according to the above definition), but it is written in a clearly communicating way, this means that it is easier to grow the system, and it is easier to fix it or implement it more correctly. It can also be replaced more easily, since the program communicates what it is doing.

There are exceptions to this principle. But we seem to favor languages that are focused on implementation and only incidentally on communication. This is the wrong choice and it needs to be fixed. Communication is at the core of programming, and should also be the focus of it.