The magic it variable in if, or solving regular expressions in Ioke


I’ve spent some time trying to figure out how to handle regular expression matching in Ioke. I really like how Ruby allows you to use literal regexps and an infix operator for matching. That’s really nice and I think it reads well. The problem is that as soon as you want access to the actual match result, not just a yes or no, you have two choices: either you use the ‘match’ method instead of ‘=~’, or you use the semi-globals, like $1, $&, etc. I’ve never liked the globals, so I try to avoid them, and I happen to think it’s good style to avoid them.

The problem is that with ‘match’ you lose the infix matching operator, and the code doesn’t read as well. I’ve tried to figure out how to solve this problem in Ioke, and I think I know what to do.
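For comparison, this is roughly what the two Ruby options look like. It is plain Ruby, shown only to illustrate the trade-off described above. The pattern is an illustrative assumption: it splits "foo bar" on the space, and it uses a greedy second group because a lazy group at the end of a pattern matches the empty string.

```ruby
str = "foo bar"
pattern = /(.*?) (.*)/

# Option 1: the infix =~ operator reads nicely, but the groups only come
# back through the special variables $~, $1, $2, ...
if pattern =~ str
  first_from_globals  = $1
  second_from_globals = $2
end

# Option 2: the match method hands back a MatchData object directly,
# at the cost of giving up the infix form
match = pattern.match(str)
first_from_match  = match[1]
second_from_match = match[2]

puts "#{first_from_globals} / #{second_from_match}"   # prints "foo / bar"
```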

The solution is to introduce a magic variable, but one that is distinctly different from the Ruby globals. For one, it’s not a global variable: it’s only available inside the lexical context of an ‘if’ or ‘unless’ method. It’s also a lexical variable, meaning it can be captured by a closure. And finally, it’s a general solution to more problems than just regular expressions. The Lisp community has known about this for a long time; in Lisp the macro is generally called aif (for "anaphoric if"). But I decided to just integrate it with the if and unless methods.

What does it look like? Well, for matching something and extracting two values from it, you can do this:

str = "foo bar"
if(#/(.*?) (.*)/ =~ str,
  "first  element: #{it[1]}" println
  "second element: #{it[2]}" println)

The interpolation syntax is the same as in Ruby.

The solution is simple. An if or unless call will always create a new lexical scope that includes the variable ‘it’, bound to the result of the condition. That means you can do a really complex operation in the condition part of an if, and then use its result inside the if. In the case of regular expressions, the =~ invocation will return a MatchData-like object if the match succeeds, and nil if it fails. The MatchData object can be indexed with the [] method to get the groups.
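Ruby can’t inject a variable into its own if, but the semantics described above can be approximated with a block. The following is a minimal sketch under that assumption. The helper name aif is borrowed from the Lisp macro and is hypothetical; it is not part of Ruby or Ioke.

```ruby
# Hypothetical helper, not part of Ruby: evaluate the condition once and,
# if it is truthy, pass its value to the block under the name `it`.
def aif(condition, otherwise = nil)
  if condition
    yield condition
  elsif otherwise
    otherwise.call
  end
end

str = "foo bar"

summary = aif(/(.*?) (.*)/.match(str)) do |it|
  "first element: #{it[1]}, second element: #{it[2]}"
end

# `it` is an ordinary block parameter, so a closure can capture it
shout = aif(/(\w+)/.match(str)) { |it| -> { it[1].upcase } }

# A failed match yields nil, so the else branch runs instead
fallback = aif(/xyz/.match(str), -> { "no match" }) { |it| it[0] }

puts summary      # prints "first element: foo, second element: bar"
puts shout.call   # prints "FOO"
puts fallback     # prints "no match"
```

In everyday Ruby, the conventional idiom is an explicit binding such as `if (m = pattern.match(str))`. The block version is closer to the Ioke idea because the name is only visible where the condition succeeded.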

The end result is that the it variable is available where you want it, but not otherwise. Of course, this incurs a cost on every if/unless invocation. But following my hard line of doing things without regard for optimization, and only with regard for expressiveness, this seems like the right way to do it.

It’s still not totally good, because it’s magic. But it’s magic that solves a specific problem and makes some things much more natural to express. I’m not 100% comfortable with it, but I’m pretty close. Your thoughts?



Reverse Cambridge Polish notation for Lisp?


One of the things that still sometimes seems unnatural with Lisp is that in many cases evaluation needs to start quite far into the structure, especially if you’re programming in a heavily functional style. The problem with this is that the way you read code doesn’t go left-to-right. Instead you need to read inside out. To make this less abstract, take a look at this example Lisp code:

(defun foo (x)
  (flux
   (bar
    (conc "foo"
          (get-bar
           (something "afsdfsd")
           (if (= x "foo")
               (conc "foo" "bar")
               (conc "bar" "foo")))))))

As you can see, this is totally bogus code. To find the parts that are evaluated first, you have to look for the innermost forms. Compare the order in which you read it to the order in which it is actually evaluated. You can make the evaluation order visible by transforming the code to Reverse Cambridge Polish notation. The result is equivalent, although I know of no Lisp that actually works this way. As you can see, the order in which you read it is the order in which it will be evaluated:

(foo (x)
  ((("foo"
     (("afsdfsd" something)
      ((x "foo" =)
       ("foo" "bar" conc)
       ("bar" "foo" conc)
       if)
      get-bar)
     conc)
    bar)
   flux)
  defun)
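The transformation is mechanical enough to sketch in a few lines of Ruby. This sketch assumes that Lisp forms are represented as nested Ruby arrays of symbols and strings, a hypothetical representation chosen for illustration. Like the hand-written version, it treats special forms such as if and defun like any other operator and simply moves each operator to the end of its form.

```ruby
# Convert a Lisp form, represented as nested Ruby arrays, into reverse
# Cambridge Polish order: every operator moves to the end of its form,
# after its (already converted) arguments.
def to_rcpn(form)
  return form unless form.is_a?(Array) && !form.empty?
  operator, *args = form
  args.map { |arg| to_rcpn(arg) } + [to_rcpn(operator)]
end

# Print a converted form back out in Lisp-like syntax
def render(form)
  case form
  when Array  then "(" + form.map { |f| render(f) }.join(" ") + ")"
  when String then form.inspect
  else form.to_s
  end
end

branch = [:if, [:"=", :x, "foo"], [:conc, "foo", "bar"], [:conc, "bar", "foo"]]
puts render(to_rcpn(branch))
# prints ((x "foo" =) ("foo" "bar" conc) ("bar" "foo" conc) if)
```

A one-element list such as the parameter list (x) comes out unchanged, which is why the defun header above still reads (foo (x) ...).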

Well. It’s different, I grant you that. And of course, you need to keep the stack in your head while reading it. But it’s an interesting experiment.



Ioke /.FAQ


One of the more obvious points from the Slashdot posting is that people have a tendency to misunderstand what I’m doing here. I’ll paraphrase some of the more common questions and comments from the /. thread and write a little bit about Ioke. This should make some things clearer.

The name?

I didn’t know that the name would cause so many comments, but apparently it did. I personally am quite fond of the name because it allows lots of different interpretations. The name derives partly from Io, since Io is the main influence on the language, and partly from the symmetry with the Nordic (not only Norse) trickster god Loke (which is generally written Loki in English). Some commenters thought the name was pronounced like ‘joke’. That’s not true. The pronunciation I use has three syllables: ii-oo-kee. Some felt it is a stupid name. I can only disagree: it’s a name that has multiple meanings in multiple languages. I like language trickery, both at the level of programming languages and at the level of human languages, and I enjoy that this is reflected in the name of the programming language. And the fact that it generates some discussion is actually a testament that the name was well chosen.

The JVM?

This is probably the most common misunderstanding, and it’s not at all specific to Ioke. In fact, it’s one of the most common questions about JRuby too, although I think it is finally starting to abate. There are several questions involved here. The first is: why not just use Java on the JVM, since there isn’t any big difference anyway, right? Well, wrong. I believe that languages are fundamentally different. (Yeah, Turing equivalence, blah blah, Greenspun’s tenth rule, blah blah, I know all those arguments.) My point is, programming languages matter; it’s obvious that they do.

So the second question about the JVM hinges on a misunderstanding of what the JVM actually gives you. Or rather, the misunderstanding is that you don’t get much from running on the JVM, except that your language will be much slower than something you coded in C, which obviously is the only manly language you can use. I call this a misunderstanding, and it comes from two camps. The first one is the camp of people who have never implemented a modern language from scratch. These people don’t know what’s actually necessary to create a language implementation, and thus don’t understand how the JVM can help. Many fall into this fallacy without realizing it themselves.

The second camp is language implementors who don’t know what a good piece of engineering the JVM actually is, how fast HotSpot can be, and how good the GC really is.

Just to give you a quick summary if you are in either of these two camps: the JVM provides, among other things, 4 (soon to be 5) kick-ass garbage collectors that are generally considered among the best in the world. It provides a thread implementation that’s been tuned for 15 years, including access to very capable implementations of concurrency and threading primitives. It provides a collection of libraries that is unmatched in size (including JDBC adapters for up to a few hundred different databases), much of it open source. It provides application servers that give you all the services you would ever need. It provides interoperability with low-level native features when you need it, but you generally don’t need it. And of course, the JVM runs on a very wide range of platforms, and in most cases nothing needs to be done to port your language to a new operating system.

And it provides an optimizing just-in-time compiler that will profile your code and dynamically optimize and deoptimize the parts of your code to get the best performance.

All of this you get for free when choosing to build something for the Java platform. And there’s lots more. So is it any wonder that I’m not interested in building my own garbage collector, or my own thread scheduler?

I said something in the InfoQ Ioke interview, that I probably should have phrased a bit differently. Specifically, this part:

…and I don’t understand why people who create languages want to write their own GC…

What I really mean is that, to me, when I think about creating a language, I first want to get everything working, including the GC, as quickly as possible. When designing a language, the GC is something that should just be there, doing its job. Writing a GC is definitely a noble endeavor, but it’s not the main point for most languages, so if you don’t have to do it, it’s easier to focus on the language design and the core of the implementation.

This all means that the JVM is a fantastic place for Ioke, in my opinion.

Why not Lisp/Smalltalk?

This question was mostly in the form of (paraphrased here) “If you’d like a language with the features you describe, why don’t you just use Lisp or Smalltalk?”. I think one reason for this question is the idea that Lisp and/or Smalltalk are the best versions of these kinds of languages you can ever create, and that anything else will be inferior. That might be true, but I don’t think it is. I don’t think that Lisp or Smalltalk is the ultimate language. They show the way, but they’re not the end. I don’t seriously believe that Ioke will be better than any Lisp or Smalltalk, but it might happen. I like it very much right now.

At the same time, Ioke is very close to falling into the Lisp black hole. It doesn’t have S-expressions, but I could definitely argue for it being a dialect of Lisp. That’s not really interesting to do at this point, though. It doesn’t look like Lisp, but it feels like it. The other arguments for why not Lisp/Smalltalk I’ve lumped together under the next heading.

Why another language?

This question is basically this - there are so many good languages out there already, why create a new language? Or, there are so many languages, why do you think you can create something better? Or, oh no, not another language with stupid features that I will have to maintain. Or, if you like Lisp/Smalltalk/Ruby/Io so much, why don’t you use it instead?

I think all of these reactions are quite new. Not many people have done language projects from scratch and posted about them the way I do. At least I don’t think so. Of course, the Cambrian explosion of languages during the ’70s happened in an environment where sharing was natural, and things were quite public. But there seems to have been a different feeling about new languages at that point.

So, why create a new language? First: why not? I’m creating Ioke for myself. If anyone else likes it, that would be great, but it’s not the goal. The goal is to see if I can create a language that I like better than all the alternatives, and, while I’m doing that, to see if I can write about and discuss all the decisions I make in the process. I would have loved to see someone else do something like this, but I haven’t seen it. Most language creation seems to happen in a closed environment, at least initially. And it’s those first steps that I find really intriguing. So no, I don’t think I can create something better. But there’s always the possibility. And I learn something in the process.

Of course, the language will not become popular. There’s virtually no chance of it, and that’s fine too. And it means you won’t have to use it, or maintain it.

And to the last question: yeah, I like these languages, but I like different parts of them, and they all have things I don’t like. All languages are tradeoffs on different scales, and I would like to see a language that makes the tradeoffs that I feel make sense.

Announcing it too early?

This was an interesting point. I didn’t think I had announced anything, except that I’ve started this project and want to see how it turns out. As I said earlier, no one is going to force anyone to use Ioke.

Lack of beard

This is really bad, actually. Probably the most serious detriment to Ioke, of all these points, is the fact that I don’t have a beard. A language designer without a beard won’t work. Of course, Matz is an obvious counterexample, so maybe there is hope for me anyway. Because growing a beard is something I will not do. =)

In conclusion: Ioke is a language I’m creating for myself. If anyone else likes it, that’s great! But I’m not really announcing anything, when I’m talking about Ioke in this blog. Rather, I’m sharing my experiences, and if anyone’s not interested it’s actually very simple: don’t read this blog.



QCon San Francisco next week


I almost forgot. I’ll be at QCon San Francisco next week. I’m landing late Sunday, and staying around ’til the next Sunday. As usual, I expect QCon to be a blast. Nick Sieger and I will be holding a tutorial on JRuby, which should be fun too.

Hope to see lots of you there!



Ioke dynamic reception


Ioke will, just as most other dynamic languages, have a way to achieve what Martin Fowler calls Dynamic Reception. For those of you who are not familiar with this term, it’s the same as method_missing in Ruby, doesNotUnderstand: in Smalltalk, __getattr__ in Python, forward in Io, and so on.

It basically allows you to override what happens when regular message sending fails for some reason. It’s more general in some languages than others. In particular, the Python, Smalltalk and Io versions are more powerful than Ruby’s, since they catch all message sends, not only those that would cause an invocation.
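For readers who haven’t used the Ruby variant, here is a small sketch of method_missing used for forwarding, which also happens to fit the mail-forwarding metaphor discussed below. The class name Forwarder and its log are hypothetical illustrations. Note that messages Object already understands, such as to_s, never reach method_missing: the hook only runs when normal method lookup fails.

```ruby
# Forwards every message it does not understand to a target object.
class Forwarder
  attr_reader :log

  def initialize(target)
    @target = target
    @log = []
  end

  # Called by Ruby only when normal method lookup fails
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @log << name
      @target.public_send(name, *args, &block)
    else
      super # no one handles it: raise the usual NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing accepts
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

f = Forwarder.new([3, 1, 2])
sorted = f.sort   # forwarded to the array: [1, 2, 3]
text = f.to_s     # Object#to_s exists, so nothing is forwarded
```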

My problem isn’t really whether I should have it in Ioke. I will have it. It’s extremely useful and allows the implementation of lots of things that can be hard otherwise. The question is what to call it. I’m not totally comfortable with any of the names given to this functionality. I would like to have something that is higher level, basically.

To understand the reasoning behind my thinking, I’ll do a very quick description of the semantic model of Ioke. It’s actually quite simple.

Everything is message passing. When you write the name of a local variable, you’re sending a message of that name to the current context. When you’re getting the value of something, you are passing a message, and so on. Something like this: “foo = 13. foo println” will first send the message “=”, with foo and 13 as arguments. The next statement will send the “foo” message, and then the “println” message to the result of the first one. In this case it’s obvious that foo is not a method; it’s just a simple value. But if I do this: “foo = method(13). foo println”, the result will be the same, except that sending the “foo” message will actually activate the method inside it. The rule is that when sending a message to get a cell (all the properties/attributes/slots of Ioke are called cells), if that cell contains something that is activatable, it will be activated. There is also a way to get cells without activating them.
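The activation rule above can be modeled in a few lines. The following is a toy Ruby sketch, not Ioke’s implementation. The names ToyContext, set_cell, send_message and cell are hypothetical, and Ruby lambdas stand in for activatable values such as methods.

```ruby
# A toy model of cell lookup with activation; lambdas play the role of
# activatable cells such as methods.
class ToyContext
  def initialize
    @cells = {}
  end

  # Roughly what "foo = 13" does: store a value in a named cell
  def set_cell(name, value)
    @cells[name] = value
  end

  # Sending a message: find the cell, activating it if it is activatable
  def send_message(name)
    value = @cells.fetch(name) { raise NameError, "no cell named #{name}" }
    value.respond_to?(:call) ? value.call : value
  end

  # Fetching a cell without activating it
  def cell(name)
    @cells.fetch(name)
  end
end

ctx = ToyContext.new
ctx.set_cell(:foo, 13)
plain = ctx.send_message(:foo)       # 13, a plain value

ctx.set_cell(:foo, -> { 13 })        # stands in for foo = method(13)
activated = ctx.send_message(:foo)   # 13, because the lambda was activated
raw = ctx.cell(:foo)                 # the lambda itself, not activated
```

The fallback in send_message is the spot where a dynamic reception hook would plug in, instead of raising an error.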

So there are two relevant names. Cells and message passing. My first thinking is that since the feature will only be called when a cell can’t be found, it could be called dynamicCell, to reflect that the result is dynamic instead of static. Another version is to just say unhandledMessage, because that is really what happens. A message is sent that no one handles. The third alternative is “forward”, which I like for the metaphor. When someone sends mail to a person not living at an address anymore, the people living there can do something dynamic with it, maybe forward it somewhere.

But I just don’t know which one of these is best… Any suggestions?



Ioke syntax


Or: How using white space for application changes the syntax of a language.

I have spent most of the weekend working with different syntax elements of Ioke. Several of them are actually based on one simple decision I made quite early, and I thought it would be interesting to take a look at some of the syntax elements I’ve worked on, from the angle of how they are based on that one decision.

What is this decision then? In the manner of Smalltalk, Self and Io, I decided that periods are not the way to apply methods. Instead, space makes sense for this. So if in Java you would write “foo().bar(1).quux(2,3)” this would be written as “foo bar(1) quux(2, 3)” in Ioke. Everything is an expression and sending a message to something is done with putting the message adjacent to the thing receiving the message, separated by whitespace. This turns out to have some consequences I really didn’t expect, and several parts of the syntax have actually changed a lot because of this decision. I’ll take a look at the things that changed most recently because of it.

Terminators

Most languages without explicit expression nesting (of the kind Lisp has) need some way to decide when a chain of message passing should stop. Most scripting languages today try to use newlines, and then use semicolons when newlines don’t quite work. That’s what I started out doing with Ioke too (since Io does it). But once I started thinking about it, I realized that Smalltalk got this right too. Since I don’t use dots for message application, I’m free to use them for termination. You still don’t need to terminate things that are obviously terminated by newlines, but when you need a terminator, the dot reads very well. I’ve always disliked the intrusiveness of semicolons; they seem to take up too much visual space for me. Dots feel like the right size, and there is also a more pleasing symmetry with commas.

Comments

Once you don’t use semicolons for termination, you can use them for other things. I am quite fond of the Lisp tradition of using semicolons for comments, so I decided to not use hashes for that anymore. One of the ways Lisp systems use semicolons for comments is that they use different numbers of them to prefix different kinds of documentation. The Common Lisp convention is to use four semicolons for headlines, three semicolons for left-justified comments, two semicolons for a new line of comment that should be indented with the program text, and one semicolon for comments on the same line as program text. This works because semicolons don’t take up much visual space when stacked. A hash would never work for it.

The obvious question from any person with Unix experience will be how I handle shebangs if a hash isn’t a comment anymore. The short answer is that I will provide general read macro syntax based on hash. Since the shebang always starts with “#!” that would be a perfect application for a reader macro. That also opens up the possibility for other interesting reader macros, but I’ll take that question later.

Operator precedence

This one was totally unexpected. I had planned to add regular operator precedence, and it ended up being quite painful. I should probably have guessed the problem, but I didn’t; two grammar files later I’m now hopefully a bit wiser. The problem ended up being whitespace. Since I use whitespace to separate application, but whitespace is also relevant when judging operator precedence, the parsers I got working actually had an exponential amount of backtracking. Two lines of regular code without operators still backtracked enough to take a minute or two to parse. Ouch. So what’s the solution? Two passes of parsing. Or not exactly, but almost. I’m currently implementing something like Io’s operator shuffling, which is a general solution for rearranging operators into a canonical form based on precedence rules. What’s fun about it is that the rules can be changed dynamically. If you want Smalltalk-style left-to-right precedence, that should be possible by just setting the precedence to 1 for all operators. You can also turn off operator shuffling completely, which means you can’t use infix operators at all.

I’m also planning a way to scope these things, so you can actually change quite a lot of the syntax without switching the parser.

At some point I’m planning to explore how it would work to use an Antlr tree parser to do the shuffling. My intuition is that it would work well, but I’ll have to find the time to do it.
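To make the idea of precedence rules supplied at run time concrete, here is a small Ruby sketch that rearranges a flat operand/operator sequence into a tree. It uses plain precedence climbing, a different technique from Io’s actual operator-shuffling algorithm. The flat token list and the precedence hash are assumptions made for illustration.

```ruby
# Rearrange a flat sequence such as [1, :+, 2, :*, 3] into a nested
# [operator, left, right] tree, using a precedence table supplied at run
# time. Higher numbers bind tighter; all operators are left-associative.
def shuffle(tokens, precedence)
  pos = 0
  parse = lambda do |min_prec|
    left = tokens[pos]
    pos += 1
    while pos < tokens.length && precedence.fetch(tokens[pos]) >= min_prec
      op = tokens[pos]
      pos += 1
      left = [op, left, parse.call(precedence.fetch(op) + 1)]
    end
    left
  end
  parse.call(0)
end

expr = [1, :+, 2, :*, 3]
conventional = shuffle(expr, { :+ => 1, :* => 2 })   # [:+, 1, [:*, 2, 3]]
smalltalk    = shuffle(expr, { :+ => 1, :* => 1 })   # [:*, [:+, 1, 2], 3]
```

Giving every operator the same precedence, as in the second call, produces Smalltalk-style left-to-right grouping without touching the parser.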

Syntactic flexibility

All is not perfect, but the current scheme seems to work well. I’ve been able to get a real amount of flexibility into the syntax, with loads of operators free for anyone to use and subclass. The result will be the possibility to create internal DSLs that Ruby could only dream of. Some things get harder too, though. Regular expression syntax, for example. If you can create a statement like this: “[10,12,14] map(/2 * 2/a)”, it’s kind of obvious that there is no easy way to know whether the statement inside the mapping call is a regular expression or an expression fragment. In Ioke the decision is simple: the above is an expression fragment. I’ve decided to make it really easy to work with regular expression syntax. Interestingly, regular expressions were one of the reasons I wanted reader macros, and it turns out that using #/ will work well. So a regular expression looks just like in a Perl-like language, except that you add a hash before the first slash: #/foo/ =~ “str”. It seems that hash will end up being my syntax sin bin for those cases where I want syntax without touching the parser too much.

It’s funny to see how many things in classic syntax change if you change how message passing works. I like Ioke more and more for each of these things I find, and it currently looks very pleasant to work with. Dots are such an improvement for one-liners.



Hacking trampolining CPS


I spent some quality time today trying to hack together a continuation passing style system in Ruby, to clarify some of my thinking. I ended up with something that is more or less a very small interpreter for S expressions, written as a trampolining CPS interpreter. The language is not in any way complete: such things as assignment aren’t there, there is only one global scope, and so on. So the continuations in this system are really not useful for anything except hacking with them to gain understanding.

As such, I thought people might find it a bit interesting. I wish I’d seen something like this 5 or 10 years ago… Note that this code is extremely hacky and incomplete and bad and whatnot. Be warned. =)

OK, first you need to “gem install sexp”. This provides dead easy parsing of S expressions. Since that wasn’t the main purpose of this code, doing it with a Gem was easier.

The first part of the code we need is the requires, and structures to represent continuations:

require 'rubygems'
require 'sexpressions'

class Cont
  def initialize(k)
    @k = k
  end
end

class BottomCont < Cont
  def initialize(k, &block)
    super(k)
    @f = block
  end

  def resume(v)
    @f.call(v)
  end
end

class IfCont < Cont
  def initialize(k, et, ef, r)
    super(k)
    @et, @ef, @r = et, ef, r
  end

  def resume(v)
    evaluate((v ? @et : @ef), @r, @k)
  end
end

class CallCont < Cont
  def initialize(k, r)
    super(k)
    @r = r
  end

  def resume(v)
    evaluate(v, @r, @k)
  end
end

class ContCont < Cont
  def initialize(k, v, r)
    super(k)
    @r, @v = r, v
  end

  def resume(v)
    evaluate(@v, @r, v)
  end
end

class NextCont < Cont
  def initialize(k, ne, r)
    super(k)
    @ne, @r = ne, r
  end

  def resume(v)
    evaluate(@ne, @r, @k)
  end
end

BottomCont is what we use to do something at the end of the program. We could print something, or anything else. IfCont is used to implement a conditional. It’s quite easy: once we resume, we check the truth value and evaluate the next part based on the result. CallCont invokes an existing S expression stored in a variable: it just takes the value and evaluates it. ContCont is a bit trickier. It holds on to an expression, and when asked to resume, it assumes that the parameter to resume is a continuation, then evaluates the expression it got earlier and passes the result to that continuation. Finally, NextCont is used to implement basic sequencing. It just throws away the earlier value and evaluates the next expression instead.

The actual code for evaluate and a helper function looks like this:

def evaluate_sexp(sexp)
  cont = BottomCont.new(nil) do |val|
    return val
  end

  env = {
    :haha => proc{|x| puts "calling proc"; 43 },
    :print => proc{|x| puts "printing" },
    :save_cont => proc{|x| puts "saving cont"; env[:saved] = x; true },
    :foo => 42,
    :bar => 33,
    :flux => "(call flux)".parse_sexp.first
  }

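  c = evaluate(sexp, env, cont)

  # The trampoline: keep calling the returned thunks; the loop is only
  # left when the BottomCont block returns from evaluate_sexp
  loop do
    c = c.call
  end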
end

def evaluate(e, r, k)
  if e.is_a?(Array)
    case e.first
    when :if
      evaluate(e[1], r, IfCont.new(k,e[2],e[3],r))
    when :call
      evaluate(e[1], r, CallCont.new(k, r))
    when :continue
      p [:calling, :continue, e[1]]
      evaluate(e[1], r, ContCont.new(k, e[2], r))
    when :prog2
      evaluate(e[1], r, NextCont.new(k, e[2], r))
    end
  else
    case e
    when :true
      proc { k.resume(true) }
    when :nil
      proc { k.resume(nil) }
    when Symbol
      proc {
        if r[e].is_a?(Proc)
          k.resume(r[e].call(k))
        else
          k.resume(r[e])
        end
      }
    else
      proc { k.resume(e) }
    end
  end
end

Here evaluate_sexp is the entry point to the code. We first create a BottomCont that will just return the value. We then create an environment that includes simple values, a function (flux) that calls itself, and some procs that do different things. Finally, evaluate is called, and then we repeatedly call the thunk it returns; each call produces the next thunk. Since the bottom continuation returns from evaluate_sexp directly, the loop itself never needs an exit condition. That is the actual trampolining part, right there.

The evaluate function checks whether it got an array, and in that case it switches on the first entry, creating an IfCont, CallCont, ContCont or NextCont based on it. If it’s a primitive value, we do something different. As you can see, we first check whether the value is one of a few special ones, and then, if it’s a symbol, we look it up in the environment. If the value from the environment is a proc, we invoke it with the current continuation, which means the proc can do funky stuff with it. What all the branches have in common is that they wrap everything they do in a thunk, and inside that thunk they call resume on the continuation with the value provided.

Finally we can try it out a bit:

p evaluate_sexp("123".parse_sexp.first) # 123
p evaluate_sexp("bar".parse_sexp.first) # 33
p evaluate_sexp("nil".parse_sexp.first) # nil

p evaluate_sexp("(if quux 13 (if true (if nil 444 555)))".parse_sexp.first) # 555
p evaluate_sexp("(if quux 13 (if true (if nil 444 haha)))".parse_sexp.first) # prints "calling proc", then 43

Here you can see that simple things work as expected.

What about calling the flux function, that will invoke itself?

p evaluate_sexp("(call flux)".parse_sexp.first)

This will actually loop endlessly, without ever overflowing the stack. When we add trampolining to a CPS interpreter, we in effect get a stackless interpreter, and with it proper tail calls for free.
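The same principle can be seen in isolation, without the interpreter around it. The following is a minimal Ruby sketch, separate from the code above. The Done struct and count_down are hypothetical names. Unlike evaluate_sexp, which relies on the return inside the BottomCont block to leave its loop, this version signals the end with an explicit marker value.

```ruby
# A final result, as opposed to a thunk that still has work to do
Done = Struct.new(:value)

# The trampoline: keep invoking thunks until one of them produces a Done
def trampoline(step)
  step = step.call while step.is_a?(Proc)
  step.value
end

# A "tail call" written as a returned thunk instead of a direct call,
# so the Ruby stack stays flat no matter how large n is
def count_down(n)
  n.zero? ? Done.new(:finished) : -> { count_down(n - 1) }
end

result = trampoline(-> { count_down(1_000_000) })   # :finished, no SystemStackError
```

A directly recursive count_down with a million levels would normally exhaust Ruby’s stack; here each step returns to the loop before the next one starts.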

Finally, what about the actual continuation stuff? Another way of creating an eternal loop is to do something like this:

p evaluate_sexp("(prog2 save_cont (prog2 print (continue saved 33333)))".parse_sexp.first)

This piece of interesting code will actually loop forever. How? Well, first the prog2 will run the proc in save_cont. This saves the current continuation and then returns true from the proc. Then the next prog2 is entered, running the print proc. Finally, the continue form is evaluated, which takes the continuation in saved and invokes it with the value 33333. This in effect jumps back to the first prog2, returns 33333 from the call to save_cont, and goes into the next prog2 again. Looping…

If you use an if statement instead, and return nil from the inner call to the continuation, and add some printing to the IfCont#resume, you can see that that point will only be invoked twice:

p evaluate_sexp("(if save_cont (prog2 print (continue saved nil)) 321)".parse_sexp.first)

This will generate:

[:running, :if, :statement]
printing
[:calling, :continue, :saved]
[:running, :if, :statement]
321

Here it’s obvious that the if statement runs twice, and that the second time the condition evaluates to false (nil), which makes the final continuation return 321.

I hope this little excursion into CPS land was interesting for someone. It’s a quite useful technique to know about, once you wrap your head around it.



Ioke 0 roadmap


The first release of Ioke will be called Ioke 0, and I aim to have it more or less finished in a month or so. At the longest, it might take until Christmas. So, since it’s coming soon, I thought I would put up a list of the kinds of things I’m aiming to have in that release. I’ll also quickly discuss some features I will have in the language, but that are going to be in Ioke I or Ioke II.

First, the first release of the language means that the basic core is there. The message passing works, and you can create new things, methods and blocks. Numbers are in, but nothing with decimal points so far. If I need decimals for some of the other stuff I’m implementing, I’ll add them; otherwise integers might be the only numbers in Ioke 0. I’m OK with that. The core library will be quite small at this point too. Ioke 0 will be a usable language, but it’s definitely not batteries included in any way.

These are some specific things I want to implement before releasing it:

  • List and Dict should be in, including literal syntax for creation, aref-fing and aset-ting. Having syntax for aset means that I will have in place a simple version of setting of places, instead of just names.
  • Enumerable-like implementation for List and Dict.
  • DefaultMethod and LexicalBlock should support regular, optional, keyword and rest arguments. Currently only the rest arguments are missing, and this is mostly because I don’t have Lists yet.
  • Basic support for working with message instances, to provide crude metaprogramming.
  • The full condition system. That includes modifying the implementation to provide good restarts in the core. It also might include a crude debugger. Restarts are implemented, but the conditions will take some time.
  • cellMissing should be there. Contexts should be implemented in terms of it.
  • Basic IO functionality.
  • A reader (that reads Ioke syntax and returns the generated Message tree).
  • Access to program arguments.
  • IIk - Interactive Ioke. The REPL should definitely be in, and be tightly integrated with the main-program. I’m taking the Lisp route here, not the Ruby one. IIk will be implemented in Ioke, and should drive the evolution of several of the above features.
  • Dokgen - A tool to generate documentation about existing cells in the system. Since this information is available at run time it should be exceedingly easy to create this tool. Having it will drive features too.
  • Affirm - A testing framework written in Ioke. The goal will be to rewrite the full test suite of Ioke (which is currently using JtestR) into using Affirm instead. That’s going to happen between Ioke 0 and Ioke I.
  • Documentation that covers the full language, and some usage pointers.

There are some features I’m not sure about yet. They are larger and might prove to be too large to rush out. The main one of these is the Java integration features. Right now I’m thinking about waiting with that support.

I have loads of features planned for the future. These are the ones that I’m most interested in getting in there quite soon, which means they’ll be in either I or II.

  • Java Integration
  • Full ‘become’, with the twist that become will actually not change the class of an instance, but instead change an instance into the other instance. This is something I’ve always wanted in Ruby, and ‘become’ seems to be a fitting way to do it. This will make transparent futures and things like that quite easy to implement.
  • Common Lisp like format, that can handle formatting of elements in a List in the formatting language. Not sure I’m going to use the same syntax as Common Lisp, though. Maybe I’ll just make it into an extension of the printf support?
  • Simple aspects. Namely, it should be possible to add before, after and around advice to any cell in the system. I haven’t decided if I should restrict this to only activatable cells or any cell at all.
  • Ranges.
  • Macros. I’m not sure which version I’ll end up with yet. I have two ideas that might be more or less the same, but both of them are really, really powerful.
  • Simple methods. In Ioke, a method is something that follows a very simple interface. It’s extremely easy to create something that acts like a method in some cases but does something different. Simple methods are restricted in the kind of metaprogramming they can do, which means they can be compiled down to quite efficient code. This is a bit further away, maybe III or IV.
  • Continuations. I would like to have them. I think I can do it without changing too much of the structure. This is not at all a certainty at the moment, but it might happen.
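Before, after and around advice can be approximated in Ruby today by wrapping an existing method. The sketch below shows one hypothetical way to do it. The Advice module and the Account example are made up for illustration and are not a planned Ioke API. It captures the current method as an UnboundMethod and redefines the name, so advice added later wraps advice added earlier. It only handles methods, which corresponds roughly to restricting advice to activatable cells.

```ruby
# Hypothetical helper: add before/after/around advice to an existing method.
module Advice
  # Around advice receives a `proceed` callable plus the original arguments
  # and decides whether (and how) to run the wrapped method.
  def self.around(klass, name, &advice)
    original = klass.instance_method(name) # current definition, possibly already advised
    klass.define_method(name) do |*args, &blk|
      proceed = -> { original.bind(self).call(*args, &blk) }
      advice.call(proceed, *args)
    end
  end

  # Before advice sees the arguments, then the method runs normally.
  def self.before(klass, name, &advice)
    around(klass, name) do |proceed, *args|
      advice.call(*args)
      proceed.call
    end
  end

  # After advice sees the result and the arguments; the result passes through unchanged.
  def self.after(klass, name, &advice)
    around(klass, name) do |proceed, *args|
      proceed.call.tap { |result| advice.call(result, *args) }
    end
  end
end

class Account
  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end
end

trace = []
Advice.before(Account, :deposit) { |amount| trace << "before #{amount}" }
Advice.after(Account, :deposit) { |result, amount| trace << "after #{amount} -> #{result}" }
Advice.around(Account, :deposit) { |proceed, amount| amount.negative? ? :rejected : proceed.call }

acct = Account.new
r1 = acct.deposit(10) # enters around, then after, then before, then the original
r2 = acct.deposit(-5) # around advice short-circuits; nothing else runs
```

Wrapping by redefinition keeps the example short. A real aspect facility would also want a way to remove advice, which this sketch does not attempt.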

That’s about it for now. Once I have the core language in place I want to start working on useful libraries around it. Once 0 is out, I’m planning to start using Ioke as my main scripting language, and have that drive what libraries I need to create and so on.

Around II or III, I think it’s time to go metacircular. Not necessarily for the implementation, but to describe the semantics in it. Might be possible to do something like SLang too, and compile Ioke to Java for the needed core.

If you are interested in following the development, you can check it out at my git repository at http://github.com/olabini/ioke, or at the project pages at http://ioke.kenai.com. The Git repository is the canonical one right now, and the Kenai HG one is a clone of that. If you’re interested in discussing Ioke, there are mailing lists at the project pages. I will also have a real page for the project ready for the first release. But I promise you will notice when that release happens.



The Maintenance myth


Update: I’ve used the words “static” and “dynamic” a bit loosely with regard to languages and typing in this post. If this is something that upsets you, feel free to read “static” as “Java-like” and “dynamic” as “Ruby-like” in this post. And yes, I know that this is not entirely correct, but just as mangling the language to remove all gender bias makes it highly inconvenient to write, I find it easier to write in this language when the post is aimed at people in these camps.

Being a language geek, I tend to get into lots of discussions about the differences between languages, what’s good and what’s bad. And being a Ruby guy that hangs out in Java crowds, I end up having the static-vs-dynamic conversation way too often. And it’s interesting, the number one question everyone from the static “camp” has, the one thing that worries them the most is maintenance.

The question is basically - not having types at compile time, won’t it be really hard to maintain your system when it grows to a few millions of lines of code? Don’t you need the static type hierarchy to organize your project? Don’t you need an IDE that can use the static information to give you intellisense? All of these questions, and many more, boil down to the same basic idea: that dynamic languages aren’t as maintainable as static ones.

And what’s even more curious: in these kinds of discussions I find that people in the dynamic camp generally agree that yes, maintenance can be a problem. I’ve found myself doing the same thing, because it’s such a well-established fact that maintenance suffers in a dynamic system. Or wait… Is it that well established?

I’ve asked some people about this lately, and most of the answers invariably begin with “but obviously it’s harder to maintain a dynamic system”. Things that are “obvious” like that really worry me.

Now, Java systems can be hard to maintain. We know that. There is lots of documentation and talk about hard-to-maintain systems with millions of lines of code. But I really can’t come up with anything I’ve read where people using dynamic languages talk about what a maintenance nightmare their projects are. I know several people who are responsible for quite large code bases written in Ruby and Python (a very large code base in these languages is 50K-100K lines of code). And they are not talking about how they wish they had static typing. Not at all. Of course, this is totally anecdotal, and maybe these guys are above your average developer. But in that case, shouldn’t we hear these rumblings from all those Java developers who switched to Ruby? I haven’t heard anyone say they wish they had static typing in Ruby. And not all of those who migrated could have been better than average.

So where does that leave us? With a big “I don’t know”. Thinking about this issue some more, I came up with two examples where I’ve heard about someone leaving a dynamic language because of issues like this. I’m not sure how closely tied they are to maintenance problems, but these were the only ones I came up with: Reddit and CDBaby. Reddit switched from Lisp to Python, and CDBaby switched from Ruby to PHP. Funny, they switched away from a dynamic language - but not to a static language. Instead they switched to another dynamic language, so the problem was probably not something static typing would have solved (at least not in the eyes of the teams responsible for these switches).

I’m not saying I know this is true, because I have no real, hard evidence one way or another, but to me the “obvious” claim that dynamic languages are harder to maintain smells a bit fishy. I’m going to work under the hypothesis that this claim is mostly myth. And if it’s not a myth, it’s still a red herring - it takes the focus away from more important concerns with regard to the difference between static and dynamic typing.

I did a quick round of shouted questions to some of my colleagues at ThoughtWorks whom I know and respect - and who were online on IM at the time. The general message was that it depends on the team. The people writing the code, and how they are writing it, matter much more than static or dynamic typing. If you assume that the team is good and the code is treated well from day 0, static or dynamic typing doesn’t make a difference for maintainability.

Rebecca Parsons, our CTO said this:

I think right now the tooling is still better in static languages. I think the code is shorter generally speaking in dynamic languages which makes it easier to support.

I think maintenance is improved when the cognitive distance between the language and the app is reduced, which is often easier in dynamic languages.

In the end, I’m just worried that everyone seems to take the maintainability story as fact. Has there been any research done in this area? Smalltalk and Lisp have been around forever, so there should be something out there about how good or bad maintenance of these systems has been. I can think of three reasons why I haven’t seen it:

  • It’s out there, but I haven’t looked in the right places.
  • There are maintenance problems in all of these languages, but people using dynamic languages aren’t as keen on whining as Java developers.
  • There are no real maintenance problems with dynamic languages.

There is a distinct possibility I’ll get lots of anecdotal evidence in the comments on this post. I would definitely prefer fact, if there is any to get.



Ioke runs iterative fibonacci


Today Ioke actually runs both recursive and iterative fibonacci. That might not seem like much, but the work needed to get there has put in place much of the framework needed for the rest of the implementation.

It’s a nice milestone, since Ioke is now Turing complete (having both conditionals and iteration). Most of the neat features I’m planning aren’t actually implemented yet, though.

In the time honored tradition of language performance measuring, I decided to compare iterative fibonacci performance to Ruby.

Keep in mind that I haven’t done any optimizations whatsoever, and I do loads of really expensive stuff all over Ioke. Specifically, there is no such thing as locals - what look like locals here are actually regular attributes of a Context object. All name lookups use hash tables at the moment. It’s also fully interpreted code. Nothing is being compiled at this point. I’m running all the examples on SoyLatte (Java 1.6) on a MacBook Pro. I used the JVM -server flag when running Ioke and JRuby.
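To make that cost concrete, here is a rough Ruby sketch of what “no locals, only hash-backed attributes on a Context object” means for an iterative fibonacci loop. It is an illustration under assumptions, not Ioke’s code: the Context class, its parent chain, and the cell_missing fallback (loosely modeled on the cellMissing mentioned in the roadmap) are all hypothetical. Every read and write of i, j, k and cur becomes a hash operation, and a miss walks up the parent chain.

```ruby
# Hypothetical sketch: "locals" stored as cells in a hash on a context object.
class Context
  def initialize(parent = nil)
    @cells = {}
    @parent = parent
  end

  # Assignment always creates or updates a cell on this context.
  def []=(name, value)
    @cells[name] = value
  end

  # Lookup checks this context, then the parent chain; if nothing is found,
  # fall back to cell_missing (a hypothetical stand-in for cellMissing).
  def [](name)
    return @cells[name] if @cells.key?(name)
    return @parent[name] if @parent
    cell_missing(name)
  end

  def cell_missing(name)
    raise NameError, "no cell named #{name}"
  end
end

# Iterative fibonacci where every "local" is a hashed cell on a Context.
def fib_iter_context(n)
  ctx = Context.new
  ctx[:n] = n
  ctx[:i] = 0
  ctx[:j] = 1
  ctx[:cur] = 1
  while ctx[:cur] <= ctx[:n]
    ctx[:k] = ctx[:i]
    ctx[:i] = ctx[:j]
    ctx[:j] = ctx[:k] + ctx[:j]
    ctx[:cur] = ctx[:cur] + 1
  end
  ctx[:i]
end

fib_iter_context(10) # => 55
```

Each loop iteration here does roughly a dozen hash operations where a compiled local-variable loop would do plain register or stack accesses, which is the kind of overhead the paragraph above is warning about.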

The Ruby code looks like this:

require 'benchmark'

def fib_iter_ruby(n)
   i = 0
   j = 1
   cur = 1
   while cur <= n
     k = i
     i = j
     j = k + j
     cur = cur + 1
   end
   i
end

puts Benchmark.measure { fib_iter_ruby(300000) }
puts Benchmark.measure { fib_iter_ruby(300000) }

And the Ioke code looks like this. I don’t have any benchmarking libraries yet, so I measured it using time:

fib = method(n,
  i = 0
  j = 1
  cur = 1
  while(cur <= n,
    k = i
    i = j
    j = k + j
    cur++)
  i)

System ifMain(fib(300000))

And what are the results? Not surprisingly, JRuby does well on this benchmark, and would probably do even better if I ran more iterations. The JRuby (this is current trunk, btw) time for calculating fib(300000) was 7.5s. MRI (ruby 1.8.6 (2008-03-03 patchlevel 114) [i686-darwin8.10.1]) ended up at exactly 14s. So where is Ioke in all this? I’m happy to say Ioke ended up taking 9.2s. I was really pleasantly surprised by that, although I have a feeling that recursive fib might not end up with those proportions. Still, the indication is that I haven’t done anything amazingly expensive yet. That’s a good sign, although I have no problem sacrificing performance for expressiveness.
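For comparison, the recursive variant could be timed on the Ruby side the same way as the iterative one. This is a minimal sketch, not the code behind the numbers above: the name fib_rec_ruby and the argument 25 are arbitrary choices, and no timings are claimed here. The doubly recursive version spends most of its time on method calls rather than on arithmetic and assignment, which is one reason its proportions could differ from the iterative benchmark.

```ruby
require 'benchmark'

# Naive doubly recursive fibonacci, using the same convention as the
# iterative version above (fib(0) = 0, fib(1) = 1). Dominated by call overhead.
def fib_rec_ruby(n)
  n < 2 ? n : fib_rec_ruby(n - 1) + fib_rec_ruby(n - 2)
end

puts Benchmark.measure { fib_rec_ruby(25) }
```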