The JVM Language Summit 2010


I’ve just come back from three days in Santa Clara, spending time with some of the brightest people in the Java world – the JVM language summit is truly a fantastic collection of great people. And I was there too…

THe goal of the JVM language summit is to collect the people who work with languages on the JVM and have them share their projects, their experiences and their networks – and let them network with the people in charge of implementing the JVM’s for different companies. This year, a lot of discussion about JSR 292 and project lambda was on the plate. The presence of hardware and VM people was also more pronounced. I counted principals for at least six different virtual machines in the audience or presenting (Hotspot, JRockit, J9, Azul, Maxine, and Monty).

Among the experienced platform and language people there, some of the notables included Kresten Krab Thorup, Joshua Bloch, Bob Lee, Neal Gafter, John Rose, Brian Goetz, Alex Buckley, Rich Hickey, Charles Nutter, Cliff Click, Doug Lea, Per Bothner and many more. A great collection of people.

As an example of the funny happenstance that can happen in this collection of people, I was sitting rebinding my Java implementations for Mac OS X – and I had remove lots of links in /usr/bin. A few minutes later the person next to me started asking some questions about my experience with Java on the Mac – and it turns out he’s the manager for the Apple JVM team. Or at one point Rich Hickey reported on a quite puzzling problem that causes bad semantics when iterating over data that doesn’t fit in memory – and Cliff Click immediately opens up his laptop, says “give me an half hour and I’ll see what I can do”.

Another funny anecdote was when Doug Lea pointed out that if you use fibonacci to test performance against yourself or others, it’s important that the implementations actually agrees about the first values of fib. Funnily enough, I saw three different implementations of the ground rule in fib during the summit – all of them different. (if n < 2 return 1, if n<=2 return n, if n < 2 return n).

There were way too many interesting presentations and discussions for me to be able to talk about all of them – instead I just wanted to give some highlights.

Charles Nutter

Charles gave a quick introduction to JRuby and Mirah, and what kind of optimizations JRuby is currently doing. He also talked about how far he’s gotten in inlining invoke dynamic calls inside of JRuby (and he’s gotten very far – it’s really cool).

Fredrik Öhrström

Fredrik is the JRockit representative on JSR 292, and way too smart. He presented a solution to how you can use method handles integrated with function types to solve many of the current problems in project lambda. A very powerful and interesting presentation.

Doug Lea

Doug spent his keynote trying (quite successfully) to concinve the room of the hegemony of fork-join as a good solution to concurrency problems. A very good and thought provoking keynote.

Josh Bloch

Last year at the JVM language summit, Josh talked about what he called “the Semantic Gap”. This year, after being beat up by some linguists, he changed the name of this concept to “Performance Anxiety”. The basic idea is that in our current infrastructure we have traded performance for predictability. Two examples from his talk about when this happens in Java was pretty interesting. He had one benchmark that consistently showed about the same numbers for the same JVM run, but differed between JVM runs. There was no undeterminism in the benchmark itself, but they benchmark times continued to oscillate between 0.7 and 0.85 depending on JVM run. Cliff Clicks explanation for this is that it is probably the compilation planner, which is a separate thread. Depending on when that thread runs the compilation strategy will be different, and makes a difference in times. And it’s really hard for the programmer to take this difference into account.

The other example is simpler (and don’y change your code because of this). In some circumstances it turns out that & is faster than && in Java, because a && will short curcuit, which means it will branch. The single ampersand will always execute both sides, which means the CPU can pipeline both of them to execute at the same time.

All the examples he shown comes down to the same thing – we can’t really reason intuitively about the performance of our language constructs anymore. Our systems have become to complex in order to support better performance, and we give up predictability to get that performance. And at the end of the day it doesn’t even matter if you go down to C or assembler – you still cannot control exactly what the CPU is doing anymore.

Kresten Krab Thorup

Kresten is the CTO of Trifork, and one of the main organizers of many of my favorite conferences (like JAOO and QCon). The last nine months he has worked on an Erlang implementation for Java, which he talked about. It seems to be a very good implementation, and he’s getting surprisingly good performance and context switching numbers. In fact, several of the ideas in Seph will be stolen from Erjang.

Rémi Forax

Rémi showed off his PHP.reboot project, implemented using JSR 292 and getting quite good performance. His JSR 292 backport seems to be really useful and I think I’ll use that to make sure Seph can run on pre Java 7 machines. Good stuff.

Rich Hickey

Rich spent some time collecting comments from people in the room of what was problematic with the JVM in its current incarnation. To start us off, he showed one piece of hilarious/horrible Clojure code. Any one wants to guess what it does?

static public Object ret1(Object ret, Object nil) {
    return ret;
}

public static int count(Object o){
    if(o instanceof Counted)
        return ((Counted) o).count();
    return countFrom(Util.ret1(o, o = null));
}

We then went on to a few other things (which you can find on the JVM Language Summit wiki). The consensus seemed to be that tail calls is really very important. Last year, it wasn’t as crucial but now that we see how powerful method handles and lambda will be, tail calls turn out to be very nice to have. Hopefully we can make that happen.

JSR 292

The JSR 292 expert group got lots of chances to work on ideas and designs for the future. Lots of interesting results came out of these discussions. Some of the more notable ones are skisses on how method handles and function types can work together, how invoke dynamic and bootstrap method can be used to implement defender methods and several other interesting ideas.

All in all it has been a fun few days, going far out in language and implementation geekiness. I hope to come back to this next year.



Life in the time of Java 7


I’m currently in the process of implementing Seph, and I’ve reached an inflection point. This point is the last responsible moment to choose what I will target with my language. Seph will definitely be a JVM language, but after that there is a range of options – some quite unlikely, some more likely. The valid choices are:

  • Target Java 1.4
  • Target Java 5/6
  • Target Java 7
  • Target Java 7 with extensions

Of these, the first options isn’t really interesting for Seph, so I’ll strike it out right now. The other three choices are however still definitely possible – and good choices. I thought I might talk a little bit about why I would choose each one of them. I haven’t made a final decision yet, so that will have to be the caveat for this post.

Before talking about the different choices, I wanted to mention a few things about Seph that matters to this decision. The first one is that I want Seph to be useful in the real world. That means it should be reasonably fast, and runnable for people without too much friction. I want the implementation to be small and clean, and hopefully as DRY as possible – if I end up with both and interpreter and just-in-time compiler, I want to be able to share as much of these implementations as possible.

Java 5/6

The easiest way to go forward would be to only use Java 5 or 6. This would mean no extranice features, but it would also mean the barrier to entry would be very low. It would mean development on Seph would be much easier and wouldd in general make everything simpler for everyone. The problem with it would mainly be implementation complexity and speed, which would both suffer compared to any of the Java 7 variants.

Java 7

There are many good reasons to go with Java 7, but there are also some horrible consequences of doing this. For Seph, the things that would make things from Java 7 is method handles, invoke dynamic and defender methods. Other things would be nice, but the three previous ones are the killer features for Seph. Method handles make it possible to write much more succinct code, not generate lots of extra classes for each built in method, and many other things. It also becomes possible to refer to compiled code using method handles, so the connection between the JIT and the interpreter would be much nicer to represent.

Invoke dynamic is quite obvious – it would allow me to do much nicer compilation to bytecode, and much faster. However, I could still build the same thing myself, to much greater cost and it would also mean inlining wouldn’t be as easy to get.

Finally, defender methods is a feature of the new lambda proposal that allow you to add new methods to interfaces without breaking backwards compatibility. The way this works is that when you add a new method to an interface, you can specify a static method that should be called when that interface method is invoked and there are no other implementations on the concrete classes for a specific object. But the interesting side effect of this feature is that you can also use it to specify default implementations for the core language methods without depending on a shared base class. This will make the implementation much smaller and more flexible, and might also be useful to specify required and optional methods in an API.

The main problem with Java 7 is that it doesn’t exist yet, and the time schedule is uncertain. It is not entirely certain exactly what the design of the things will look like either – so it’s definitely a moving target. Finally, it will make it very hard for people to help out on the project, and also it won’t make Seph a possible language for people to use until they upgrade to Java 7.

Java 7 with extensions

It turns out that the interesting features coming in Java 7 is just the tip of the iceberg. There are many other proposed features, with partial implementations in the DaVinci project (MLVM). These features aren’t actually complete, but one way of forcing them to become more complete is to actually use them for something real and give lots of feedback on the feature. Some of the more interesting features:

Interface injection

This feature will allow you to say after the fact that a specific class implements an interface, and also specify implementations for the methods on that interface. This is very powerful and would be extremely helpful in certain parts of the language implementation – especially when doing integration with Java. The patch is currently not very complete, though.

Tail calls

Allowing the JVM to perform proper tail calls would make it much easier to implement many recursive functional algorithms easily. Since Seph will have proper tail calls in the language, this will mean that I will have to implement this myself if the JVM doesn’t do it, which means Seph will be slower based on this. The patch seems to be quite good and possible to merge and harden to the JDK at some point. Of all the things on this list, this seems to be one of things that we can actually envision see being added in the Java 7 or Java 8 time frame.

Coroutines/continuations

Both coroutines and continuations seem to be possible to do in a good way, at least partially. Coroutines might be interesting for Seph as an alternative to Kilim, but right now it seems to be a bit unstable. Continuations would allow me to expose continuations as a first class citizen which is never bad – but it wouldn’t give me much more than that.

Hotswapping

Hotswapping of code would make it possible to do agressive JITting and then backing out from that when guards fail and so on. This is less interesting when we have invoke dynamic, but will give some more flexibility in terms of code generation.

Fixnums, tuples, value types

We all want ways of making numbers faster – but these features might also make it possible to efficiently represent simple composite data structures, and also things like multiple return values. These are fairly simple features, but have no real patch right now (I think).

Light weight code loading (anonymous classes)

It is horrible to load byte code at runtime in Java at this point. The reason is that to be able to make sure your loaded code gets garbage collected, you will have to load each chunk of code in a new class in a new classloader. This becomes very expensive very fast, and also endangers permgen. Anonymous classes make this go away, since they don’t have names. This means you don’t actually have to keep a reference to older classes, since there is no way to get to them again if you lost the reference to them. This is a good thing, and makes it possible to not generate class loaders every time you load new code. THe state of this seems to be quite stable, but at this point JVM dependent.

The price

Of course, all of these lovely features comes with a price. Two prices in fact. The first price is that all the above features are incomplete, ranging from working patches to proof of concepts or sketches of ideas. That means that the ground will change under any language using it – which introduces hard version dependencies and complicates building. The other price is that none of these features are part of anything that has been released, and there are no guarantees that it will ever be merged in Java at any point. So the only viable way of distributing Seph would be to distribute standard build files with a patched OpenJDK so that anyone can download and use that specific JDK. But that limits interoperability and causes lots of other problems.

Somewhere in between

My current thinking is that all of the above choices are bad. For Seph I want something inbetween, and my current best approach looks like this. You will need a new build of MLVM with invoke dynamic and method handles to develop and compile Seph. I will utilize invoke dynamic and method handles in the implementation, and allow people to use Rémi Forax’ JSR 292 backport to run it on Java 5 and 6. When Java 7 finally arrives, Seph will be more or less ready for it – and Seph can get some of the performance and maintainability benefits of using JSR 292 immediately. At this point I can’t actually use defender methods, but if anyone is clever enough to figure out a backport that will allow defender methods to work on Java 5 or 6, I would definitely use them all over the place.

This doesn’t actually preclude the possibility of creating alternative research versions of Seph that uses some of the other MLVM patches. Charles Nutter have shown how much you can do by using flags to add features that are turned off by default. So Seph could definitely grow the above features, but currently I won’t make the core of the language depend on them.



Invoke dynamic in JDK 7


First post from the JVM Language Summit. Mark Reinhold just stated that invokedynamic definitely will be in Java 7. This is obviously great news for anyone who cares about dynamic languages.