New JRuby YAML support with Yecht


A while back I finally got fed up with all our minor YAML incompatibilities. As I’ve been in charge of the YAML support in JRuby for most of the time, this is something I take personally. I’ve written several YAML processors now, and I decided it was time once and for all to make sure we were totally compatible with MRI.

As it happens, the incompatibilities in JRuby’s YAML support can be divided into two categories. The first category consists of things that can’t easily be done with JvYAML since they depend on internals of Syck. More and more of these started cropping up, especially for customizing serialization and loading, but also in how the parsing behavior worked and so on.

The second category is a bit more annoying. These bugs stem from invalid YAML that MRI nevertheless emits or parses. Syck happens to be a bit loose and nice, and it’s also a YAML 1.0 processor. JvYAML started life as a YAML 1.1 processor, and it was pretty strict. During the last year I’ve crippled JvYAML, making it more 1.0 compatible and less strict to bring it closer to Syck. But at the end of the day full Syck compatibility would never be possible from within JvYAMLb.

So I started hacking on Yecht. Two weeks later it is now merged into JRuby trunk. Yecht is a proper port of Syck that matches Syck semantics more or less to the letter – including bugs. Don’t believe me? Just try “YAML::Syck::Map.new(nil, nil, nil).kind” on MRI and “YAML::Yecht::Map.new(nil, nil, nil).kind” on JRuby and see…

As it happens, the story of how I ported Syck is quite interesting, so I will write a separate post about that, focusing on some of the more impressive performance improvements I managed to squeeze out of the parser.

But the short story is this: JRuby’s YAML support is now better than ever, and much more compatible with how MRI does things. All open YAML bugs in JRuby’s bug tracker have been closed, and all tests run as they should.



RbYAML in Google Summer of Code


Great news for all Ruby implementations around. A project to bring RbYAML up to date and make it perform better has been accepted for Google Summer of Code. Long Sun is the name of the student, and Xue Yong Zhi and I will jointly mentor this effort.

In fact, I’m very excited about this news. RbYAML was an incredibly important piece of the puzzle to get JRuby to finally work with RubyGems, and that kickstarted our possibilities to start testing numerous other applications. I soon ported RbYAML to Java, and created the JvYAML and JvYAMLb projects, to get better efficiency. Sadly, this left RbYAML without any TLC. That changed a while back when Rubinius picked up the project to get their YAML support going, and now that Long Sun will work on it, hopefully we will finally get an extremely compliant and bug free YAML implementation for Ruby.

This will obviously benefit Rubinius, but it will also be very good for both JRuby and IronRuby. The work will be test-driven which means a more complete test suite will be built around YAML in Ruby.

If you’re interested in following the project, it’s now hosted at Google Code (due to problems with RubyForge from China) at http://code.google.com/p/rbyaml/. Long Sun will also blog about his progress here: http://rbyaml.blogspot.com/.

Exciting news indeed.



JvYAMLb finally released as separate project


So I’ve finally made the time to extract JvYAMLb from JRuby. That means that JvYAML is mildly deprecated. Of course, since JvYAMLb only uses ByteLists it might not be the best solution for everyone.

If you’re interested in downloading the 0.1 release, you can do it at the Google Code download site http://code.google.com/p/jvyamlb/downloads/list.



The FINAL OpenSSL post?


Possibly.

I’ve checked in all functionality I will add to OpenSSL support in JRuby at this point. Of course, there will be more, but not concentrated in a spurt like this. Tomorrow I will modify the build process and then merge everything I’ve done into trunk.

Let’s back up a little. What have I accomplished? This: All OpenSSL tests from MRI run (except PKCS#7). That includes tests of SSL server and SSL client. A simple https request also works. This is sweet. Everything else that Ruby has tests for works. But… this is also the problem. Roughly half of Ruby’s OpenSSL library is not tested at all. And since my current OpenSSL initiative is based on tests, I haven’t done anything that isn’t tested for.

So, some things won’t work. There is no support for Diffie-Hellman keys right now, for example. Will be easy to add when the time comes, but there isn’t any testing so I haven’t felt the need.

The only thing not there, as I said, is PKCS#7. That was just too involved. I’ll take care of that some other time, when someone says they want it… Or someone else can do it? =)

So, what this boils down to is that JRuby trunk will have OpenSSL support sometime tomorrow. Hopefully it will be useful and I can get on to other JRuby things. I have a few hundred bugs I would like to fix, for example…

Oh yeah, that’s true. Tomorrow will also be YAML day. I’ll probably fix some bugs and cut a new release of JvYAML. It’s that time, the bug count is bigger than it was, and JRuby needs some fixes. So that’s the order of the day for tomorrow. First OpenSSL and then YAML. If you have any comments on this, please mail or comment directly here.

G’night.



YAML and JRuby – the last bit


An hour ago I sent the patches to make JRuby’s YAML support completely Java-based. What I have done, more specifically, is to remove RbYAML completely and instead use the newly developed 0.2 release of JvYAML. There were a few different parts that had to be done to make this possible, especially since most of the interface to YAML was Ruby-based, and used the slow Java proxy support to interact with JvYAML.

So, what’s involved in an operation like this? Well, first I created custom versions of the Representer and the Serializer. (I have had a custom JRubyConstructor since May.) These weren’t that big, mostly just delegating to the objects themselves to decide how they wanted to be serialized. And that leads me to the RubyYAML class, which is what will get loaded when you write “require ‘yaml’” in JRuby from now on. It contains two important parts. The first is the module YAML and the singleton methods on this module, which are the main interface to YAML functionality in Ruby. This was implemented in RbYAML until now.

The next part is several implementations of the methods “taguri” and “to_yaml_node” on various classes. These methods are used to handle the dumping, and it’s really there that most of the dumping action happens. For example, the taguri method for Object says that the tag for a typical Ruby object should be “!ruby/object:#{self.class.name}”. The “to_yaml_node” for a Set says that it should be represented as a map where the values of the set are keys, and the values for these keys are null.
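The two rules just described can be sketched in plain Ruby. This is only an illustration of the idea, not the code in RubyYAML: the helper names sketch_taguri and sketch_set_as_mapping and the Point class are made up for this example, and the real methods produce YAML nodes rather than strings and hashes.

```ruby
require 'set'

# Hypothetical helper (not part of JRuby): builds the tag string that the
# text describes for a typical Ruby object.
def sketch_taguri(obj)
  "!ruby/object:#{obj.class.name}"
end

# Hypothetical helper (not part of JRuby): models a Set as a mapping whose
# keys are the set members and whose values are all nil (YAML null).
def sketch_set_as_mapping(set)
  set.each_with_object({}) { |member, mapping| mapping[member] = nil }
end

# Placeholder class used only to show the tag for a user-defined type.
class Point; end

puts sketch_taguri(Point.new)
p sketch_set_as_mapping(Set.new([:a, :b]))
```

The point of the design is that each class decides its own representation, so the dumper itself does not need to know about sets or user-defined types.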

So, when this support gets into JRuby trunk it will mean a few things, but nothing that is really apparent for the regular JRuby user. The most important benefits of this are part performance and part correctness. Performance will be increased since we now have Java all the way, and correctness since I have had the chance to add lots of unit tests and also to fix many bugs in the process. Also, this release makes YAML 1.0 support a reality, which means that communication with MRI will work much better from now on.

So, enjoy. If we’re lucky, it will get into the next minor release of JRuby, which probably will be here quite soon.



Announcing JvYAML 0.2.1


The last few days have been spent integrating the JvYAML dumper with JRuby, and making YAML support in JRuby totally implemented in Java. As a side effect I have been able to root out a few bugs in JvYAML. Enough of them to warrant a minor release, actually. So, what’s new? Working binary support, better handling of null types, better 1.0 support and a few hooks to make it possible to remove anchors in places where they don’t make sense. (Like empty sequences.)

The url is http://jvyaml.dev.java.net and I recommend everyone to upgrade.



Announcing JvYAML 0.2


I’m very pleased to announce that JvYAML 0.2 was released a few minutes ago. The new release contains all the things I’ve talked about earlier and a few extra things I felt would fit well. The important parts of this release are:

  • The Dumper – JvYAML is now a complete YAML processor, not just a loader.
  • Loading and dumping JavaBeans – This feature is necessary for most serious usage of YAML. It allows people to read configuration files right into their bean objects.
  • Loading and dumping specific implementations of mappings and sequences. Very nice if you happen to need your mapping to be a TreeMap instead of a HashMap.
  • Configuration options to allow 1.0-compatibility with regard to the ! versus !! tag prefixes.
  • The simplified interface has been substantially improved, adding several utility methods.
  • Lots and lots of bug fixes.
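
To make the ! versus !! item above concrete: as I understand the two versions of the spec, YAML 1.0 abbreviates the core yaml.org types with a single “!”, while YAML 1.1 uses “!!” for the same types. The Ruby sketch below shows only that difference when writing a tag out. The function core_shorthand is a hypothetical name for this example and is not part of the JvYAML API.

```ruby
# Prefix shared by the core YAML types, e.g. tag:yaml.org,2002:str.
CORE_TAG_PREFIX = 'tag:yaml.org,2002:'

# Hypothetical helper (not JvYAML code): returns the shorthand a dumper
# would write for a core tag under the given YAML version.
# Tags outside the core namespace are returned unchanged.
def core_shorthand(full_tag, version)
  return full_tag unless full_tag.start_with?(CORE_TAG_PREFIX)

  name = full_tag[CORE_TAG_PREFIX.length..-1]
  # Assumption: 1.0 style is "!name"; anything else is treated as 1.1.
  version == '1.0' ? "!#{name}" : "!!#{name}"
end

puts core_shorthand('tag:yaml.org,2002:str', '1.0')
puts core_shorthand('tag:yaml.org,2002:str', '1.1')
```

A 1.0 consumer such as Syck expects the first form, which is why the option matters when exchanging documents with MRI.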

So, as you can see, this release is really something. I am planning on spending a few nights this week integrating it with JRuby too. And soon after that we will be able to have YAML completely in Java-land. That is great news for performance. It also makes it easier to just have one YAML implementation to fix bugs in, instead of two.

A howto? Oh, you want a guide to the new features? Hmm. Well, OK, but there really isn’t much to show. How to dump an object and get the YAML string back:

 YAML.dump(obj);

or dump directly to a file:

 YAML.dump(obj, new FileWriter("/path/to/file.yaml"));

or dump with version 1.0 instead of 1.1:

 YAML.dump(obj, YAML.options().version("1.0"));

dumping a JavaBean:

 String beanString = YAML.dump(bean);

and loading it back again:

 YAML.load(beanString);

That’s more or less it. Nothing fancy. Of course, all the different parts underneath are still there, and you can provide your own implementation of YAMLFactory to add your own specific hacks. If you want to dump your object in a special way, you can implement the YAMLNodeCreator interface, and your own object will be in charge of creating the information that should be used to represent it.