Re2j - a small lexer generator for Java


There is a tool called re2c. It’s pretty neat. Basically it allows you to intersperse a regular expression based grammar in comments inside of C code, and those comments will be transformed into a basic lexer. There are a few things that make re2c different from other similar tools. The first one is that the supported features are pretty limited (which is good). The code generated is fast. The other good part is that you can have several sections in the same source file. The productions for any specific piece of code are constrained to the specific comment.

As it happens, why the lucky stiff used re2c when he made Syck (the C-based YAML processor used in Ruby and many other languages). So when I set out to port Syck to Java, the first problem was to figure out the best way to port the lexers using re2c. I ended up using Ragel for the implicit-scanner, and thought about doing the same for the token scanner, but Ragel is pretty painful to use for more than one main production in the same source file. The syntax is not exactly the same either, so it would add to the burden of porting the scanner if I decided to switch.

At the end of the day the most pragmatic choice was to port the output generator in re2c to generate Java instead. This turned out to be pretty easy, and the result is now used in Yecht, which was merged as the YAML processor for JRuby a few days ago.

You can find re2j in my GitHub repository at http://github.com/olabini/re2j. This is still a C++ program, and it probably won’t compile very well on Windows. But it’s good enough for many small use cases. Everything works exactly as in re2c except for one small difference, namely that you can define a parameter called YYDATA that points to the byte or char buffer to read from. For an example usage, take a look at the token scanner: http://github.com/olabini/yecht/blob/master/src/main/org/yecht/TokenScanner.re.

I haven’t put any compiled binaries out anywhere, and at some point it might be nice to merge this with the proper re2c project so you can give a flag to generate Java instead of C, but for now this is all there is to the project.



Second day of JavaOne


The second day of JavaOne ended up being not as draining as the first one, although I had lots of interesting times this day too. I’ve divided it into two blog posts - this is about what happened at JavaOne, and the next one will be about the Clojure meetup.

The first session of the day was Nick Sieger’s talk about using JRuby in production at Kenai. An interesting talk about some of the things that worked, and some of the things that didn’t work. A surprising number of decisions were given as fiat since they needed to use Sun products for many things.

After that Neal Ford gave a comparison between JRuby and Groovy. I don’t have much to say about this talk except that some things seemed to be a bit more complicated to achieve in Groovy than in Ruby.

As it turns out, the next talk was my final talk of the day. This was Bob Lee (crazy bob) talking about references and garbage collection on the JVM. A very good talk, and I learned about how the Google Collections MapMaker actually solves some of my Ioke problems. I ended up integrating it during the evening and it works great.

The second day had fewer talks for me - but I still had a very good time and even learned some stuff. Nice.



Java in the Google Cloud event in London


Chris Read and I will talk at an event at Skills Matter in London on May 11th. We will be talking about different aspects surrounding the release of Google App Engine support for Java.

You can find the registration page here: http://skillsmatter.com/podcast/ajax-ria/java-in-the-google-cloud.



Dynamic languages on Google App Engine - an overview


As mentioned in a post a few minutes ago here, Google has released App Engine support for Java. This is obviously very cool - and I’ve spent a few weeks testing several things using it. It should come as no surprise that my main goal with this investigation has been to see how dynamic languages fit in with the Java story.

The good news is this: JRuby works very well on the infrastructure. I will spend some more time in another post detailing what you have to do to get a JRuby on Rails application working on Google App Engine. In this post I’ll talk a bit about the different kinds of restrictions a language implementation will run into, and what needs fixing.

Several other people have been testing languages such as Groovy, Scala, Clojure and Jython. My own experiments have been focused on JRuby and Ioke. At the moment, Ioke still doesn’t run on GAE/J, but the issue is something I hope will be fixed soon.

When looking at GAE/J, it’s important to keep in mind the security restrictions that Google has been forced to implement, to make the Java implementation totally safe for them. This includes restrictions of many kinds, and some of them might come as a bit of a surprise in some cases. One of the larger things you will notice is that some classes aren’t available - and you will get a ClassNotFoundException if you try to use them from your application. Personally, I believe that using a SecurityException when trying to load these might have been better, but this fact remains: many classes you expect will not be there.

Among the classes that are there (and the important parts of the JDK are there) there are many that will give you different kinds of security related problems too. JRuby trunk has been fixed for all these issues, so it should work without modification.

File system

GAE/J restricts quite a lot of what you can do with the file system. One of the things that surprised me was that calling methods like java.io.File#canRead on a restricted file might throw a SecurityException. Basically this means that all file access in an implementation needs to wrap these calls in try-catch blocks.

In JRuby, I solved this by an approach that Ryan Brown gave us - creating a subclass of java.io.File that wraps all these methods and returns something reasonable. canRead should for example just return false if it gets a SecurityException.
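
A minimal sketch of that kind of wrapper, in plain Java. The class name SafeFile and the particular methods overridden are assumptions for illustration; this is not the actual JRuby code, just the same idea of turning a SecurityException into a harmless answer:

```java
import java.io.File;

// Hypothetical helper: a File that answers "no" instead of throwing
// SecurityException when the sandbox forbids the check.
class SafeFile extends File {
    private static final long serialVersionUID = 1L;

    SafeFile(String pathname) {
        super(pathname);
    }

    @Override
    public boolean canRead() {
        try {
            return super.canRead();
        } catch (SecurityException e) {
            return false; // treat a forbidden file as unreadable
        }
    }

    @Override
    public boolean exists() {
        try {
            return super.exists();
        } catch (SecurityException e) {
            return false; // treat a forbidden file as missing
        }
    }

    @Override
    public boolean isDirectory() {
        try {
            return super.isDirectory();
        } catch (SecurityException e) {
            return false; // treat a forbidden path as "not a directory"
        }
    }
}
```

Code that used to call new File(path).canRead() can then call new SafeFile(path).canRead() and only has to handle a boolean.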

Threads

It’s very hard to secure a thread scheduler - there are ways of screwing up things that are basically impossible to guard against. That means GAE/J does not support threads at all. You can’t create new ones, you can’t create new ThreadGroups or change most settings on these threads.

This is something that is less problematic for some languages, and more problematic for others. I know that Lift (in Scala) for example had some trouble, since it relied very heavily on actors, implemented using thread pools.

Reflection

Java’s reflection capabilities are very powerful, and most of the reflection methods throw several kinds of exceptions. On GAE/J you will have to guard most reflective accesses against SecurityException too. One of the things that many dynamic languages do is to use “setAccessible” on all methods. This will fail on some methods that Google thinks you shouldn’t have access to. Several of the methods on Object are among these.
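
A sketch of what guarding setAccessible can look like. The names SafeReflection, tryMakeAccessible, accessibleMethods and Sample are made up for this example and are not taken from JRuby or any other implementation:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reflective access in a restricted runtime.
final class SafeReflection {
    private SafeReflection() {}

    // Returns true if the method was made accessible, false if the
    // runtime refused.
    static boolean tryMakeAccessible(Method method) {
        try {
            method.setAccessible(true);
            return true;
        } catch (RuntimeException e) {
            // SecurityException is a RuntimeException; catching the broader
            // type also covers other refusals a runtime may signal here.
            return false;
        }
    }

    // Collects only the declared methods that could be made accessible,
    // silently skipping the ones the runtime protects.
    static List<Method> accessibleMethods(Class<?> type) {
        List<Method> result = new ArrayList<Method>();
        Method[] declared;
        try {
            declared = type.getDeclaredMethods();
        } catch (SecurityException e) {
            return result; // not even allowed to look
        }
        for (Method method : declared) {
            if (tryMakeAccessible(method)) {
                result.add(method);
            }
        }
        return result;
    }
}

// Tiny class used only to demonstrate the helper.
class Sample {
    private String secret() {
        return "hidden";
    }
}
```

The point of the design is that a refused method is simply left out of the list, instead of one refusal aborting the whole class setup.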

Verification

In some cases, the bytecode verifier is a bit stricter than for other JDKs. It’s important to try out many corners of the application and see that it works correctly. Of course, if your language generates code at runtime, this is even more important. The good news is that I haven’t seen any problems at all with JRuby. The bad news is that the parser for Ioke doesn’t even load (and that is static Java code). This seems like a small problem in the verifier where a stack height of 0 causes it to fail, so hopefully it will be shortly fixed.

Class loading

One of the early problems for Clojure was some intricacies in the way GAE/J handles class loaders. One of these is that doing ClassLoader.getSystemClassLoader() caused a SecurityException.
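
One way to defend against this is to try the system class loader first and fall back to something else when it is refused. This is only a sketch: the class name ClassLoaders and the fallback order are assumptions, and it is not how Clojure actually solved the problem.

```java
// Hypothetical helper that never lets a SecurityException escape
// when looking for a usable class loader.
final class ClassLoaders {
    private ClassLoaders() {}

    static ClassLoader bestAvailable(Class<?> fallbackOwner) {
        try {
            ClassLoader system = ClassLoader.getSystemClassLoader();
            if (system != null) {
                return system;
            }
        } catch (SecurityException e) {
            // Refused by the runtime, try the next candidate.
        }
        try {
            ClassLoader context = Thread.currentThread().getContextClassLoader();
            if (context != null) {
                return context;
            }
        } catch (SecurityException e) {
            // Refused as well, fall through to the last resort.
        }
        // May be null if fallbackOwner was loaded by the bootstrap loader.
        return fallbackOwner.getClassLoader();
    }
}
```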

Testing

It is not immediately obvious to me how you can test applications written for GAE/J in another language. The intuition is that you would be able to use the local development server to run tests, but many of the things above don’t work exactly the same in the local dev server, and some things are problematic locally but won’t cause any trouble on the server. One thing I’ve noticed is that JRuby doesn’t load correctly, because the dev server doesn’t actually load things from jar-files in such a way that JRuby can load several property files from the same jar-file. This issue doesn’t actually exist on the real servers.

You can use unit tests to test parts of your application, but you need to make sure to stub out all the calls to the Google APIs. This is actually kinda hard in Java, since one of the negative aspects about the GAE/J APIs is that they are built around singleton factories. It is very hard to inject new functionality there.
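
The usual workaround in Java is to hide the factory call behind a small interface of your own, so tests can pass in a fake. A sketch with made-up names (CurrentUser, Greeter, FixedUser); the production implementation of CurrentUser would delegate to the GAE/J user service and is left out here because it needs the SDK:

```java
// Hypothetical seam: application code depends on this interface
// instead of calling a static factory directly.
interface CurrentUser {
    String email(); // null when nobody is logged in
}

// Application code that only knows about the seam.
final class Greeter {
    private final CurrentUser currentUser;

    Greeter(CurrentUser currentUser) {
        this.currentUser = currentUser;
    }

    String greeting() {
        String email = currentUser.email();
        return email == null ? "Hello, stranger" : "Hello, " + email;
    }
}

// Test double that needs no App Engine classes at all.
final class FixedUser implements CurrentUser {
    private final String email;

    FixedUser(String email) {
        this.email = email;
    }

    public String email() {
        return email;
    }
}
```

A unit test can then build new Greeter(new FixedUser("someone@example.com")) without touching any Google API. It still says nothing about how the code behaves under the real security restrictions.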

With JRuby, you can of course override these methods and unit test without running them. The main problem with this kind of unit testing is that it won’t give you any real security on the server - since you might still run into several kinds of security exceptions.

I ended up implementing a very small unit testing framework for sanity checking. This allows you to trigger a test run by going to a specific URL. Of course, this approach sucks.

At the end of the day, it seems the best kind of testing you can do is functional testing using something like Selenium or WebDriver. Or Twist. GAE/J allows you to have different versions of an application, so one way you could utilize this automatically is to let your automatic test run deploy to a version called “test”, and then use a specific URL to get the latest version. Say your app is deployed on “testgae”: you can let your CI test against “test.latest.testgae.appspot.com”, while the production environment is still running on “testgae.appspot.com”. It’s still not perfect, but it gives you some flexibility and a possibility to run continuous integration on the correct infrastructure.
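
The host names above follow a simple pattern that a CI script can build from the version and the application id. A tiny sketch, where the class and method names are invented for this example:

```java
// Hypothetical helper for building the host names a CI run would hit.
final class AppEngineHosts {
    private AppEngineHosts() {}

    // Host serving the latest deployment of a named version.
    static String latestVersionHost(String version, String appId) {
        return version + ".latest." + appId + ".appspot.com";
    }

    // Host serving the default (production) version.
    static String productionHost(String appId) {
        return appId + ".appspot.com";
    }
}
```

With the names from the text, latestVersionHost("test", "testgae") gives the host the functional tests should point at.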



JRuby on Rails on Google App Engine


This is the third post in a series detailing information about the newly announced Google App Engine support for Java. In this post I thought I’d go through the steps you need to take to get a JRuby on Rails application working on GAE/J, and also what kind of characteristics you should expect from your application.

You need a fairly new copy of JRuby. Most of the changes needed to JRuby were added to JRuby trunk right after the JRuby 1.2 release, so check out and build something after that. The newest Rails version works fine too.

Once you have the basic Rails app set up, there are a few things you need to do. The first is to install Warbler, then pluginize it, and finally generate the Warble configuration file. You do that by running “jruby -S gem install warbler”, “jruby -S warble pluginize” and then “jruby -S warble config”. The last two should be done in the root of the Rails application.

You should freeze the Rails gems too. Once you have done that, you need to go through all the files there and remove anything that isn’t necessary. As it turns out, GAE/J has a hard limit of 1000 files, and a typical Rails application will end up with many more files than that. You can remove all of ActiveRecord, all the test directories and so on.

Since you’re on GAE/J, you won’t need ActiveRecord, so you should not load it in config/environment.rb. The next step is to modify your warble.rb file. These are the things you need to do:

First, make sure that the needed GAE/J files are included, by doing:

config.includes = FileList["appengine-web.xml", "datastore-indexes.xml"]

You should also set the parameters for how many runtimes will be started:

config.webxml.jruby.min.runtimes = 1
config.webxml.jruby.max.runtimes = 1
config.webxml.jruby.init.serial = true

The last option is available in trunk version of JRuby-rack. If you don’t have min=1 and max=1 then you need this option set, because otherwise JRuby-rack will actually start several threads to initialize the runtimes.

Finally, to be able to use newer versions of the libraries, you need to set what Java libraries are used to the empty array:

config.java_libs = []

You will add all of the jar-files later, in the lib directory.

The last configuration option that I added is something to allow Rails to use DataStore as a session store. You can see how this is done in YARBL.

I have set several options in my appengine-web.xml file. The most important ones are to turn off JMX and to set os.arch to empty:

      <property name="jruby.management.enabled" value="false" />
      <property name="os.arch" value="" />

This is all pretty self explanatory.

One thing that I still haven’t gotten to work correctly is “protect_from_forgery”, so you need to comment this out in app/controllers/application.rb.

You need to put several jar-files in the lib-directory, and you actually need to split the jruby-complete jar, since it is too large for GAE/J in itself. The first jar-file is the appengine-api.jar file. You also need a late build of jruby-rack, and finally you need the different slices of the jruby-complete jar. I use a script like this to create several different jar-files:

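#!/bin/sh
# Split jruby-complete.jar into jruby-core.jar (Java classes)
# and ruby-stdlib.jar (everything else).

# Remove leftovers from earlier runs
rm -rf jruby-core.jar
rm -rf ruby-stdlib.jar
rm -rf tmp_unpack

# Unpack the complete jar
mkdir tmp_unpack
cd tmp_unpack
jar xf ../jruby-complete.jar
cd ..

# Move the Java packages into their own directory
mkdir jruby-core
mv tmp_unpack/org jruby-core/
mv tmp_unpack/com jruby-core/
mv tmp_unpack/jline jruby-core/
mv tmp_unpack/jay jruby-core/
mv tmp_unpack/jruby jruby-core/

# Package the two halves
cd jruby-core
jar cf ../jruby-core.jar .
cd ../tmp_unpack
jar cf ../ruby-stdlib.jar .
cd ..

# Clean up, including the original jar, which is too large to deploy
rm -rf jruby-core
rm -rf tmp_unpack
rm -rf jruby-complete.jar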

This creates two jar-files, jruby-core.jar and ruby-stdlib.jar.

These things should more or less put everything in order for you to be able to deploy your application to App Engine.

YARBL

As part of my evaluation of the infrastructure, I created a small application called YARBL. It allows you to have blogs, and post posts in them. No support for comments or anything fancy at all really. But it can be expanded into something real. I use both BeeU and Bumble in YARBL. BeeU allows me to make sure that only logged in users that are administrators can actually post things or change the blog. This support was extremely easy to add through the Google UserService.

You can see a (hopefully) running version at http://yarubyblog.appspot.com. You can find the source code in my GitHub repository: http://github.com/olabini/yarbl.

Bumble

Bumble is a very small wrapper around DataStore that allows you to create data models backed by Google’s DataStore. It was developed to back YARBL, so it really only supports the things needed for that application.

This is what the data model for YARBL looks like. This should give you a feeling for how you define models with Bumble. One thing to remember is that the DataStore actually allows any properties/attributes on entities, so it fits a language like Ruby very well.

class Person
  include Bumble

  ds :given_name, :sur_name, :email
  has_many :blogs, :Blog, :owner_id
end

class Blog
  include Bumble

  ds :name, :owner_id, :created_at
  belongs_to :owner, Person
  has_many :posts, :Post, :blog_id, :iorder => :created_at
end

class Post
  include Bumble

  ds :title, :content, :created_at, :blog_id
  belongs_to :blog, Blog
end

To actually use the model for something, you can do things like these:

Blog.all

Post.all({}, :limit => 15, :iorder => :created_at)

blog = Blog.get(params[:id])
posts = blog.posts

Blog.create :name => name, :owner => @person, :created_at => Time.now

Post.all.each do |p|
  p.delete!
end

Here are most of the supported methods. The implementation is incredibly small and you really can’t go wrong with it. Of course, it is not tuned at all, so it does lots of fetches it could avoid. I’m happily accepting patches! The code can be found at http://github.com/olabini/bumble.

BeeU

When working with Google’s user service, you can use BeeU - a very small framework for helping with some things. You basically get a few different helper methods. There are three different filter methods that can be used. These are assign_user, assign_admin_status and verify_admin_user. The first two will create instance variables called @user and @admin respectively. The @user variable will contain the UserService User object, and @admin will be true if the user is logged in and is an administrator, and false otherwise. The last one will check that the current user is an administrator. If not logged in, it will redirect to a login page, and if logged in but not administrator, it will respond with a Not Authorized. These three methods should all be used as before filters.

There is a high level method called require_admin that you can use to point out what methods should be protected with admin access. This is really all you need.

Finally, there are two methods that generate a login-URL and a logout-URL. Both of these will redirect back to where you were when the URLs were generated.

BeeU can be found in my GitHub repository: http://github.com/olabini/beeu.

Summary

Overall, JRuby on Rails works very well on the App Engine, except for some smaller details. The major ones are the startup cost and testing. As it happens, you can’t actually get GAE/J to precreate things. Instead you’ll have to let the first request take the hit of this. Now, GAE/J does a lot of preverifying of bytecodes and so on, so startup is a bit more heavy than on other JDKs. One runtime takes about 20 seconds wall time to start up, so the first hit takes some time. The good news is that this used to be worse. The last few weeks, the infrastructure has gotten a lot faster, and I’m confident this will continue to improve. It is still a problematic thing though, since you can’t precreate runtimes, which means that some requests will end up taking quite a bit longer than expected.

It’s interesting to note that performance is actually pretty good once it gets running. I’ve seen between 120ms and 500ms for a request, depending on how many calls to DataStore are involved on the page - these times are not bad, considering what the infrastructure needs to do. It also seems mostly limited to the data access. If I’d had time to integrate memcaching, I could probably improve these times substantially.

The one remaining stickler for me is still testing. It’s not at all obvious how to do it, and as I noted in my earlier post there are some ways around it - but they don’t really fit in the way most Rails applications are built. In fact, I have done mostly manual testing on this application, since the cost of automating it seemed too high.

In all, Google App Engine with JRuby on Rails is a really compelling combination of technology. I’m looking forward to the first ThoughtWorks project with these pieces.



Java on Google App Engine


About a year ago, Google released their first beta version of App Engine - it allowed deployment and hosting of web applications. These applications were restricted to the Python language. About 5 minutes ago, Google announced that they have released a Java version of App Engine.

I have been involved in this for a few weeks - since ThoughtWorks is a Google Enterprise Partner - and it’s been a very interesting time. This post and a few others will take a closer look at what I’ve been experimenting with.

First of all, GAE/J is not based on Dalvik, as far as I can tell. It is a full Java implementation, so you compile your applications locally, using any standard JDK and then upload them. Google recommends Java 6 for this, but Java 5 works too.

The actual interface to GAE/J uses the standard Java Servlet API, so if you have something that works with it, chances are you won’t have to do many changes to your application.

Google also gives access to several different APIs, including the User service, Memcache service, Mail service, URL fetching service, Image service and DataStore service. These all give access to different pieces of the Google machinery. For me, the most interesting parts were the User service, that makes it possible to use the regular Google authentication infrastructure, and the DataStore service that makes it a snap to use Google’s data storage infrastructure. For regular Java applications, you can use either JDO or JPA to interact with the DataStore, but Google also gives access to the low level APIs.

As part of the GAE/J release, you get access to a local development server. It tries to mimic the full environment as closely as possible. For the specific type of application Google expects most people to write, it works very well - but if you go outside of this beaten path, many things get a bit shaky. I ended up not using it very much.

So, GAE/J is a very cool platform to target cloud applications to. Obviously Python is still a valid choice too, but the combination of apps built in Python and applications running on GAE/J seems like a very powerful choice.

ThoughtWorks has recently been spending much time in this area and we have gotten some good experience with it. We look forward to being able to work with applications for Google App Engine, written in Java, or any of the other languages supported. (If you follow today’s blog posts, you will see that I’m not the only ThoughtWorker who has explored alternative languages on this platform.)

My esteemed colleagues have also written up their experiences with the Java pieces of Google App Engine. You can read them here: http://paulhammant.com/blog/google-app-engine-for-java-with-rich-ruby-clients.html, http://elhumidor.blogspot.com/ and http://blog.sriramnarayan.com/.



Invoke dynamic in JDK 7


First post from the JVM Language Summit. Mark Reinhold just stated that invokedynamic definitely will be in Java 7. This is obviously great news for anyone who cares about dynamic languages.



RSA parameters in OpenSSL, Ruby and Java


I would just like to publish this information somewhere, so that Google can help people find it more easily than I did. If you have ever wondered how the internal OpenSSL RSA parameters map to the Java parameters on RSAPrivateCrtKey, this little table will probably help you a bit. There are three different names in motion here. The first is the internal field name in OpenSSL. These are also used as method names in Ruby. The second name is what gets presented when you use something like to_text on an RSA key. The third name is what it’s called in Java.

  • n == modulus == modulus
  • e == public exponent == publicExponent
  • d == private exponent == privateExponent
  • p == prime1 == primeP
  • q == prime2 == primeQ
  • dmp1 == exponent1 == primeExponentP
  • dmq1 == exponent2 == primeExponentQ
  • iqmp == coefficient == crtCoefficient
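
To see the mapping in action, here is a small sketch that generates a key with the standard JDK APIs and files each RSAPrivateCrtKey getter under its OpenSSL field name. The class and method names (RsaParameterNames, byOpenSslName, generate) are made up for this example, and the cast to RSAPrivateCrtKey assumes the default JDK RSA provider:

```java
import java.math.BigInteger;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.interfaces.RSAPrivateCrtKey;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that labels the Java getters with OpenSSL names.
final class RsaParameterNames {
    private RsaParameterNames() {}

    static Map<String, BigInteger> byOpenSslName(RSAPrivateCrtKey key) {
        Map<String, BigInteger> params = new LinkedHashMap<String, BigInteger>();
        params.put("n", key.getModulus());
        params.put("e", key.getPublicExponent());
        params.put("d", key.getPrivateExponent());
        params.put("p", key.getPrimeP());
        params.put("q", key.getPrimeQ());
        params.put("dmp1", key.getPrimeExponentP());
        params.put("dmq1", key.getPrimeExponentQ());
        params.put("iqmp", key.getCrtCoefficient());
        return params;
    }

    // Generates a throwaway 2048-bit key, only for demonstration.
    static RSAPrivateCrtKey generate() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();
            // Assumption: the default provider returns a CRT private key.
            return (RSAPrivateCrtKey) pair.getPrivate();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }
}
```

The names also hint at how the values relate: n is p times q, and iqmp is the inverse of q modulo p.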


Java and mocking


I’ve just spent my first three days on a project in Leeds. It’s a pretty common Java project, RESTful services and some MVC screens. We have been using Mockito for testing which is a first for me. My immediate impression is quite good. It’s a nice tool and it allows some very clean testing of stuff that generally becomes quite messy. One of the things I like is how it uses generics and the static typing of Java to make it really easy to make mocks that are actually type checked; like this for example:

Iterator iter = mock(Iterator.class);
stub(iter.hasNext()).toReturn(false);

// Call stuff that starts interaction
verify(iter).hasNext();

These are generally the only things you need to stub stuff out and verify that it was called. The things you don’t care about you don’t verify. This is pretty good for being Java, but there are some problems with it too. One of the first things I noticed I don’t like is that interactions that aren’t verified can’t be disallowed in an easy way. Optimally this would happen at the creation of the mock, instead of by calling verifyNoMoreInteractions() afterwards. It’s way too easy to forget. Another problem that quite often comes up is that you want to mock out or stub some methods but retain the original behavior of others. This doesn’t seem possible, and the alternative is to manually create a new subclass for this. Annoying.

Contrast this with testing the same interaction with Mocha, using JtestR. The difference isn’t that big, but some of the cruft is gone:

iter = mock(Iterator)
iter.expects(:hasNext).returns(false)

# Call stuff that starts interaction

Ruby makes the checking of interactions happen automatically afterwards, and since you don’t have any types, you don’t need to care about most stuff the way you do in Java. This also shows a few of the inconsistencies in Mockito that are necessary because of the type system. For example, with the verify method you send the mock as argument and the return value of the verify method is what you call the actual method on, to verify that it’s actually called. Verify is a generic method that returns the same type as the argument you give to it. But this doesn’t work for the stub method. Since it needs to return a value that you can call toReturn on, it can’t actually return the type of the mock, which in turn means that you need to call the method to stub before the actual stub call happens. This dichotomy gets me every time since it’s a core inconsistency in the way the library works.
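
To make the verify half of this concrete, here is a toy sketch of how a generic verify can hand back something of the same type as the mock, using the JDK’s dynamic proxies. This is not how Mockito is implemented; TinyMock and everything in it is invented for illustration, and it only handles interfaces and only matches on method name:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy mocking helper, only meant to show the shape of the API.
final class TinyMock {
    // Method names called on each mock, keyed by identity so that we
    // never invoke hashCode() or equals() on the proxy itself.
    private static final Map<Object, List<String>> CALLS =
            new IdentityHashMap<Object, List<String>>();

    private TinyMock() {}

    // Creates a mock that records every call and returns a zero value.
    static <T> T mock(Class<T> iface) {
        final List<String> calls = new ArrayList<String>();
        InvocationHandler recorder = (proxy, method, args) -> {
            calls.add(method.getName());
            return zeroValue(method.getReturnType());
        };
        T mock = iface.cast(newProxy(iface, recorder));
        CALLS.put(mock, calls);
        return mock;
    }

    // Returns an object of the same type as the mock. Calling a method
    // on it fails unless that method was called on the mock earlier.
    static <T> T verify(T mock) {
        final List<String> calls = CALLS.get(mock);
        if (calls == null) {
            throw new IllegalArgumentException("not created by TinyMock.mock");
        }
        InvocationHandler checker = (proxy, method, args) -> {
            if (!calls.contains(method.getName())) {
                throw new AssertionError(method.getName() + " was never called");
            }
            return zeroValue(method.getReturnType());
        };
        @SuppressWarnings("unchecked")
        T view = (T) newProxy(mock.getClass().getInterfaces()[0], checker);
        return view;
    }

    private static Object newProxy(Class<?> iface, InvocationHandler handler) {
        return Proxy.newProxyInstance(
                TinyMock.class.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    // Proxies must return a matching wrapper for primitive return types.
    private static Object zeroValue(Class<?> type) {
        if (type == boolean.class) return Boolean.FALSE;
        if (type == char.class) return Character.valueOf('\0');
        if (type == byte.class) return Byte.valueOf((byte) 0);
        if (type == short.class) return Short.valueOf((short) 0);
        if (type == int.class) return Integer.valueOf(0);
        if (type == long.class) return Long.valueOf(0L);
        if (type == float.class) return Float.valueOf(0f);
        if (type == double.class) return Double.valueOf(0d);
        return null; // void and reference types
    }
}
```

Because verify is declared as <T> T verify(T mock), the compiler lets you write TinyMock.verify(iter).hasNext() with full type checking. A stub method cannot use the same trick, since its result has to offer something like toReturn rather than the methods of the mocked type.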

Contrast that with how a Mockito-like library might look in Ruby for the same interaction:

iter = mock(Iterator)
stub(iter).hasNext.toReturn(false)

# Do stuff
verify(iter).hasNext

The lack of typing makes it possible to create a cleaner, more readable API. Of course, these interactions are all based on how the Java code looked. You could quite easily imagine a more free form DSL for mocking that is easier to read and write.

Conclusion? Mockito is nice, but Ruby mocking is definitely nicer. I’m wondering why the current mocking approaches don’t use the method call way of defining expectations and stubs though, since these are much easier to work with in Ruby.

Also, it was kinda annoying to upgrade from Mockito 1.3 to 1.4 and see half our tests starting to fail for unknown reasons. Upgrade cancelled.



JtestR 0.3 Released


JtestR allows you to test your Java code with Ruby frameworks.

Homepage: http://jtestr.codehaus.org
Download: http://dist.codehaus.org/jtestr

JtestR 0.3 is the current release of the JtestR testing tool. JtestR integrates JRuby with several Ruby frameworks to allow painless testing of Java code, using RSpec, Test/Unit, Expectations, dust and Mocha.

Features:
- Integrates with Ant, Maven and JUnit
- Includes JRuby 1.1, Test/Unit, RSpec, Expectations, dust, Mocha and ActiveSupport
- Customizes Mocha so that mocking of any Java class is possible
- Background testing server for quick startup of tests
- Automatically runs your JUnit and TestNG codebase as part of the build

Getting started: http://jtestr.codehaus.org/Getting+Started

The 0.3 release has focused on stabilizing Maven support, and adding new capabilities for JUnit integration.

New and fixed in this release:
JTESTR-47 Maven with subprojects should work intuitively
JTESTR-42 Maven dependencies should be automatically picked up by the test run
JTESTR-41 Driver jtestr from junit
JTESTR-37 Can’t expect a specific Java exception correctly
JTESTR-36 IDE integration, possibility to run single tests
JTESTR-35 Support XML output of test reports

Team:
Ola Bini - ola.bini@gmail.com
Anda Abramovici - anda.abramovici@gmail.com