Mystery: An exposé on JRuby performance


So, after Charlies awesome work yesterday (documented here), I felt it was time I put the story straight on general JRuby performance. It’s something of a mystery. I’ll begin by showing you the results of lots of different benchmarks and comparison to MRI. All these benchmarks have been run on my MacBook Pro, dual 2.33Ghz cores with 2Gb memory. JRuby revision is 4578, Java is Apple 1.6 beta, and Ruby version is 1.8.6 (2007-03-13 patchlevel 0). JRuby was ran with -J-server and -O.

Now, lets begin with the YARV benchmarks. The first values is always MRI, the second i JRuby. I’ve made the benchmark red when MRI is faster, and green when JRuby is faster.

bm_app_answer.rb:0.480000   0.010000   0.490000 (  0.511641)0.479000   0.000000   0.479000 (  0.478000)bm_app_factorial.rb:ERROR stack level too deep2.687000   0.000000   2.687000 (  2.687000)bm_app_fib.rb:5.880000   0.030000   5.910000 (  6.151324)2.687000   0.000000   2.687000 (  2.687000)bm_app_mandelbrot.rb:1.930000   0.010000   1.940000 (  1.992984)2.752000   0.000000   2.752000 (  2.876000)bm_app_pentomino.rb:84.160000   0.450000  84.610000 ( 88.205519)77.117000   0.000000  77.117000 ( 77.117000)bm_app_raise.rb:2.600000   0.430000   3.030000 (  3.169156)2.162000   0.000000   2.162000 (  2.162000)bm_app_strconcat.rb:1.390000   0.010000   1.400000 (  1.427766)1.003000   0.000000   1.003000 (  1.003000)bm_app_tak.rb:7.570000   0.060000   7.630000 (  7.888381)2.676000   0.000000   2.676000 (  2.676000)bm_app_tarai.rb:6.020000   0.030000   6.050000 (  6.186971)2.236000   0.000000   2.236000 (  2.236000)bm_loop_times.rb:4.240000   0.020000   4.260000 (  4.404826)3.354000   0.000000   3.354000 (  3.354000)bm_loop_whileloop.rb:9.450000   0.050000   9.500000 (  9.678552)5.037000   0.000000   5.037000 (  5.037000)bm_loop_whileloop2.rb:1.890000   0.010000   1.900000 (  1.936502)1.039000   0.000000   1.039000 (  1.039000)bm_so_ackermann.rb:ERROR stack level too deep4.928000   0.000000   4.928000 (  4.927000)bm_so_array.rb:5.580000   0.020000   5.600000 (  5.709101)5.552000   0.000000   5.552000 (  5.552000)bm_so_concatenate.rb:1.580000   0.040000   1.620000 (  1.647592)1.602000   0.000000   1.602000 (  1.602000)bm_so_exception.rb:4.170000   0.390000   4.560000 (  4.597234)4.683000   0.000000   4.683000 (  4.683000)bm_so_lists.rb:0.970000   0.030000   1.000000 (  1.036678)0.814000   0.000000   0.814000 (  0.814000)bm_so_matrix.rb:1.700000   0.010000   1.710000 (  1.765739)1.878000   0.000000   1.878000 (  1.879000)bm_so_nested_loop.rb:5.130000   0.020000   5.150000 (  5.258066)4.661000   0.000000   4.661000 (  4.661000)bm_so_object.rb:5.480000   0.030000   5.510000 (  5.615154)3.095000   0.000000   3.095000 (  3.095000)bm_so_random.rb:1.760000   0.010000   1.770000 (  1.806116)1.495000   0.000000   1.495000 (  1.495000)bm_so_sieve.rb:0.680000   0.010000   0.690000 (  0.705296)0.853000   0.000000   0.853000 (  0.853000)bm_vm1_block.rb:19.920000   0.110000  20.030000 ( 20.547236)12.698000   0.000000  12.698000 ( 12.698000)bm_vm1_const.rb:15.720000   0.080000  15.800000 ( 16.426734)7.654000   0.000000   7.654000 (  7.654000)bm_vm1_ensure.rb:14.530000   0.070000  14.600000 ( 15.137106)7.588000   0.000000   7.588000 (  7.589000)bm_vm1_length.rb:17.230000   0.090000  17.320000 ( 20.406438)6.415000   0.000000   6.415000 (  6.416000)bm_vm1_rescue.rb:11.520000   0.040000  11.560000 ( 11.736435)5.604000   0.000000   5.604000 (  5.603000)bm_vm1_simplereturn.rb:17.560000   0.080000  17.640000 ( 18.178065)5.413000   0.000000   5.413000 (  5.413000)bm_vm1_swap.rb:22.160000   0.110000  22.270000 ( 22.698746)6.836000   0.000000   6.836000 (  6.835000)bm_vm2_array.rb:5.600000   0.020000   5.620000 (  5.675354)1.844000   0.000000   1.844000 (  1.844000)bm_vm2_method.rb:9.800000   0.030000   9.830000 (  9.918884)5.152000   0.000000   5.152000 (  5.151000)bm_vm2_poly_method.rb:13.570000   0.050000  13.620000 ( 13.803066)10.289000   0.000000  10.289000 ( 10.289000)bm_vm2_poly_method_ov.rb:3.990000   0.010000   4.000000 (  4.071277)1.750000   0.000000   1.750000 (  1.749000)bm_vm2_proc.rb:5.670000   0.020000   5.690000 (  5.723124)3.267000   0.000000   3.267000 (  3.267000)bm_vm2_regexp.rb:0.380000   0.000000   0.380000 (  0.387671)0.961000   0.000000   0.961000 (  0.961000)bm_vm2_send.rb:3.720000   0.010000   3.730000 (  3.748266)2.135000   0.000000   2.135000 (  2.136000)bm_vm2_super.rb:4.100000   0.010000   4.110000 (  4.138355)1.781000   0.000000   1.781000 (  1.781000)bm_vm2_unif1.rb:3.320000   0.010000   3.330000 (  3.348069)1.385000   0.000000   1.385000 (  1.385000)bm_vm2_zsuper.rb:4.810000   0.020000   4.830000 (  4.856368)2.920000   0.000000   2.920000 (  2.921000)bm_vm3_thread_create_join.rb:0.000000   0.000000   0.000000 (  0.006621)0.368000   0.000000   0.368000 (  0.368000)

What is interesting about these numbers is that almost all of them are way faster, and the ones that are slower are so by a quite narrow margin (except for the regexp and thread tests).

So, this is the baseline against MRI. Let’s take a look at a few other benchmarks. These can all be found in JRuby, in the test/bench directory. When you run them, all of them generate 5 or 10 runs of all measures, but I’ve simply taken the best one for each. The repetition is to allow Java to warm up. Here are a few different benchmarks, same convention and running parameters as above:

bench_fib_recursive.rb----------------------  1.390000   0.000000   1.390000 (  1.412710)0.532000   0.000000   0.532000 (  0.532000)

bench_method_dispatch.rb------------------------Control: 1m loops accessing a local variable 100 times2.830000   0.010000   2.840000 (  2.864822)0.105000   0.000000   0.105000 (  0.105000)Test STI: 1m loops accessing a fixnum var and calling to_i 100 times10.000000   0.030000  10.030000 ( 10.100846)2.111000   0.000000   2.111000 (  2.111000)Test ruby method: 1m loops calling self's foo 100 times16.130000   0.060000  16.190000 ( 16.359876)7.971000   0.000000   7.971000 (  7.971000)

bench_method_dispatch_only.rb-----------------------------Test ruby method: 100k loops calling self's foo 100 times1.570000   0.000000   1.570000 (  1.588715)0.587000   0.000000   0.587000 (  0.587000)

bench_block_invocation.rb-------------------------1m loops yielding a fixnum 10 times to a block that just retrieves dvar2.800000   0.010000   2.810000 (  2.822425)1.194000   0.000000   1.194000 (  1.194000)1m loops yielding two fixnums 10 times to block accessing one6.550000   0.030000   6.580000 (  6.623452)2.064000   0.000000   2.064000 (  2.064000)1m loops yielding three fixnums 10 times to block accessing one7.390000   0.020000   7.410000 (  7.467841)2.120000   0.000000   2.120000 (  2.120000)1m loops yielding three fixnums 10 times to block splatting and accessing them9.250000   0.040000   9.290000 (  9.339131)2.451000   0.000000   2.451000 (  2.451000)1m loops yielding a fixnums 10 times to block with just a fixnum (no vars)1.890000   0.000000   1.890000 (  1.908501)1.278000   0.000000   1.278000 (  1.277000)1m loops calling a method with a fixnum that just returns it2.740000   0.010000   2.750000 (  2.766255)1.426000   0.000000   1.426000 (  1.426000)

bench_string_ops.rb----Measure string array sort time5.950000   0.060000   6.010000 (  6.055483)8.061000   0.000000   8.061000 (  8.061000)Measure hash put time0.390000   0.010000   0.400000 (  0.398593)0.208000   0.000000   0.208000 (  0.209000)Measure hash get time (note: not same scale as put test)1.620000   0.000000   1.620000 (  1.646155)0.740000   0.000000   0.740000 (  0.740000)Measure string == comparison time2.340000   0.010000   2.350000 (  2.368579)0.812000   0.000000   0.812000 (  0.812000)Measure string == comparison time, different last pos2.690000   0.000000   2.690000 (  2.724772)0.860000   0.000000   0.860000 (  0.860000)Measure string <=> comparison time2.340000   0.010000   2.350000 (  2.369915)0.824000   0.000000   0.824000 (  0.824000)Measure 'string'.index(fixnum) time0.790000   0.010000   0.800000 (  0.808189)1.113000   0.000000   1.113000 (  1.113000)Measure 'string'.index(str) time2.860000   0.010000   2.870000 (  2.892730)0.956000   0.000000   0.956000 (  0.956000)Measure 'string'.rindex(fixnum) time0.800000   0.000000   0.800000 (  0.817300)0.631000   0.000000   0.631000 (  0.631000)Measure 'string'.rindex(str) time12.190000   0.040000  12.230000 ( 12.310492)1.247000   0.000000   1.247000 (  1.247000)

bench_ivar_access.rb----100k * 100 ivar gets, 1 ivar0.500000   0.000000   0.500000 (  0.582682)0.340000   0.000000   0.340000 (  0.340000)100k * 100 ivar sets, 1 ivar0.700000   0.010000   0.710000 (  0.816724)0.402000   0.000000   0.402000 (  0.401000)100k * 100 attr gets, 1 ivar0.970000   0.000000   0.970000 (  0.988212)0.875000   0.000000   0.875000 (  0.874000)100k * 100 attr sets, 1 ivar1.390000   0.010000   1.400000 (  1.406535)1.114000   0.000000   1.114000 (  1.114000)100k * 100 ivar gets, 2 ivars0.490000   0.000000   0.490000 (  0.506206)0.344000   0.000000   0.344000 (  0.344000)100k * 100 ivar sets, 2 ivars0.680000   0.000000   0.680000 (  0.693064)0.388000   0.000000   0.388000 (  0.388000)100k * 100 attr gets, 2 ivars0.970000   0.000000   0.970000 (  0.989313)0.878000   0.000000   0.878000 (  0.878000)100k * 100 attr sets, 2 ivars1.400000   0.000000   1.400000 (  1.434206)1.129000   0.000000   1.129000 (  1.128000)100k * 100 ivar gets, 4 ivars0.490000   0.000000   0.490000 (  0.502097)0.340000   0.000000   0.340000 (  0.340000)100k * 100 ivar sets, 4 ivars0.690000   0.000000   0.690000 (  0.696852)0.389000   0.000000   0.389000 (  0.389000)100k * 100 attr gets, 4 ivars0.970000   0.010000   0.980000 (  0.986163)0.872000   0.000000   0.872000 (  0.872000)100k * 100 attr sets, 4 ivars1.370000   0.010000   1.380000 (  1.394921)1.128000   0.000000   1.128000 (  1.128000)100k * 100 ivar gets, 8 ivars0.500000   0.000000   0.500000 (  0.519511)0.344000   0.000000   0.344000 (  0.344000)100k * 100 ivar sets, 8 ivars0.690000   0.000000   0.690000 (  0.710896)0.389000   0.000000   0.389000 (  0.389000)100k * 100 attr gets, 8 ivars0.970000   0.000000   0.970000 (  0.987582)0.870000   0.000000   0.870000 (  0.870000)100k * 100 attr sets, 8 ivars1.380000   0.000000   1.380000 (  1.400542)1.132000   0.000000   1.132000 (  1.132000)100k * 100 ivar gets, 16 ivars0.500000   0.000000   0.500000 (  0.523690)0.342000   0.000000   0.342000 (  0.343000)100k * 100 ivar sets, 16 ivars0.680000   0.000000   0.680000 (  0.707385)0.391000   0.000000   0.391000 (  0.391000)100k * 100 attr gets, 16 ivars0.970000   0.010000   0.980000 (  1.017880)0.879000   0.000000   0.879000 (  0.879000)100k * 100 attr sets, 16 ivars1.370000   0.010000   1.380000 (  1.387713)1.128000   0.000000   1.128000 (  1.128000)

bench_for_loop.rb----100k calls to a method containing 5x a for loop over a 10-element range0.890000   0.000000   0.890000 (  0.917563)0.654000   0.000000   0.654000 (  0.654000)

All of this is really good, of course. Lots of green. And the red parts are not that bad either.

But here is the mystery. General Rails performance sucks. This test case can be ran by checking out petstore from tw-commons at RubyForge. There is a file called script/console_bench that will run the benchmarks. The two commands ran was: ruby script/console_bench production, and jruby -J-server -O script/console_bench ar_jdbc. With further ado, here are the numbers:

controller :   1.000000   0.070000   1.070000 (  1.472701)controller :   2.144000   0.000000   2.144000 (  2.144000)view       :   4.470000   0.160000   4.630000 (  4.794336)view       :   6.335000   0.000000   6.335000 (  6.335000)full action:   8.300000   0.400000   8.700000 (  9.597351)full action:  16.055000   0.000000  16.055000 ( 16.055000)

These numbers plainly stink. And we can’t find the reason for it. As you can see, most benchmarks are much better, and there is nothing we can think of that would add up to a general degradation like this. It can’t be IO since these tests use app.get directly. So what is it? There is very likely to be one or two bottlenecks causing this problem, and when we have found it, Rails will most likely blaze along. But profiling haven’t helped yet. We found a few things, but there is still stuff missing. It’s an interesting problem.

My current thesis is that either symbols or regexps are responsible. I’ll spend the day checking that.


8 Comments, Comment or Ping

  1. Piers Cawley

    If I remember correctly from last time I looked at it, Rails’s routing code leans very heavily on regular expressions for route recognition. If I had to put the blame somewhere for poor rails performance in JRuby, that would be the first place I’d look.

    October 12th, 2007

  2. Derek

    Ugh… so much for us red/green color blind folks.

    That thin colored text on the white background is hard to pick apart. :(

    October 12th, 2007

  3. Anonymous

    Hey, thanks for the benchmark numbers, wouldn’t it be easier to just tag each line like so:

    MRI: 0000.000 0000.0000 0000.0000
    JRB: 1111.111 3333.3333 9999.9999

    October 12th, 2007

  4. netghost

    Any chance it’s the rails class reloading? I would assume it is disabled, but I could see that having fairly different performance characteristics between MRI and JRuby.

    October 12th, 2007

  5. Okke

    It sounds a bit like “Look, all my unit tests succeed so my application must work but what the ****, it crashes??”

    Two possibilities imho:

    1. Granularity of the benchmarks is not representative for a real life rails case.
    2. Unforeseen behavior in rails that should have been benchmarked but did not made into the benchmarks

    Its probably a combination. Rails simply stretches ruby to every edge it can find.

    Finally, it is likely rails has been tested for performance on the MRI, not on JRuby.

    Nevertheless, great figures. And given the JRuby progress that have been made lately, I bet it’s nothing but a matter of time and you can show green benchmark results for Rails also.

    October 12th, 2007

  6. Koz

    Route regognition does depend heavily on regular expressions, but as it only happens once a request it’s unlikely to be a huge bottleneck.

    You’re running in production mode so the class-loading should be one-off.

    Your benchmark rig seems quite a bit harder than the once I’ve been using(http://p.caboo.se/106832) not sure if it’s worth trying that.

    Grab me on IRC if you find anything interesting, always keen to apply benchmark driven patches :)

    October 13th, 2007

  7. Joe Van Dyk

    Have you looked into using dtrace to profile what’s happening?

    October 13th, 2007

Reply to “Mystery: An exposé on JRuby performance”