Safe(r) monkey patching


Ruby make it possible to pretty much change anything, anywhere. This is obviously very powerful, but it’s also something that can cause a lot of pain if it’s not done in a disciplined manner. The way this is handled on most Ruby projects is by heaving clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes if it is at all possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.

The one absolute rule that should never be violated in a Rails or Ruby project is to modify the original source code. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.

Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version make look like this:

class Object
  def presence
    return self if present?
  end
end

But if I open up IRb and get hold of this method, it’s not immediately obvious where it’s defined:

o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>

However, if I were to implement it using a module instead, like this:

module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence

If I look at the method now, the output is a bit changed:

p o.method(:presence)  #=> #<Method: Object(Presence)#presence>

We can now see that the method actually comes from the Presence module instead of the Object class. In most Ruby projects, these kind of extensions will be namespaced, using the word extensions or ext as part of the module name. When I add the presence method to code bases, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this to make it as easy to possible to find these extensions and changes.

There are many other benefits to putting an extension like this in a module. It makes your code cleaner, more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, including just adding it to selected objects if necessary.

Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.

There is a slight wrinkle in this scenario, specifically for adding extensions to modules. Sadly, the way the Ruby module system works, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. The general problem looks like this:

module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)

This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rules still applies — you should put the extensions in well named files that are easy to find. And if you can extract the functionality to a module and then delegate to that, that is preferrable.



Comparing times and dates in Ruby


In one of the Rails projects I’m involved with, we do most of the local development against SQLite and then deploy against Oracle. This is a bit annoying for many reasons, but by far the largest cause of trouble is the handling of dates. I haven’t exactly figured out the rules, but for some reason sometimes Oracle returns DateTime in situations where SQLite returns a Date. This usually causes quite subtle problems that have effects in other parts of the application. This brings me to the small piece of advice I wanted to talk about in this column. Always make sure that you know if you are working with a Date, a Time or a DateTime, since these all have slightly different behavior, especially when it comes to comparisons.

The rule is quite simple. If you think you can have a Time object, make sure to turn it into a DateTime object before trying to compare it to a Date object. What happens otherwise? Unfunny things:

Date.today < Time.now #ArgumentError: comparison of Date with Time failed

Time.now > Date.today #true

Time.now == Date.today #false

Date.today == Time.now #nil

Date.today <=> Time.now #nil

Date.today != Time.now #true

The first time I saw some of these results, I was a bit confused. Especially the last three. But they do make a twisted kind of sense. Namely, it’s OK for the <=> operator to return nil if it can’t do a comparison between two objects. And the != in Ruby is hardcoded to return the inverse of the value returned from ==, and the since nil is a falsey value, the inverse of that becomes true.

What I wanted to mention with these things is that you should always make sure you don’t have the Date on the left hand side of a comparison. Or if you want to do a comparison, explicitly call to_date to coerce them. Finally, if you want to do date and time comparisons, I find the best behavior usually comes from coercing both sides with to_datetime before doing the comparison.



Named Scopes


One of my favorite features of Rails is named scopes. Maybe it’s because I’ve seen so much Rails code with conditions and finders spread all over the code base, but I feel that named scopes should be used almost always where you would find yourself otherwise writing a find_by or even using the conditions argument directly in consumer code. The basic rule I follow is to always use named scopes outside of models to select a specific subset of a model.

So what is a named scope? It’s really just an easier way of creating a custom finder method on your model, but it gives you some extra benefits I’ll talk about later. So if we have code like:

class Foo < ActiveRecord::Base
  def self.find_all_fluxes
    find(:all, :conditions => {:fluxes => true})    
  end
end

p Foo.find_all_fluxes

we can easily replace that with

class Foo < ActiveRecord::Base
  named_scope :find_all_fluxes, :conditions => {:fluxes => :true}
end

p Foo.find_all_fluxes

You can give named_scope any argument that find can take. So you can create named scopes specifically for ordering, specifically for including an association, etc.

The above example is fixed to always have the same conditions. But named_scope can also take arguments. You do that by instead of fixing the arguments, send in a lambda that will return the arguments to use:

def self.ordered_inbetween(from, to)
  find(:all, :conditions => {:order_date => from..to})
end

Foo.ordered_inbetween(10.days.ago, Date.today)

can become:

named_scope :ordered_inbetween, lambda {|from, to|
  {:conditions => {:order_date => from..to}}
}  

Foo.ordered_inbetween(10.days.ago, Date.today)

It’s important that you use the curly braces form of block for the lambda — if you happen to use do-end instead, you might not get the right result, since the block will bind to named_scope. The block that binds to named_scope is used to add extensions to the collections returned, and thus will not contribute to the arguments given to find. It’s a mistake I’ve done several times, and it’s very easy to make. So make sure to test your named scopes too!

So what is it that makes named scopes so useful? Several things. First, they are composable. Second, they establish an explicit interface to the model functionality. Third, you can use them on associations. There are other benefits too, but these are the main ones. Simply put, they are a cleaner solution than the alternatives.

What do I mean by composable? Well, you can call one on the result of calling another. So say that we have these named scopes:

class Person < ActiveRecord::Base
  named_scope :by_name, :order => "name ASC"
  named_scope :by_age, :order => "age DESC"
  named_scope :top_ten, :limit => 10
  named_scope :from, lambda {|country|
    {:conditions => {:country => country}}
  }
end

Then you can say:

Person.top_ten.by_name
Person.top_ten.from("Germany")
Person.top_ten.by_age.from("Germany")

I dare you to do that in a clean way by using class method finders.

I hope you have already seen what I mean by offering a clean interface. You can hide some of the implementation details inside your model, and your controller and view code will read more cleanly because of it.

The third point I made was about associations. Simply, if you have an association, you can use a named scope on that association, just as if it was the model class itself. So if we have:

class ApartmentBuilding < ActiveRecord::Base
  has_many :tenants, :class_name => "Person"
end

a = ApartmentBuilding.first
a.tenants.from("Sweden").by_age

So named scopes are great, and you should use them. Whenever you sit down to write a finder, see if you can’t express it as a named scope instead.

(Note: some of the things I write about here only concerns Rails 2. Most of the work I do is still in the old world of 2.3.)



Use presence


One of the things you quite often see in Rails code bases is code like this:

do_something if !foo.blank?

or

unless foo.blank?
  do_something
end

Sometimes it’s useful to check for blankness, but in my experience it’s much more useful to check for presence. It reads better and doesn’t deal in negations. When using blank? it’s way too easy to use complicated negations.

For some reasons it seems to have escaped people that Rails already defines the opposite of blank? as present?:

do_something if foo.present?

There is also a very common pattern that you see when working with parameters. The first iteration of it looks like this:

name = params[:name] || "Unknown"

This is actually almost always wrong, since it will accept a blank string as a name. In most cases what you really want is something like this:

name = !params[:name].blank? ? params[:name] : "Unknown"

Using our newly learnt trick, it instead becomes:

name = params[:name].present? ? params[:name] : "Unknown"

Rails 3 introduces a new method to deal with this, and you should back port it to any Rails 2 application. It’s called presence, and the definition looks like this:

def presence
  self if present?
end

With this in place, we can finally say

name = params[:name].presence || "Unknown"

These kind of style things make a huge difference in the small. Once you have idiomatic patterns like these in place in your code base, it’s easier to refactor the larger parts. So any time you reach for blank?, make sure you don’t really mean present? or even presence.



RSpec matchers and regexp comments – a possibly useful hack


A few days back, I was sitting at my client and working on a hacky spec to validate some assumptions in a very dirty data set. I wanted to figure out some limits. The basic idea was that I needed to go through an entry for every day the last 8 years. Getting these entries is potentially expensive, and the validation was based on checking that a specific value never turns up. This was quite easy, and I ended up with something like this:

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not == "MAGIC TOKEN"
    end
  end
end

Here you can see that I just simply use a method to calculate the invariant, then use the “should_not ==” to find out if it’s true. Nothing fancy. The problem comes when I want to get information about a failure. Now, I could insert a print statement. That means I’d have to look at all the output until I get to the end, to see which one failed. I could also rescue all exceptions, print the offending information and then reraise. But the best solution would be to give RSpec a failure message. Now, you can definitely do this for RSpec in other matchers, but I couldn’t find a way of doing it with the == matcher. One thing I could have done, was to just write my own matcher.That also seemed inefficient. This was a throw away thing, run once and then delete.

What I ended up doing was actually quite elegant, in a very disgusting way. It works, and it might be useful for someone else, sometime. But don’t EVER do anything like this in code you will save.

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not =~ /\AMAGIC TOKEN(?#Invariant failed on: #{date})\Z/
    end
  end
end

So why does this work? Well, it turns out that you can have comments in regular expressions. And you can interpolate arbitrary values into regexps, just like with strings. So I can embed the failure information in a comment in the regexp. This will only be displayed when the match fails, since RSpec by default says something like “expected MAGIC TOKEN to not match /\AMAGIC TOKEN(?#Invariant failed on: 2010-06-04)\Z/”, so you get the information necessary. The comment does not contribute to the matching in anyway. There’s another subtle point here. I haven’t used ^ and $ for anchoring the pattern. Instead I use \A and \Z. The reason is that otherwise, my regexp wouldn’t have the same behavior as comparing against a string, since ^ and $ match the beginning and end of lines too, not only the beginning and end of buffer.

Anyway, I thought I’d share this. In basically all cases, don’t do this. But it’s still a bit funny.



Patterns of method missing


One of the more dynamic features of Ruby is method_missing, a way of intercepting method calls that would cause a NoMethodError if method_missing isn’t there. This feature is by no means unique to Ruby. It exists in Smalltalk, Python, Groovy, some JavaScripts, and even most CLOS extensions have it. But Ruby being what it is, for some reason this feature seem to have more heavily used in Ruby than anywhere else. It’s also a feature most Ruby developers seem to know about. Is this because Ruby people are power hungy, crazy monkey patchers? Maybe, but method_missing is also potentially very useful, if used correctly. But of course, it’s exceedingly easy to misuse. In almost all cases you think you need method_missing, you actually don’t.

The purposes of this post is to take a look at a few ways people are using method_missing in the wild, what the consequences are and what you can do to mitigate them. I’m bound to have missed a few use cases here, so please feel free to add more in the comments.

Adding better debug information on failure

One of the most simple but still very powerful ways of using method_missing is to allow it to include more information in the error message than you would usually have got. A simple example of that could look like this:

class MyFoo
  def method_missing(method, *args, &block)
    raise NoMethodError, <<ERRORINFO
method: #{method}
args: #{args.inspect}
on: #{self.to_yaml}
ERRORINFO
  end
end

This usage is pretty common – and is in my opinion a very valid use of the functionality. The only thing you have to be careful about is to not introduce any recursive calls to method_missing. Say if you forget to require YAML in the above example – the error would be a stack overflow.

One of the places where you’ve almost certainly seen this used is in Rails, where the feature is called whiny nils. The idea is that nil will have a method missing that gives some extra information. It can guess based on the method name what object you were expecting. This could be a typical message from Rails whiny nil:

Loading development environment (Rails 2.2.2)
>> nil.last
NoMethodError: You have a nil object when you didn't expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.last
	from (irb):2

This functionality is exceedingly simple to implement, but gives you lots of leverage to find and debug your problem quicker and easier.

Encode parameters in method name

Another common pattern is to use the name of the method to encode parameters, instead of sending them in as explicit parameters. In some cases this can be used to good effect, but if possible it would be better to encode the possible names beforehand, or send in the parameters as actual parameters instead. Contrast a Rails-style find expression:

Person.find_by_name_and_age("Ola", 28)

With another way of creating the same API:

Person.find_by(:name => "Ola", :age => 28)

The difference here isn’t that large, and in the case of Rails I do think they are harmless – but creating these kinds of API’s make it much harder to debug and maintain an application, so care should be taken.

Builders

Creating XML, HTML, graphical UIs and other hierarchical data structures lend themselves very well to the builder pattern. The idea of a builder is that you use Ruby’s blocks and method_missing to make it easy to create any kind of output structure. The canonical example in Ruby is Jim Weirich’s Builder, that can be used to easily create complicated XML structures. A small example:

builder = Builder::XmlMarkup.new
xml = builder.books { |b|
  b.book :isbn => "124" do
    b.title "The Prefect"
    b.author "Alastair Reynolds"
  end

  b.book :isbn => "65565" do
    b.title "Against a Dark Background"
    b.author "Iain M Banks"
  end
}

The result of this code will be a properly formatted and escaped XML document. Most notable, all the finicky details of closing tags and escaping rules are taken care of for us.

In general, this approach is very pleasant to work with. It’s easy to test (since you don’t even have to generate the real XML to make sure it’s correct), and it works well with your existing Ruby tools. It’s also quite easy to implement a basic version of. For the fully general case you need to use a blank slate object, though.

Accessors

The inversion of the builder pattern is to use a parser that slurps in an XML document (or a YAML, database or anything else really), and then allow you to access the elements of it by using regular Ruby method calls – intercepting these calls with method missing and looking them up. A usage could look something like this:

slurper = Slurp <<XML
<books>
  <book isbn="14134">
    <title>Revelation Space</title>
    <author>Alastair Reynolds</author>
  </book>
  <book isbn="53534">
    <title>Accelerando</title>
    <author>Charles Stross</author>
  </book>
</books>
XML

puts slurper.books.book[1].author

I’m not much of a fan of this approach. In almost all cases there are better ways of doing it than using method_missing. The only valid use case for something like this would be for a throwaway really hacky oneoff thing. But in general, Ruby allows you to define methods dynamically anyway, so you can do that instead for this case.

Proxy/delegation

When you want to insert a proxy that resends method calls somewhere else, method_missing can be an easy way to get that to work. You can resend method calls to another object, you can resend to several objects, you can send method calls over the wire, to implement a crude RMI system. You can also record method calls and write them to disk. All of these can be achieved with just a few lines of code. But in many cases there are better options – especially if you want to do delegation. One of the dangers (and the power also, of course) of method_missing is that it can take any kind of method call. So if you misspell something, method_missing will happily treat it the same way.

But when delegating, you generally want to be explicit about what you delegate, to avoid this problem. There are several classes in the standard library that allow you to explicitly say what methods to delegate and where to delegate them – and if you can, try using this instead. Proxying and delegation should be explicit if possible.

Making parts of an API extensible and optional

In some cases you might want to create a base class for an API, but allow the subclasses to add additional API methods. In some cases it can make sense to ignore calls to these subclass API methods if called on something that doesn’t support it. By definition, the super class can’t actually know which API methods the subclasses might add, so it makes sense to use method_missing to open up the API and make it more convenient. This is not very common – and in most cases should probably not be done, but sometimes it can be a useful technique.

Test helpers

All kinds of test helpers can be created using method_missing. They can be used to implement factories, delegate and do all kinds of things. If you take a look at any open source Ruby project, the tests is the place where you are most likely to find implementations of method_missing. I can’t say that these implementations actually follow any specific patterns either.

Summary

Finally, remember. Method missing is a powerful powerful feature – it should not be used in almost all the cases. But if you do want to use it, don’t forget to implement responds_to? correctly. And if you’re designing your class for subclassing, it’s also important to design your method_missing usage for inheritance. Liskovs Substitution Principle applies here.