Safe(r) monkey patching


Ruby makes it possible to pretty much change anything, anywhere. This is obviously very powerful, but it’s also something that can cause a lot of pain if it’s not done in a disciplined manner. The way this is handled on most Ruby projects is by having clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes if it is at all possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.

The one absolute rule that should never be violated in a Rails or Ruby project is this: never modify the original source code of a library. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.

Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version might look like this:

class Object
  def presence
    return self if present?
  end
end

But if I open up IRb and get hold of this method, it’s not immediately obvious where it’s defined:

o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>

However, if I were to implement it using a module instead, like this:

module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence

If I look at the method now, the output is a bit changed:

p o.method(:presence)  #=> #<Method: Object(Presence)#presence>

We can now see that the method actually comes from the Presence module instead of the Object class. In most Ruby projects, these kinds of extensions will be namespaced, using the word extensions or ext as part of the module name. When I add the presence method to code bases, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this is to make it as easy as possible to find these extensions and changes.

There are many other benefits to putting an extension like this in a module. It makes your code cleaner, more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, including just adding it to selected objects if necessary.
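As a small sketch of the selective use mentioned above: the same kind of module can be mixed into a single object with extend, leaving every other object untouched. The module name is hypothetical, and the present? defined here is only a simplified stand-in I made up so the example runs without ActiveSupport.

```ruby
module SelectivePresence
  # Simplified stand-in for ActiveSupport's present?, so the sketch runs alone.
  def present?
    respond_to?(:empty?) ? !empty? : true
  end

  def presence
    self if present?
  end
end

names = ["Ola"]
names.extend(SelectivePresence)     # only this one array gets the methods

nothing = []
nothing.extend(SelectivePresence)

p names.presence                     # ["Ola"]
p nothing.presence                   # nil
p([1, 2].respond_to?(:presence))     # false, other arrays are untouched
p names.method(:presence).owner      # SelectivePresence
```

Asking for the owner of the method still points straight at the module, which is the debugging benefit described earlier.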

Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.

There is a slight wrinkle in this scenario, specifically for adding extensions to modules. Sadly, the way the Ruby module system works in versions before 3.0, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. The general problem looks like this:

module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)

This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rule still applies: you should put the extensions in well named files that are easy to find. And if you can extract the functionality to a module and then delegate to that, that is preferable.
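A minimal sketch of that last suggestion, assuming a made-up total method and a made-up CoreExt::EnumerableTotal module: the real logic lives in a module of its own, and the method defined directly on Enumerable does nothing but delegate to it.

```ruby
# Hypothetical helper module: holds the real logic as a module-level method,
# so it can be found, tested and reused on its own.
module CoreExt
  module EnumerableTotal
    def self.total(enumerable)
      enumerable.inject(0) { |acc, x| acc + x }
    end
  end
end

# Defined directly on Enumerable, so classes that mixed it in long ago
# (Array, Range, ...) see the method. The body only delegates.
module Enumerable
  def total
    CoreExt::EnumerableTotal.total(self)
  end
end

p [1, 2, 3].total    # 6
p((1..4).total)      # 10
```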



Patterns of method missing


One of the more dynamic features of Ruby is method_missing, a way of intercepting method calls that would otherwise cause a NoMethodError. This feature is by no means unique to Ruby. It exists in Smalltalk, Python, Groovy, some JavaScript implementations, and even most CLOS extensions have it. But Ruby being what it is, for some reason this feature seems to have been more heavily used in Ruby than anywhere else. It’s also a feature most Ruby developers seem to know about. Is this because Ruby people are power hungry, crazy monkey patchers? Maybe, but method_missing is also potentially very useful, if used correctly. But of course, it’s exceedingly easy to misuse. In almost all cases where you think you need method_missing, you actually don’t.

The purpose of this post is to take a look at a few ways people are using method_missing in the wild, what the consequences are and what you can do to mitigate them. I’m bound to have missed a few use cases here, so please feel free to add more in the comments.

Adding better debug information on failure

One of the simplest but still very powerful ways of using method_missing is to allow it to include more information in the error message than you would usually have got. A simple example of that could look like this:

require 'yaml'  # provides to_yaml; without it the call below would re-enter method_missing

class MyFoo
  def method_missing(method, *args, &block)
    raise NoMethodError, <<ERRORINFO
method: #{method}
args: #{args.inspect}
on: #{self.to_yaml}
ERRORINFO
  end
end

This usage is pretty common – and is in my opinion a very valid use of the functionality. The only thing you have to be careful about is to not introduce any recursive calls to method_missing. Say if you forget to require YAML in the above example – the error would be a stack overflow.

One of the places where you’ve almost certainly seen this used is in Rails, where the feature is called whiny nils. The idea is that nil will have a method missing that gives some extra information. It can guess based on the method name what object you were expecting. This could be a typical message from Rails whiny nil:

Loading development environment (Rails 2.2.2)
>> nil.last
NoMethodError: You have a nil object when you didn't expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.last
	from (irb):2

This functionality is exceedingly simple to implement, but gives you lots of leverage to find and debug your problem quicker and easier.

Encode parameters in method name

Another common pattern is to use the name of the method to encode parameters, instead of sending them in as explicit parameters. In some cases this can be used to good effect, but if possible it would be better to encode the possible names beforehand, or send in the parameters as actual parameters instead. Contrast a Rails-style find expression:

Person.find_by_name_and_age("Ola", 28)

With another way of creating the same API:

Person.find_by(:name => "Ola", :age => 28)

The difference here isn’t that large, and in the case of Rails I do think they are harmless – but creating these kinds of APIs makes it much harder to debug and maintain an application, so care should be taken.
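To make the trade-off concrete, here is a rough sketch of how a name-encoded finder can sit on top of an explicit one. This is not how Rails implements it. PersonStore and its helper are hypothetical names I made up, backed by a plain in-memory array.

```ruby
# Hypothetical in-memory model, only to illustrate the name-encoding pattern.
class PersonStore
  Person = Struct.new(:name, :age)

  def initialize(people)
    @people = people
  end

  # Explicit version: the attributes are ordinary parameters.
  def find_by(conditions)
    @people.find do |person|
      conditions.all? { |attr, value| person[attr] == value }
    end
  end

  # Name-encoded version: find_by_name_and_age("Ola", 28) is parsed at call
  # time and forwarded to find_by. Anything else falls through to super.
  def method_missing(name, *args, &block)
    attrs = finder_attributes(name)
    return super unless attrs && attrs.size == args.size
    find_by(attrs.zip(args).to_h)
  end

  def respond_to_missing?(name, include_private = false)
    !finder_attributes(name).nil? || super
  end

  private

  # Returns the attribute names encoded in the method name, or nil.
  def finder_attributes(name)
    match = /\Afind_by_(.+)\z/.match(name.to_s)
    return nil unless match
    attrs = match[1].split("_and_").map(&:to_sym)
    attrs.all? { |a| Person.members.include?(a) } ? attrs : nil
  end
end

store = PersonStore.new([PersonStore::Person.new("Ola", 28)])
p store.find_by_name_and_age("Ola", 28)
p store.find_by(:name => "Ola", :age => 28)
```

Both calls return the same record, but only the explicit find_by shows up when you grep for its definition.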

Builders

XML, HTML, graphical UIs and other hierarchical data structures lend themselves very well to the builder pattern. The idea of a builder is that you use Ruby’s blocks and method_missing to make it easy to create any kind of output structure. The canonical example in Ruby is Jim Weirich’s Builder, which can be used to easily create complicated XML structures. A small example:

builder = Builder::XmlMarkup.new
xml = builder.books { |b|
  b.book :isbn => "124" do
    b.title "The Prefect"
    b.author "Alastair Reynolds"
  end

  b.book :isbn => "65565" do
    b.title "Against a Dark Background"
    b.author "Iain M Banks"
  end
}

The result of this code will be a properly formatted and escaped XML document. Most notably, all the finicky details of closing tags and escaping rules are taken care of for us.

In general, this approach is very pleasant to work with. It’s easy to test (since you don’t even have to generate the real XML to make sure it’s correct), and it works well with your existing Ruby tools. It’s also quite easy to implement a basic version of. For the fully general case you need to use a blank slate object, though.
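To give a feeling for how little a basic version needs, here is a deliberately tiny sketch. It is not Jim Weirich’s implementation: the class name TinyXml is made up, there is no escaping or indentation, and the blank slate is simply BasicObject.

```ruby
# Hypothetical minimal builder. No escaping, no pretty printing.
class TinyXml < BasicObject
  def initialize
    @out = ::String.new
  end

  # Every unknown method becomes a tag. BasicObject has almost no methods
  # of its own, so names like `p` or `display` reach method_missing
  # instead of hitting an inherited method.
  def method_missing(tag, content = nil, attrs = {}, &block)
    content, attrs = nil, content if content.is_a?(::Hash)
    attr_text = attrs.map { |k, v| %( #{k}="#{v}") }.join
    @out << "<#{tag}#{attr_text}>"
    @out << content.to_s if content
    block.call(self) if block
    @out << "</#{tag}>"
    self
  end

  def to_s
    @out
  end
end

xml = TinyXml.new
xml.books do |b|
  b.book :isbn => "124" do
    b.title "The Prefect"
  end
end
puts xml.to_s
# <books><book isbn="124"><title>The Prefect</title></book></books>
```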

Accessors

The inverse of the builder pattern is to use a parser that slurps in an XML document (or YAML, a database or anything else really), and then allows you to access the elements of it by using regular Ruby method calls – intercepting these calls with method missing and looking them up. A usage could look something like this:

slurper = Slurp <<XML
<books>
  <book isbn="14134">
    <title>Revelation Space</title>
    <author>Alastair Reynolds</author>
  </book>
  <book isbn="53534">
    <title>Accelerando</title>
    <author>Charles Stross</author>
  </book>
</books>
XML

puts slurper.books.book[1].author

I’m not much of a fan of this approach. In almost all cases there are better ways of doing it than using method_missing. The only valid use case for something like this would be for a throwaway, really hacky one-off thing. But in general, Ruby allows you to define methods dynamically anyway, so you can do that instead for this case.
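A sketch of what defining the methods dynamically could look like, assuming the document has already been parsed into a plain Hash. The Record class is a made-up name, and the Hash stands in for whatever your parser produces.

```ruby
# Hypothetical wrapper around already-parsed data.
class Record
  def initialize(data)
    data.each do |key, value|
      # One real method per key, defined once when the data is loaded.
      define_singleton_method(key) do
        value.is_a?(Hash) ? Record.new(value) : value
      end
    end
  end
end

book = Record.new(:title => "Accelerando", :author => { :name => "Charles Stross" })
p book.title                  # "Accelerando"
p book.author.name            # "Charles Stross"
p book.respond_to?(:title)    # true, without any extra respond_to? work
```

A misspelled accessor raises a plain NoMethodError here, since there is no method_missing to swallow it.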

Proxy/delegation

When you want to insert a proxy that resends method calls somewhere else, method_missing can be an easy way to get that to work. You can resend method calls to another object, you can resend to several objects, you can send method calls over the wire, to implement a crude RMI system. You can also record method calls and write them to disk. All of these can be achieved with just a few lines of code. But in many cases there are better options – especially if you want to do delegation. One of the dangers (and the power also, of course) of method_missing is that it can take any kind of method call. So if you misspell something, method_missing will happily treat it the same way.

But when delegating, you generally want to be explicit about what you delegate, to avoid this problem. There are several classes in the standard library that allow you to explicitly say what methods to delegate and where to delegate them – and if you can, try using this instead. Proxying and delegation should be explicit if possible.
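Forwardable from the standard library is one way of doing this. In the sketch here the Playlist class is a made-up example. Only the methods listed in def_delegators are forwarded to the wrapped array, so a misspelled or unlisted call fails loudly.

```ruby
require "forwardable"

# Hypothetical wrapper class: only the listed methods are forwarded.
class Playlist
  extend Forwardable

  # First argument is the target (an instance variable here), the rest are
  # the method names that are allowed to pass through.
  def_delegators :@songs, :size, :each, :<<

  def initialize
    @songs = []
  end
end

list = Playlist.new
list << "Song A"
list << "Song B"
p list.size                  # 2
p list.respond_to?(:clear)   # false, clear was never delegated
```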

Making parts of an API extensible and optional

In some cases you might want to create a base class for an API, but allow the subclasses to add additional API methods. In some cases it can make sense to ignore calls to these subclass API methods if called on something that doesn’t support it. By definition, the super class can’t actually know which API methods the subclasses might add, so it makes sense to use method_missing to open up the API and make it more convenient. This is not very common – and in most cases should probably not be done, but sometimes it can be a useful technique.

Test helpers

All kinds of test helpers can be created using method_missing. They can be used to implement factories, delegate and do all kinds of things. If you take a look at any open source Ruby project, the tests are the place where you are most likely to find implementations of method_missing. I can’t say that these implementations actually follow any specific patterns either.

Summary

Finally, remember: method_missing is a powerful, powerful feature, and in almost all cases it should not be used. But if you do want to use it, don’t forget to implement respond_to? correctly (in current Ruby versions, by overriding respond_to_missing?). And if you’re designing your class for subclassing, it’s also important to design your method_missing usage for inheritance. Liskov’s Substitution Principle applies here.
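A sketch of what that can look like in practice. Settings and WritableSettings are hypothetical classes: each level handles only the names it understands, answers respond_to? for exactly those names, and hands everything else up the chain with super, so a subclass instance can still be used wherever the base class is expected.

```ruby
# Hypothetical base class: answers get_<key> for known keys only.
class Settings
  def initialize
    @values = { :color => "red" }
  end

  def method_missing(name, *args, &block)
    key = name.to_s[/\Aget_(\w+)\z/, 1]
    return super unless key && @values.key?(key.to_sym)  # real NoMethodError otherwise
    @values[key.to_sym]
  end

  def respond_to_missing?(name, include_private = false)
    key = name.to_s[/\Aget_(\w+)\z/, 1]
    (!key.nil? && @values.key?(key.to_sym)) || super
  end
end

# Hypothetical subclass: adds set_<key> and defers everything else to super.
class WritableSettings < Settings
  def method_missing(name, *args, &block)
    key = name.to_s[/\Aset_(\w+)\z/, 1]
    return super unless key && args.size == 1
    @values[key.to_sym] = args.first
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("set_") || super
  end
end

s = WritableSettings.new
s.set_size(3)
p s.get_size                  # 3
p s.get_color                 # "red", still handled by the base class
p s.respond_to?(:get_color)   # true
p s.respond_to?(:bogus)       # false
```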



Ioke


I’m sitting here at JAOO, waiting for the second day to start. The first presentation will be a keynote by Lars Bak about V8. It was a quite language heavy event yesterday too, with both Anders Hejlsberg and Erik Meijer keynoting about languages – plus there were introductions to both Fortress and Scala going on. And after the JVM language summit last week, I feel like the world is finally starting to notice the importance of programming languages.

So it seems only fitting that I’ve decided to go public with Ioke, the source code, what it is and where it’s going.

Ioke is a strongly typed, extremely dynamic, prototype based object oriented language. It’s homoiconic and has built-in support for several kinds of macros. The languages that most closely influence Ioke are Io, Smalltalk, Self, Ruby and Lisp (specifically Common Lisp).

The language is currently built on top of the JVM, but I’m considering compiling it down to JavaScript and running it on V8.

I have several goals with the language but the most specific one is to create a language that combines the things I like about Ruby and Lisp together. It turns out that Io already has many of the features I’m looking for, but in some cases doesn’t go far enough. I also wanted to have a language that is very well suited to express internal DSLs. I want to have a language that doesn’t get in my way, but also gives me loads of power to accomplish what I want. To that end I’ve designed a macro system that some people will probably find insane.

The current status of the implementation is that there isn’t any. I’m starting from scratch. I’ve already created two partial implementations to find the right way to implement the language, so with this blog post I’m starting the implementation from scratch. I know quite well what I want the language to look like and how it should work.

I’ve used Scala for the other two implementations but have decided to not do that for this implementation. The reason being one that Charles Nutter often talks about – that having to include the Scala runtime in the runtime for Ioke seems very inconvenient. So the implementation will initially use Java, but I’m aiming for the language to be self hosting as quickly as possible. That includes creating an Ioke Antlr backend, so it will take some time.

I’m going to post about Ioke quite regularly while I’m working on it, talking about design decisions and other things related to it. I will try to base my decisions in Ioke on what seems right, and not necessarily on the words chosen for representation in other languages. I’ll try to talk about my reasoning behind choices like this.

And what about performance? Well, I know already that it will be atrocious. If you want to do scientific computing, maybe Ioke won’t be for you. The current design of the language will make it fairly hard to do any kind of performance tuning, but I do have a plan for how to compile it down to bytecode at least. This still doesn’t mean it will perform extremely well, but my goals for Ioke specifically don’t include performance. I care about performance sometimes, but sometimes I don’t and Ioke is a tool I want to have for those cases where raw expressive power is what is most important.

You can follow the development in my git repository at http://github.com/olabini/ioke.



Evil hook methods?


I have come to realize that there are a few hook methods I really don’t like in Ruby. Or actually, it’s not the hook methods I have a problem with – it’s the way much code is written using them. The specific hooks that seem to cause the most trouble for me are included, extended, append_features and extend_object. Let me first reiterate – I don’t dislike the methods per se. The power they give the language is incredible and should not be underestimated. What I dislike is the way it makes things un-obvious when reading code that depends on them.

Let’s take a silly example:

module Ruby;end

module Slippers
  def self.included(other)
    other.send :include, Ruby
  end
end

class Judy
  include Slippers
end

p Judy.included_modules

Since all this code is in the same place, you can see what will happen when someone includes Slippers. And really, in this case the side effect isn’t entirely dire. But imagine that this was part of a slightly larger code base. Like for example Rails. And the modules were spread over the code base. And the included hook did a few more things with your class. No way of knowing what – except reading the code – and the Ruby idiom is that include will add some methods and constants to your class and that is it. Anything else is going outside what the core message of that statement is.

One of the most common things you see with the included hook is something like this:

module Slippers
  module ClassMethods
  end

  def self.included(other)
    other.send :extend, ClassMethods
  end
end

class Judy
  include Slippers
end

This will add some class methods to the class that includes this module. DataMapper does this in the public API, for example. It’s very neat, you only have to include one thing and you get stuff on both your instances and your classes. Except that’s not what include does. Not really. So say you’re debugging someone’s code and happen upon an include statement. You generally don’t check what it’s doing until you’ve exhausted most other options.

So what’s wrong with having a public API like this?

module Slippers
  module ClassMethods;end
end

class Judy
  include Slippers
  extend Slippers::ClassMethods
end

where you explicitly include the Slippers module and then extend the class methods. This is more obvious code, it doesn’t hide anything behind your expectations, and it also might give me the possibility to choose. What if I want most of the DataMapper instance methods, but really don’t want to have finders on my class? Maybe I want to have a repository pattern? In that case I’ll have to explicitly remove all class methods introduced by that include, because there is no way of choosing if I want to have the class methods or not.
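Here is the same idea as a runnable sketch. The method bodies and the Toto class are made up for illustration. The point is that one class opts in to the class methods and the other simply leaves that line out.

```ruby
module Slippers
  def click_heels
    "There's no place like home"
  end

  module ClassMethods
    def made_of
      "rubies"
    end
  end
end

class Judy
  include Slippers
  extend Slippers::ClassMethods   # opted in: instance and class methods
end

class Toto
  include Slippers                # instance methods only
end

p Judy.new.click_heels
p Judy.made_of
p Toto.new.click_heels
p Toto.respond_to?(:made_of)      # false
```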

So that’s another benefit of dividing the extending out from the included hook. And finally, what about all those other things that people do in those hooks? Well, you don’t really need it. Make it part of the public API too! Instead of this:

module Slippers
  def self.included(other)
    do_funky_madness_on other
  end
end

class Judy
  include Slippers
end

make it explicit, like this:

module Slippers;end

class Judy
  include Slippers
  Slippers.do_funky_madness_on self
end

This is really just good design. It makes the functionality explicit, it makes it possible for the user to choose what he wants without doing monkey patching. And it makes the code easier to read. Yeah, I know, this will mean more lines of code. Booo hooo! I know that Ruby people are generally obsessed with making their libraries as easy to use as possible, but it feels like it sometimes goes totally absurd and people stop thinking about readability, maintainability and all those other things. And really, Ruby is such a good language that a few more lines of code like this still won’t make a huge impact on the total lines of code.

I’m not saying I haven’t done this, of course. But hopefully I’ll get better at it. And I’m not saying not to use these methods at all – I’m just saying that you should use them with caution and taste.



Dynamically created methods in Ruby


There seems to be some confusion with regards to dynamically defining methods in Ruby. I thought I’d take a look at the three available methods for doing this and just quickly note why you’d use one method in favor of another.

Let’s begin with a quick enumeration of the available ways of defining a method after the fact:

  • Using a def
  • Using define_method
  • Using def inside of an eval

There are several things to consider when you dynamically define a method in Ruby. Most importantly you need to consider performance, memory leaks and lexical closure. So, the first, and simplest way of defining a method after the fact is def. You can do a def basically anywhere, but it needs to be qualified if you’re not immediately in the context of a module-like object. So say that you want to create a method that returns a lazily initialized value, you can do it like this:

class Obj
  def something
    puts "calling simple"
    @abc = 3*42
    def something
      puts "calling memoized"
      @abc
    end
    something
  end
end

o = Obj.new
o.something
o.something
o.something

As you can see, we can use the def keyword inside of any context. Something that bites most Ruby programmers at least once – and more than once if they used to be Scheme programmers – is that the second def of “something” will not do a lexically scoped definition inside the scope of the first “something” method. Instead it will define a “something” method on the class that the enclosing method belongs to, here Obj, replacing the original definition. This means that in the example of the local variable “o”, the first call to “something” will first calculate the value and then swap in the memoizing “something” for every instance of Obj, including instances whose @abc has not been calculated yet. If you want the new method to land only on the metaclass of the currently executing self, write def self.something instead. Used with care, this pattern can be highly useful.
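For comparison, a sketch of the same memoization with an explicit receiver. Writing def self.something spells out that the replacement goes on the metaclass of the current object only, so other instances keep the original method. LazyObj is a made-up class name.

```ruby
class LazyObj
  def something
    @abc = 3 * 42
    # Explicit receiver: the replacement lands on this object's metaclass.
    def self.something
      @abc
    end
    @abc
  end
end

a = LazyObj.new
b = LazyObj.new
p a.something            # 126
p a.singleton_methods    # [:something]
p b.singleton_methods    # [], b still has only the class-level method
```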

Another variation is quite common. In this case you define a new method on a specific object, without that object being the self. The syntax is simple:

def o.something
  puts "singleton method"
end

This is deceptively simple, but also powerful. It will define a new method on the metaclass of the “o” local variable, constant, or result of method call. You can use the same syntax for defining class methods:

def String.something
  puts "also singleton method"
end

And in fact, this does exactly the same thing: since String is an instance of the Class class, this will define a method “something” on the metaclass of the String object. There are two other idioms you will see. The first one:

class << o
  def something
    puts "another singleton method"
  end
end

does exactly the same thing as

def o.something
  puts "another singleton method"
end

This idiom is generally preferred in two cases – first, when defining on the metaclass of self. In this case, using this syntax makes what is happening much more explicit. The other common usage of this idiom is when you’re defining more than one singleton method. In that case this syntax provides a nice grouping.

The final way of defining methods with def is using module_eval. The main difference here is that module_eval allows you to define new instance methods for a module like object:

String.module_eval do
  def something
    puts "instance method something"
  end
end

"foo".something

This syntax is more or less equivalent to using the module or class keyword, but the difference is that you can send in a block which gives you some more flexibility. For example, say that you want to define the same method on three different classes. The idiomatic way of doing it would be to define a new module and include that in all the classes. But another alternative would be doing it like this:

block = proc do
  def something
    puts "Shared something definition"
  end
end

String.module_eval(&block)
Hash.module_eval(&block)
Binding.module_eval(&block)

The method class_eval is an alias for module_eval – it does exactly the same thing.

OK, so now you know when def can be used. One important thing to remember: def does _not_ use any enclosing scope. The method defined by def will not be a lexical closure, which means that you can only use instance variables from the enclosing running environment, and even those will be the instance variables of the object executing the method, not the object defining the method. My main rule is this: use def whenever you can. If you don’t need lexical closures or a dynamically defined name, def should be your default option. The reason: performance. All the other versions are much harder – and in some cases impossible – for the runtimes to improve. In JRuby, using def instead of define_method will give you a large performance boost. The difference isn’t that large with MRI, but that is because MRI doesn’t really optimize the performance of general def either, so you get bad performance for both.

Use def unless you can’t.

The next version is define_method. It’s just a regular method that takes a block that defines the implementation of the method. There are some drawbacks to using define_method – the largest is probably that the defined method can’t use blocks, although this is fixed in 1.9. define_method gives you two important benefits, though. You can use a name that you only know at runtime, and since the method definition is a block this means that it’s a closure. That means you can do something like this:

class Obj
  def something
    puts "calling simple"
    abc = 3*42
    (class <<self; self; end).send :define_method, :something do
      puts "calling memoized"
      abc
    end
    something
  end
end

o = Obj.new
o.something
o.something
o.something

OK, let this code sample sink in for a while. It’s actually several things rolled into one. They are all necessary though. First, note that abc is no longer an instance variable. It’s instead a local variable to the first “something” method. Secondly, the funky looking thing (class <<self; self; end) is the easiest way to get the metaclass of the current object (newer Ruby versions also offer singleton_class for this). define_method is an ordinary method on Module, so it always defines the new method on the module or class it is called on. Since we want the memoized version on this one object only, we have to get hold of the metaclass and call it there. Third, define_method happens to be a private method on Module (it was made public in Ruby 2.5), so we need to use send to get around this. But wait, why don’t we just open up the metaclass and call define_method inside of that? Like this:

class Obj
  def something
    puts "calling simple"
    abc = 3*42
    class << self
      define_method :something do
        puts "calling memoized"
        abc
      end
    end
    something
  end
end

o = Obj.new
o.something
o.something
o.something

Well, it’s a good thought. The problem is that it won’t work. See, there are a few keywords in Ruby that kill lexical closure. The class, module and def keywords are the most obvious ones. So, the reference to abc inside of the define_method block will actually not be a lexical closure to the abc defined outside, but instead actually cause a runtime error since there is no such local variable in scope. This means that using define_method in this way is a bit cumbersome in places, but there are situations where you really need it.

The second feature of define_method is less interesting – it allows you to have any name for the method you define, including something random you come up with at runtime. This can be useful too, of course.
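A small sketch that uses both features at once. The Ticket class and its status list are made up. The names are only known once the list has been read, and each block closes over its own status_name.

```ruby
# Hypothetical class: the status names could just as well come from a
# config file or a database at runtime.
class Ticket
  STATUSES = %w[open closed pending]

  attr_accessor :status

  STATUSES.each do |status_name|
    # The block is a closure, so status_name is captured per method.
    define_method("#{status_name}?") do
      status == status_name
    end
  end
end

t = Ticket.new
t.status = "closed"
p t.closed?    # true
p t.open?      # false
```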

Let’s summarize. The method define_method is a private method so it’s a bit problematic to call, but it allows you to define methods that are real closures, thus providing some needed functionality. You can use whatever name you want for the method, but this shouldn’t be the deciding reason to use it.

There are two problems with define_method. The first one is performance. It’s extremely hard to generally optimize the performance of invocation of a define_method method. Specifically, define_method invocations will usually be a bit slower than activating a block, since define_method also needs to change the self for the block in question. Since it’s a closure it is harder to optimize for other reasons too, namely we can never be exactly sure about what local variables are referred to inside of the block. We can of course guess and hope and do optimistic improvements based on that, but you can never get define_method invocations as fast as invoking a regular Ruby method.

Since the block sent to define_method is a closure, it means it might be a potential memory leak, as I documented in an older blog post. It’s important to note that most Ruby implementations keep around the original self of the block definition, as well as the lexical context, even though the original self is never accessible inside the block, and thus shouldn’t be part of the closed environment. Basically, this means that methods defined with define_method could potentially leak much more than you’d expect.

The final way of defining a method dynamically in Ruby is using def or define_method inside of an eval. There are actually interesting reasons for doing both. In the first case, doing a def inside of an eval allows you to dynamically determine the name of the method, it allows you to insert any code before or after the actual functioning code, and most importantly, defining a method with def inside of eval will usually have all the same performance characteristics as a regular def method. This applies for invocation of the method, not definition of it. Obviously eval is slower than just using def directly. The reason that def inside of an eval can be made fast is that at runtime it will be represented in exactly the same way as a regular def-method. There is no real difference as far as the Ruby runtime sees it. In fact, if you want to, you can model the whole Ruby file as running inside of an eval. Not much difference there. In particular, JRuby will JIT compile the method if it’s defined like that. And actually, this is exactly how Rails handles potentially slow code that needs to be dynamically defined. Take a look at the rendering of compiled views in ActionPack, or the route recognition. Both of these places use this trick, for good reasons.
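A minimal sketch of the def-inside-eval variant, using class_eval with a string. This is not the Rails code, just an illustration. The Temperature class and its formulas are made up, and the evaluated strings are assumed to come from trusted code only.

```ruby
# Hypothetical class. The method source is built as a string and evaluated
# once; after that each method is an ordinary def-defined method.
class Temperature
  FORMULAS = {
    "fahrenheit" => "@celsius * 9.0 / 5 + 32",
    "kelvin"     => "@celsius + 273.15"
  }

  def initialize(celsius)
    @celsius = celsius
  end

  FORMULAS.each do |unit, formula|
    # __FILE__ and __LINE__ make backtraces point at this spot.
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def to_#{unit}
        #{formula}
      end
    RUBY
  end
end

p Temperature.new(100).to_fahrenheit   # 212.0
p Temperature.new(0).to_kelvin         # 273.15
```

Both the method name and the body are decided at runtime, yet the result is a plain method with no closure attached.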

The other one I haven’t actually seen, and to be fair I just made it up. =) That’s using define_method inside of an eval. The one thing you would gain from doing such a thing is that you have perfect control over the closure inside of the method defined. That means you could do something like this:

class BinderCreator
  def get
    abc = 123
    binding
  end
end

eval(<<EV, BinderCreator.new.get)
Object.send :define_method, :something do
abc
end
EV

In this code we create a new method “something” on Object. This method is actually a closure, but it’s an extremely controlled closure since we create a specific binding where we want it, and then use that binding as the context in which the define_method runs. That means we can return the value of abc from inside of the block. This solution will have the same performance problems as regular define_method methods, but it will let you control how much you close over at least.

So what’s the lesson? Defining methods can be complicated in Ruby, and you absolutely need to know when to use which one of these variations. Try to avoid define_method unless you absolutely have to, and remember that def is available in more places than you might think.



Silly Io experiment


I have known for some time that Io is quite powerful as a language. I wanted to check if it was also suited for creating DSLs. Since I’m lazy I didn’t want to create a new DSL just for this, so I decided to appropriate the associations language from ActiveRecord.

The process I followed was this – first come up with something that looks passably nice, then try to implement it. The only requirement was that the names of the associations were saved away somewhere, once per prototype for each model. This is the syntax I came up with (well OK, I didn’t work very long on it…):

Post := IoRecord {
  has many authors
  belongs to blog
  belongs to isp
}

Author := IoRecord {
  has many blogs
  has many posts
  has one name
}

Actually, this looks almost readable, right? It’s all valid Io code. The question is, how do we get it to do what we want? Without further ado, here’s an implementation that gives us enough:

IoRecord := Object clone do(
  init := method(
    self belongings := list
    self hasOnes := list
    self hasManies := list
    self hasFews := list
  )

  appender := method(msg,
    blk := block(
      call sender doMessage(msg) append(call message next name)
      call message setNext(call message next next)
    )
    blk setIsActivatable(true)
  )

  collector := method(
    meths := call argCount / 2
    waiter := Object clone
    for(index, 0, meths-1,
      waiter setSlot(call argAt(index*2) name,
        appender(call argAt(index*2+1)))
    )
    waiter
  )

  belongs := collector(
    to, belongings
  )

  has := collector(
    many, hasManies,
    one, hasOnes
  )

  curlyBrackets := method(
    current := self clone
    call message setName("do")
    current doMessage(call message)
  )
)

Since I’m a bad person and really enjoys metaprogramming, I had to get rid of most of the duplication the first implementation contained. Let’s say it like this: the first version didn’t have the collector and appender methods. And boy do they make a difference. This is real metaprogramming. Actually, these two methods are actually macros, almost as powerful as Common Lisps. Notice that the words we send in to collector doesn’t actually get evaluated. This is one of the reasons we don’t need to use symbols – we can just take the unevaluated messages and take their name. In the appender macro we’re doing something really funky where we use setNext.

The final effect of that setNext call is that in something like “has many foobar”, Io will not try to evaluate foobar at all. In fact, even before Io tries this, we remove the foobar message from the chain and inserts the next message instead.

Oh, right, and the Io lexer actually inserts a synthetic message called “curlyBrackets” when it finds curly brackets. The brackets themselves act exactly like parentheses. You can try this out in Io by changing the closing parenthesis to a closing curly bracket instead – Io doesn’t care. Your brackets don’t have to match up in type, just in cardinality.

Of course, this is just the beginning. Io will blow your mind.

The thing I’m finding myself missing is a good literal syntax for hashes and lists. I’m thinking about implementing something for that. Also, using atPut is starting to get annoying. I want square brackets access. Shouldn’t be too hard, since Io does the same thing with square brackets as it does with curly brackets. Almost, at least.

And btw, the Io standard library… Not sure what I think about it. I miss many things from the Ruby core library actually. Now, Io combined with the Ruby libraries, that might be fun?



Nooks and Crannies of Ruby


There are many small parts of Ruby, tips, tricks and strange things. I thought that I would write about some of the more interesting of these, since some of them are common idioms in the Ruby community. The basis for the information is as always from the Pick-axe, but how these things are used in real life comes from various places.

The splat operator

The asterisk is sometimes called the splat operator when not used for multiplication. It is used in two different places for opposite cases. When on the right hand side of an expression, it is used to convert an array into more than one right hand value. This makes splicing of lists very easy and nice to do.

a,b,c = *[1,3,2]

Second, it’s used at the left hand side to collect more than one right hand value into an array:

*a = 1,3,2

This makes no difference if you’re calling a method or assigning variables. What matters is as usual with programming languages; that there is a left hand side and a right hand side (lhs and rhs from now on):

def foo(a,*b)
p b
end

foo 1,2,3,*[4,5,6]

This is all old news, and not very exciting. It’s useful and the basis for some niceties, but nothing overwhelming. The thing that is really nice about the rhs version of the splat operator is what it does if the value it’s applied to isn’t an array. Basically, the interpreter first checks if there is a to_ary-method available. If not, it goes for the to_a method. Now, Kernel has a default to_a-method so all objects will respond to to_a. This method is deprecated to call directly, though, but if called through splat or Kernel#Array it doesn’t generate a warning. So:

a = *1

will result in the same thing as

a = 1

except for jumping through some unnecessary hoops underneath the covers (this holds for Ruby 1.8, which this post was written against; from 1.9 on, a = *1 yields the array [1]). But say that you have an object that implements Enumerable and you want to do something with it, maybe transform a Hash into an array of 2-element arrays. You can do it like this:

*a = *{:a=>1,:b=>2}

Now, this still isn’t that useful. Oh, it’s slightly useful but there is a method in Hash that does this too. But say that we have a file object:

*a = *open('/etc/passwd')

Since File includes Enumerable, it also has a to_a method which creates the array by using each to iterate and collect all elements. In this case all the lines in the file.
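As a minimal sketch of the same mechanism: the Countdown class below is hypothetical (it is not part of Ruby or of any library) and only defines each. Including Enumerable supplies to_a, which is what the splat ends up calling:

```ruby
# Hypothetical class for illustration: it only knows how to iterate.
class Countdown
  include Enumerable   # provides to_a on top of each

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |i| yield i }
  end
end

# The rhs splat turns the object into an array via to_a,
# and the lhs splat collects whatever is left over.
first, *rest = *Countdown.new(3)
p first  # prints 3
p rest   # prints [2, 1]
```

The same thing would work for any object that includes Enumerable, which is why it works for files too.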

Camping uses the splat operator in many places, mostly with the common idiom of taking any arguments offered and passing them all on as separate arguments again:

def foo(*args)
  bar(*args)
end

Symbols and to_proc

I hesitate to use the word neat, but I can’t really find anything that better describes the sweet, sweet combination of symbols and to_proc. I’m going to show you a small example of how it’s used before I explain this very common practice:

[1e3,/(foo)/,"abc",:hoho].collect &:to_s

Now, this code will not run without a small addition to your code base. But first of all, let’s just walk through the code. First we define a literal array that contains four elements of different type. One Float, one Regexp, a String and a Symbol. Then we call collect to make a new array out of this. But where we usually provide collect with a block, we instead see the ampersand that symbolizes that we want to turn a Proc-object into a block argument for a method. But what comes next is not a variable, but a symbol. So, what happens? Well, the ampersand checks if the value provided to it is a Proc, and if not it calls to_proc on the value in question, if such a method is defined. And how should this method look? Like this:

class Symbol
def to_proc
lambda { |o| o.send(self) }
end
end

Now, this method is nothing much. But it employs some fun trickery. It first creates a Proc by calling Kernel#lambda with a literal block. This block takes one argument, and the block calls the method send on the argument with self as argument. As self in this case would be a symbol, and specifically the symbol :to_s in the above example, the end result is that the Proc returned will call to_s on each object yielded to the block. So, with this explanation it’s easier to understand what the first example does. In effect it is exactly the same as

[1e3,/(foo)/,"abc",:hoho].collect {|v| v.to_s}

but without that nasty duplication of the v-argument. It’s not a big saving, but many small savings…
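The ampersand is not tied to symbols: any object that responds to to_proc can be used in its place. Here is a small sketch of that, where Multiplier is a made-up class used only for illustration:

```ruby
# Hypothetical class: any object with a to_proc method can follow the ampersand.
class Multiplier
  def initialize(factor)
    @factor = factor
  end

  # Called by Ruby when the object is passed with & in a method call.
  def to_proc
    factor = @factor
    lambda { |n| n * factor }
  end
end

p [1, 2, 3].collect(&Multiplier.new(10))  # prints [10, 20, 30]
```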

I recommend installing facets, which includes numerous small, nice solutions like this. They can also be required separately, so if you have facets installed, just require ‘facet/symbol/to_proc’ to get this specific functionality included. (Ruby 1.8.7 and 1.9 later added Symbol#to_proc to the core, so on those versions and newer no addition is needed.)

Using operators as method names

Ruby allows many more operators to be redefined than most languages. This makes some interesting tricks possible, but most importantly it can make your code radically more readable. An excellent example of this can be found in the net/ldap-library (available as ruby-net-ldap from RubyGems). Now, LDAP uses something called filters for searching, and the syntax for filters is basically prefix notation with ampersand, pipe and exclamation mark for and, or and not, respectively. Now, with the net/ldap-library you can define a combined filter like this:

include Net
f = (LDAP::Filter.eq(:cn,'*Ola*') & LDAP::Filter.eq(:mail,'*ologix*')) |
LDAP::Filter.eq(:uid,'olagus')

This defines a filter that basically says: find all entries where cn is ‘*Ola*’ and mail is ‘*ologix*’, or where uid is ‘olagus’. Thanks to the infix operators this is very readable, and easy to understand for anyone who knows LDAP.
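To show how little is needed for this kind of syntax, here is a rough sketch of a filter class with & and | defined as ordinary methods. SketchFilter is a hypothetical stand-in written for this example; it is not how net/ldap is implemented, and it only builds the prefix-notation string:

```ruby
# Hypothetical stand-in for illustration, not the net/ldap implementation.
class SketchFilter
  attr_reader :expr

  def initialize(expr)
    @expr = expr
  end

  def self.eq(attribute, value)
    new("(#{attribute}=#{value})")
  end

  # a & b is parsed as a.&(b)
  def &(other)
    SketchFilter.new("(&#{expr}#{other.expr})")
  end

  # a | b is parsed as a.|(b)
  def |(other)
    SketchFilter.new("(|#{expr}#{other.expr})")
  end

  def to_s
    expr
  end
end

f = (SketchFilter.eq(:cn, '*Ola*') & SketchFilter.eq(:mail, '*ologix*')) |
    SketchFilter.eq(:uid, 'olagus')
puts f  # prints (|(&(cn=*Ola*)(mail=*ologix*))(uid=olagus))
```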

The next example comes from Hpricot, where _why puts the slash to good use:

doc = Hpricot(open("http://redhanded.hobix.com/index.html"))
(doc/"span.entryPermalink").set("class", "newLinks")

Note how neatly doc/”span…” fits in, and it looks like XQuery, or any other path query syntax. But it’s just regular Ruby code and the slash is just a method call. I’m really sad that /. isn’t allowed as a method in this way… =)

Now, according to the Pickaxe, all of these infix operators will be translated from arg1 op arg2 into arg1.op(arg2). But Ruby still needs to be able to parse everything. This means that most operators need to have one required argument. Trying this with a home-made *-operator will not work:

x = a *

But, an experimental syntax for importing packages in JRuby actually used this effect:

import java.util.*

This is just a simple exploitation of the fact that * is a regular method name and used like this will be parsed by Ruby like that too, which means it doesn’t need an argument. So, which operators are available for your leisure? According to the Pickaxe, these are [], []=, **, !, ~, + (unary), - (unary), *, /, %, +, -, >>, <<, &, ^, |, <=, <, >, >=, <=>, ==, ===, !=, =~, !~.
Note that the method names when implementing the unary + and - are +@ and -@:

class String
def -@
swapcase
end
end

The most important thing to remember when reusing operators like this is to not overdo it. Use it where it makes sense and is natural but not elsewhere. Remember that Ruby code should follow the principle of least surprise. The above example of using unary minus to return a swapcased version of the string is probably not obvious enough to warrant its use, for example.

Using lifecycle methods to simplify daily life

Inversion of control is all the rage in the Java world right now, but using callbacks of all kinds has always been a great way to make code readable and compact. The Observer pattern is used in many places, and I suspect it’s implemented without any knowledge of the pattern in most places.

Ruby contains a few callback methods and lifecycle hooks that make life that much easier for the Ruby library writer. Probably the most useful of these is Module#included. Basically, this is a method you define like this:

module Enumerable
def self.included(mod)
puts "and now Enumerable has been used by #{mod.inspect}..."
end
end

It will be called every time the module is included somewhere else, with the including class or module as its argument.

There are other callbacks that can be useful. Module#method_added, Module#method_removed, Module#method_undefined and counterparts for Kernel with singleton prefixed. Class#inherited is interesting. Through this you can actually keep track of all direct subclasses of your class and with some metaprogramming trickery (basically writing a new inherited for each subclass that does the same thing) you can get hold of the complete tree of subclasses. If you want that for some reason. I would for example use this approach for Test::Unit, rather than iterating over ObjectSpace. But I guess that’s a matter of taste.
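Here is a small sketch of the Class#inherited idea. The class names are invented for the example. Note that in this sketch the hook is itself inherited along with the other class methods, so subclasses of subclasses are picked up without writing a new inherited for each of them:

```ruby
# Hypothetical base class that records every class that descends from it.
class Plugin
  @descendants = []   # class instance variable on Plugin itself

  class << self
    attr_reader :descendants

    # Ruby calls this on the superclass each time a subclass is created.
    def inherited(klass)
      super
      Plugin.descendants << klass
    end
  end
end

class Importer < Plugin; end
class CsvImporter < Importer; end

p Plugin.descendants  # prints [Importer, CsvImporter]
```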

Class variables versus Class instance variables

This is one thing that always trips people up. Including me. Class variables are special variables that are associated with a class. They are referenced with two at-signs and a name, like @@name. So far, it’s simple. But classes are also instances of Class, which means that these instances can have regular one-at-sign instance variables. These are not the same thing. Not at all. Something like this:

class Foo
@@borg = []
@me = nil

def initialize
@me = self
Foo::add_borg
end

def self.add_borg
@@borg << @me
end
end

will result in a @@borg-list filled with nils. This is because the first @me, like the one read in add_borg, refers to an instance variable in the Foo instance of Class, not the @me instance variable associated with an instance of the Foo-class.

Condensed lesson: classes have instance variables of their own. These are rarely useful and usually contribute to hard-to-find errors. And don’t confuse them with class variables, which are a totally different kind of beast.
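The three kinds of variable can be seen side by side in a short sketch. Shape and Circle are made-up names for illustration:

```ruby
class Shape
  @@registry = [:shape]   # class variable: shared with every subclass
  @label = "shape"        # class instance variable: belongs to Shape only

  def self.registry
    @@registry
  end

  def self.label
    @label                # reads the class instance variable of the receiver
  end

  def label
    @label                # a different @label: the one on each instance
  end
end

class Circle < Shape
  @@registry << :circle   # same @@registry as in Shape
  @label = "circle"       # Circle's own class instance variable
end

p Shape.registry   # prints [:shape, :circle]
p Shape.label      # prints "shape"
p Circle.label     # prints "circle"
p Shape.new.label  # prints nil
```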

Shortcuts: __FILE__ and ARGF

Ruby contains a myriad of shortcuts, many influenced by Perl and others invented to make it easier to write condensed programs. The regexp result globals are always good to have, but there are others that can be very useful too. Two that I like most are __FILE__ and ARGF. __FILE__ is also part of a very, very common idiom that the Pickaxe details. Combined with the global $0 it makes it easy to tell apart the case when a file is required from the case when it’s executed. Basically, $0 contains the name of the file that has been executed. In C this would be argv[0]. __FILE__ is the full filename of the file the code can be found in. If these are the same, the current file is the one asked to execute. This is useful in many places. I use it often in gemspecs:

if $0 == __FILE__
Gem::manage_gems
Gem::Builder.new(spec).build
end

If I run the file above with gem build, this part will not execute, but if I execute the file directly, it will run.

Matz sometimes likes to show how to implement the UNIX utility cat in Ruby:

puts *ARGF

This combines tip number uno in this blog entry with the constant ARGF. ARGF is a nice special object that when you reference it will open all the files named in ARGV. If you have any options in your ARGV you’d better remove them before referencing ARGF, though. Basically what you get when referencing ARGF is a file handle to the files named on the command line. And since a File has Enumerable and thus to_a, splat will read all the lines in all the files and combine them into an array and then splay the array into the call to puts which will print each line. Here you are, cat!

There are other globals and constants available, but most aren’t as useful as the ones named above. For example you can use __END__ on an otherwise empty line, and the interpretation of code will stop there and the rest of the file will be available as the constant DATA. I haven’t seen anyone use this. It’s a remnant from when Ruby was a tool to replace Perl, and the other scripting tools in UNIX.

Everything is runtime

Basically, the whole difference in Ruby compared to compiled languages is that everything happens at runtime. Actually, this difference can be seen when looking at Lisp too. In Common Lisp there are three different times when code can be evaluated: at compile-time, load-time and eval-time. In Java class-structure is fixed. You can’t change class structure based on compile parameters (oh boy, sometimes I miss C-style macros). But in Ruby, everything is runtime. Everything happens at that time (except for constants… this is a different story). This means that class definitions can be customized based on environment. A typical example is this:

class Foo
include Tracing if $DEBUG
end

This class will include the Tracing methods when the -d flag is provided, and leave them out when it’s not. Basically there isn’t much syntax in Ruby that couldn’t be implemented in the language itself. A class declaration can be duplicated with

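# Class.new takes an optional superclass, not a name;
# the name comes from assigning the new class to a constant.
Foo = Class.new do
  # class declarations go here
end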

And almost all parts of a method-definition with def can be provided with define_method. The glaring mismatch (blocks) will be corrected with 1.9. Except for that, it’s just sugar. If statements could be implemented with duck typing/polymorphism:

class TrueClass
def if(t,f)
t.call
end
end

class FalseClass
def if(t,f)
f.call if f
end
end

x = true

x.if lambda{ puts "true" }, lambda{ puts "false"}

And that’s the real Lisp heritage of Ruby. There really isn’t any essential syntax. Everything can be implemented with the basics of receiver, message, arguments, and blocks. Just remember that. It’s the basis for all useful metaprogramming. There is no compile-time. Everything can change. “There is no spoon”.
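As a small sketch of class structure being decided at runtime, the hypothetical build_greeter method below creates a fresh class on every call and only defines one of its methods when a flag is set. All names are invented for the example:

```ruby
# Hypothetical helper: returns a new anonymous class each time it is called.
def build_greeter(tracing)
  Class.new do
    define_method(:greet) { |name| "hello, #{name}" }

    # Whether this method exists is decided when the class body runs.
    define_method(:greet_loudly) { |name| greet(name).upcase } if tracing
  end
end

plain = build_greeter(false)
loud  = build_greeter(true)

p loud.new.greet_loudly("ola")             # prints "HELLO, OLA"
p plain.new.respond_to?(:greet_loudly)     # prints false
```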



The Ruby singleton class


After my post on Meta-programming techniques I got a few comments and questions about the singleton-class. This feature seems to be quite hard to understand so I have decided that I will try to clarify the issue by first describing what it is, and then detail why it is so useful. This entry will be concept-heavy and code-light.

What it is
A child with many names, the singleton class has been called metaclass, shadow class, and other similar names. I will stay with singleton class, since that’s the term the Pickaxe uses for it.

Now, in Ruby, every object has a class that it is an instance of. You can find this class by calling the method class on any object. The methods an object responds to will originally be the ones in that object’s class. But as you probably know, Ruby allows you to add new methods to any object. There are two syntaxes to do this:

class << foo
  def bar
    puts "hello world"
  end
end

and

def foo.bar
  puts "hello, world"
end

To the Ruby interpreter, there is no difference in this case. Now, if foo is a String, the method bar will be available to call on the object referenced by foo, but not on any other Strings. The way this works is that the first time a method on a specific object is defined, a new, anonymous class will be inserted between the object and the real class. So, when I try to call a method on foo, the interpreter will first search inside the anonymous class for a definition, and then go on searching the real class hierarchy for an implementation. As you probably understand, that anonymous class is our singleton class.
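A short sketch makes this visible. It uses Kernel#singleton_class, which was added in Ruby 1.9.2 and so postdates this post; the variable names are just for illustration:

```ruby
foo   = String.new("one")
other = String.new("two")

# Defines bar on foo's singleton class only.
def foo.bar
  "hello, world"
end

p foo.bar                                      # prints "hello, world"
p other.respond_to?(:bar)                      # prints false
p foo.singleton_methods                        # prints [:bar]
p foo.singleton_class.instance_methods(false)  # prints [:bar]
p foo.class                                    # prints String
```

The last line shows that the anonymous class stays hidden: asking foo for its class still gives String.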

The other part of the mystery about singleton classes (and the really nifty part) is this. Remember, all objects can have a singleton class. And classes are objects in themselves. A class such as String is actually an instance of the class Class. There is nothing special about these instances. They have capitalized names, but that’s because the names are constants. And, since every class in Ruby is an instance of the class Class, that means that what are called class methods, or static methods if you come from Java, are actually just singleton methods defined on the instance of the class in question. So, say you would add a new class method to String:

def String.hello
  puts "hello"
end

String.hello

And now you see that the syntax is actually the same as when we add a new singleton method to any other object. The only difference here is that the object happens to be an instance of Class. There are two other common ways to define class methods, but they work the same way:

class String
  def self.hello
    puts "hello"
  end
end

class String
  class << self
    def hello
      puts "hello"
    end
  end
end

The second version especially needs explaining, for two reasons: it is the preferred idiom in Ruby, and it makes the singleton class explicit. What happens is that, since the code inside the “class String”-declaration is executed in the scope of the String instance of Class, we can get at the singleton class with the same syntax we used to define foo.bar earlier. So, the definition of hello will happen inside the singleton class for String. This also explains the common idiom for getting the singleton class:

 class << self; self; end

There is no other good way to get it, so we extract the self from inside a singleton class definition.

Why is it so useful for metaprogramming?
Obviously, you can define class methods with it, but that’s not the main benefit. You can do many metaprogramming tricks with it that are impossible without it. The first one is to create a super class that can define new class methods on sub classes of itself. That is the use I showcased in my earlier blog entry. The problem is that you can’t just use self by itself, since that only gives the class instance. This code, with its results, shows the difference:

class String
  p self
end # => String

class String
  p (class << self; self; end)
end # => #<Class:String>

And, if you want to use define_method, module_eval and all the other tricks, you need to invoke them on the singleton-class, not the regular self. Basically, if you need to dynamically define class methods, you need the singleton-class. This example will show the difference between defining a dynamic method with self or the singleton class:

class String
  self.module_eval do
    define_method :foo do
      puts "inside foo"
    end
  end

  (class << self; self; end).module_eval do
    define_method :bar do
      puts "inside bar"
    end
  end
end

"string".foo # => "inside foo"
String.bar # => "inside bar"

As you can see, the singleton class will define the method on the class instead. Of course, if you know the class name it will always be easier to avoid having an explicit singleton class, but when the method needs to be defined dynamically you need it. It’s as simple as that.
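The super class case mentioned earlier can be sketched like this. Model, Invoice and setting are names made up for the example, not taken from any framework:

```ruby
class Model
  # When called on a subclass, self is that subclass, so the new method
  # lands on the subclass's singleton class and nowhere else.
  def self.setting(name, value)
    sclass = (class << self; self; end)
    sclass.send(:define_method, name) { value }
  end
end

class Invoice < Model
  setting :table_name, "invoices"
end

p Invoice.table_name                # prints "invoices"
p Model.respond_to?(:table_name)    # prints false
```

Because the name of the method arrives as an argument, there is no way to write it with a plain def, which is exactly the situation where the singleton class is needed.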



The Dark Ages of programming languages


We seem to be living in the dark ages of programming languages. I’m not saying this to bash everything; I’m actually being totally objective right now. Obviously, our situation right now is much better than it was 10 years ago. Or even 5 years ago. I would actually say that it’s really much better now than 1 year ago. But programming is still way too painful in almost all cases. We are doing so much stuff by hand that obviously should be done by computer.

I spend quite a lot of time learning new languages now and then, to try to find something that’s really good for me. So far, the best contestants are Ruby, Erlang, OCaml and Lisp, but all of those have their share of problems too. They just suck less than the alternatives.

  • Ruby… I really like Ruby. Ruby is such an improvement that I really want to do almost everything in it nowadays. I think in Ruby half the time and in Lisp the other half. But it’s not enough. It is still clunky. I want tail calls. I want real macros. I want blazing speed and complete integration with good libraries for everything and more. I’m just a sucker for power, and I want more of it in Ruby.
  • Erlang and OCaml. These languages are really great. For specific applications. Specifically, Erlang is totally superior for concurrent programming. And OCaml is incredibly fast, very typesafe and has great GUI libraries. So, if I was asked to do something massively concurrent I would probably choose Erlang, and OCaml if it was GUI programming. But otherwise… Well, Erlang does have some neat functional properties, but not any nice macro support. It doesn’t have a central code repository and many other things you expect from a general purpose language. OCaml suffers from the same things.
  • Lisp is the love of my life. But as so many people before me have noted, all the implementations are bad in some way or another. Scheme is lovely; for research. Common Lisp is so powerful, but it needs users. Lots of them, creating libraries for every little data format there can be, creating competing implementations of particularly important API’s; like databases.

Conclusion. Nothing is good enough, right now. I see two paths ahead. Two ways that could actually end in the “100-year language”.

The first path is one new language. This language will be based on all the best features of all current languages, plus a good amount of research output. I have a small list of what this language would need to be successful as the next big one:

  • It needs to be multiparadigm. I’m not saying it can’t choose one paradigm as the base, but it should be possible to program in it functionally, OOP, AOP, imperative. It should be possible to build a declarative library so you can do logic programming without leaving the language.
  • It should have static type inference where possible. It should also allow optional type hints. This is so important for creating great implementations. It can also increase readability in some cases.
  • It needs all the trappings of functional languages; closures, first-class functions and lambdas. This is essential, to avoid locking the language into an evolutionary corner.
  • It needs garbage collection. Possibly several competing implementations of GC’s, running evolutionary algorithms to find out which one is best suited for long running processes of the program in question.
  • A JIT VM. It seems almost a given right now that Virtual Machines are a big win. They can also be made incredibly fast.
  • Another JIT VM.
  • A non-VM implementation. Several competing implementations for different purposes is important to allow competition and experimentation with new features of implementation.
  • Great integration with legacy languages (Java, Ruby (note, I’m counting on all Rubyists moving to this new language when it gets out, making Ruby legacy), Cobol). This is obvious. There are too many things lying around, bitrotting, that we will never get rid of.
  • The language and at least one production quality implementation needs to be totally open-source. No lock-in of the language should be possible.
  • Likewise, good company support is essential. A language needs money to be developed.
  • A centralized code/library repository. This is one of Java’s biggest failings. Installing a new library in Java is painful. We need something like CPAN, ASDF, RubyGems.
  • The language needs great, small and very orthogonal libraries. The libraries included with the language need to be great, since they have to be small but still pack all the most needed punch.
  • Concurrency must be a breeze. There should be facilities in the language itself for making this obvious. (Like Erlang or Gambit Scheme).
  • It should be natural to do meta-programming in it (in the manner of Ruby).
  • It should be natural to solve problems bottom-up, by implementing DSL’s inside or outside the language.
  • The language needs a powerful macro facility that isn’t too hard to use.
  • Importantly, for the macro facility, the language needs to have a well-defined syntax tree of the simplest possible kind, but it also needs to have optional syntax.

So, that’s what I deem necessary (but maybe not sufficient) for a really useful, good, long term programming language. When I read this list, it doesn’t seem that probable that this language will show up any time soon, though. Actually, it seems kinda unrealistic.

So maybe the other way ahead is the right one? The other way I envision is that languages become easier and easier to create, and languages have their strength in different places. Along this path I envision the descendants of Ruby and Erlang exploiting what they’re good at and eschewing everything else. But for this strategy to work, the first thing implemented in each language needs to be a seamless way to integrate to other languages. Maybe there will come an extremely good glue-language (not like Perl or Ruby, but a language that only will serve as glue between programming languages), and all languages will implement good support for that language. For example you could code a base Erlang concurrent framework, which uses G (the glue language) to implement some enterprise functionality in Java sandboxes, and some places where Ruby through G will implement a DSL, which have subparts where Ruby uses G to run Prolog knowledge engines.

If I had to choose between the two futures, I am frankly more inclined towards the one-language one. But the multi-language way seems much more probable. And since I’m trying to choose a way now, I’m placing my bets on the second option. We are not ready to implement G yet, but I do think that as many p-language techs as possible should do their best to learn how languages can cooperate in different ways, to prepare for this project.



Ruby Metaprogramming techniques


Updated: Scott Labounty wondered how the trace example could work and since a typical metaprogramming technique is writing before- and after-methods, I have added a small version of this.
Updated: Fixed two typos, found by Stephen Viles

I have been thinking a lot about Metaprogramming lately. I have come to the conclusion that I would like to see more examples and explanations of these techniques. For good or bad, metaprogramming has entered the Ruby community as the standard way of accomplishing various tasks, and to compress code. Since I couldn’t find any good resources of this kind, I will get the ball rolling by writing about some common Ruby techniques. These tips are probably most useful for programmers that come to Ruby from another language or haven’t experienced the joy of Ruby Metaprogramming yet.

1. Use the singleton-class

Many ways of manipulating single objects are based on manipulations on the singleton class and having this available will make metaprogramming easier. The classic way to get at the singleton class is to execute something like this:

 sclass = (class << self; self; end)

RCR231 proposes the method Kernel#singleton_class with this definition:

module Kernel
  def singleton_class
    class << self; self; end
  end
end

I will use this method in some of the next tips.

2. Write DSL’s using class-methods that rewrite subclasses

When you want to create a DSL for defining information about classes, the most common trouble is how to represent the information so that other parts of the framework can use them. Take this example where I define an ActiveRecord model object:

 class Product < ActiveRecord::Base
  set_table_name 'produce'
 end

In this case, the interesting call is set_table_name. How does that work? Well, there is a small amount of magic involved. One way to do it would be like this:

module ActiveRecord
  class Base
    def self.set_table_name name
      define_attr_method :table_name, name
    end

    def self.define_attr_method(name, value)
      singleton_class.send :alias_method, "original_#{name}", name
      singleton_class.class_eval do 
        define_method(name) do   
          value 
        end
      end
    end
  end
end

What’s interesting here is the define_attr_method. In this case we need to get at the singleton-class for the Product class, but we do not want to modify ActiveRecord::Base. By using singleton_class we can achieve this. We have to use send to alias the original method since alias_method is private. Then we just define a new accessor which returns the value. If ActiveRecord wants the table name for a specific class, it can just call the accessor on the class. This way of dynamically creating methods and accessors on the singleton-class is very common, and especially so in Rails.

3. Create classes and modules dynamically

Ruby allows you to create and modify classes and modules dynamically. You can do almost anything you would like on any class or module that isn’t frozen. This is very useful in certain places. The Struct class is probably the best example, where

PersonVO = Struct.new(:name, :phone, :email)
p1 = PersonVO.new("Ola Bini")   # Struct members are filled in positionally

will create a new class, assign this to the name PersonVO and then go ahead and create an instance of this class. Creating a new class from scratch and defining a new method on it is as simple as this:

c = Class.new
c.class_eval do
  define_method :foo do
    puts "Hello World"
  end
end

c.new.foo    # => "Hello World"

Apart from Struct, examples of creating classes on the fly can be found in SOAP4R and Camping. Camping is especially interesting, since it has methods that create these classes, and you are supposed to inherit your controllers and views from these classes. Much of the interesting functionality in Camping is actually achieved in this way. From the unabridged version:

def R(*urls); Class.new(R) { meta_def(:urls) { urls } }; end

This makes it possible for you to create controllers like this:

class View < R '/view/(\d+)'
  def get post_id
  end
end

You can also create modules in this way, and include them in classes dynamically.

4. Use method_missing to do interesting things

Apart from blocks, method_missing is probably the most powerful feature of Ruby. It’s also one that is easy to abuse. Much code can be extremely simplified by good use of method_missing. Some things can be done that aren’t even possible without. A good example (also from Camping), is an extension to Hash:

class Hash
  def method_missing(m,*a)
    if m.to_s =~ /=$/  
      self[$`] = a[0]
    elsif a.empty?  
      self[m.to_s]
    else  
      raise NoMethodError, "#{m}"
    end
  end
end

This code makes it possible to use a hash like this:

x = {'abc' => 123}
x.abc # => 123
x.foo = :baz
x # => {'abc' => 123, 'foo' => :baz}

As you see, if someone calls a method that doesn’t exist on hash, it will be searched for in the internal collection. If the method name ends with an =, a value will be set with the key of the method name excluding the equal sign.
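If you would rather not reopen Hash, the same trick can live in a small wrapper class. The sketch below is an assumption-laden variation, not Camping’s code: Settings is a made-up class, unknown names are passed on to super so they still raise NoMethodError, and respond_to_missing? (available from Ruby 1.9.2) keeps respond_to? honest:

```ruby
# Hypothetical wrapper: string keys become readers and writers.
class Settings
  def initialize(values = {})
    @values = values
  end

  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      @values[key.chomp("=")] = args.first   # s.foo = 1 stores under "foo"
    elsif args.empty? && @values.key?(key)
      @values[key]
    else
      super                                   # unknown name: usual NoMethodError
    end
  end

  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?("=") || @values.key?(key) || super
  end
end

s = Settings.new('abc' => 123)
p s.abc                  # prints 123
s.foo = :baz
p s.foo                  # prints :baz
p s.respond_to?(:abc)    # prints true
p s.respond_to?(:nope)   # prints false
```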

Another nice method_missing technique can be found in Markaby. The code I’m referring to makes it possible to emit any XHTML tags possible, with CSS classes added into it. This code:

body do
  h1.header 'Blog'
  div.content do
    'Hellu'
  end
end

will emit this XML:

  <body><h1 class="header">Blog</h1><div class="content">Hellu</div></body>

Most of this functionality, especially the CSS class names, is created by having a method_missing that sets attributes on self, then returns self again.

5. Dispatch on method-patterns

This is an easy way to achieve extensibility in ways you can’t anticipate. For example, I recently created a small framework for validation. The central Validator class will find all methods in self that begin with check_ and call each of them, making it very easy to add new checks: just add a new method to the class, or to one instance.

methods.grep /^check_/ do |m|
  self.send m
end

This is really easy, and incredibly powerful. Just look at Test::Unit which uses this method all over the place.
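A fuller sketch of such a validator could look like the following. RecordValidator and its checks are invented for this example and are not the framework mentioned above; the method names are sorted only to make the order of the errors predictable:

```ruby
# Hypothetical validator: every method named check_* is run automatically.
class RecordValidator
  attr_reader :errors

  def initialize(record)
    @record = record
    @errors = []
  end

  def validate
    methods.grep(/^check_/).sort.each { |m| send(m) }
    errors
  end

  def check_name
    errors << "name is missing" if @record[:name].to_s.empty?
  end

  def check_age
    errors << "age is negative" if @record[:age].to_i < 0
  end
end

v = RecordValidator.new({ :name => "", :age => -1 })

# A check added to this one instance only is picked up as well.
def v.check_email
  errors << "email is missing" unless @record[:email]
end

p v.validate  # prints ["age is negative", "email is missing", "name is missing"]
```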

6. Replacing methods

Sometimes a method implementation just doesn’t do what you want. Or maybe it only does half of it. The standard Object Oriented Way ™ is to subclass, override the method and call super. This only works if you have control over where objects of the class in question are instantiated. This is often not the case, and then subclassing is worthless. To achieve the same functionality, alias the old method and add a new method definition that calls the old method. Make sure that the previous method’s pre- and postconditions are preserved.

class String
  alias_method :original_reverse, :reverse

  def reverse 
    puts "reversing, please wait..." 
    original_reverse
  end
end

A twist on this technique is to alias a method temporarily and then restore the original afterwards. For example, you could do something like this:

def trace(*mths)
  add_tracing(*mths) # aliases the methods named, adding tracing
  yield
ensure
  remove_tracing(*mths) # removes the tracing aliases, even if the block raises
end

This example shows a typical way one could code the add_tracing and remove_tracing methods. It depends on singleton_class being available, as per tip #1:

class Object  
  def add_tracing(*mths)
    mths.each do |m|
      singleton_class.send :alias_method, "traced_#{m}", m      
      singleton_class.send :define_method, m do |*args|        
        $stderr.puts "before #{m}(#{args.inspect})"        
        ret = self.send("traced_#{m}", *args)        
        $stderr.puts "after #{m} - #{ret.inspect}"        
        ret      
      end    
    end  
  end

  def remove_tracing(*mths)    
    mths.each do |m|      
      singleton_class.send :alias_method, m, "traced_#{m}"    
    end  
  end
end

"abc".add_tracing :reverse

If these methods were added to Module (with a slightly different implementation; see if you can get it working!), you could also add and remove tracing on classes instead of instances.

7. Use NilClass to implement the Introduce Null Object refactoring

In Fowler’s Refactoring, the refactoring called Introduce Null Object is for situations where a variable could contain either an object or null, and where null should behave as if it had a predefined value. A typical example would be this:

name = x.nil? ? "default name" : x.name

Now, the refactoring is based on Java, which is why it recommends creating a subclass of the class in question, to be used wherever the value would otherwise have been null. For example, a NullPerson object will inherit from Person and override name to always return the “default name” string. But in Ruby we have open classes, which means you can do this:

def nil.name; "default name"; end
x = nil
name = x.name # => "default name"

8. Learn the different versions of eval

There are several evaluation primitives in Ruby, and it’s important to know the difference between them and when to use which. The available contestants are eval, instance_eval, module_eval and class_eval. First, class_eval is an alias for module_eval. Second, there are some differences between eval and the others. Most importantly, eval only takes a string to evaluate, while the others can evaluate a block instead. That means that eval should be your absolute last resort. It has its uses, but mostly you can get away with just evaluating blocks with instance_eval and module_eval.

Eval will evaluate the string in the current environment or, if a binding is provided, in that environment. (See tip #11).

Instance_eval will evaluate the string or the block in the context of the receiver. Specifically, this means that self will be set to the receiver while evaluating.

Module_eval will evaluate the string or the block in the context of the module it is called on. This sees much use for defining new methods on modules or singleton classes. The main difference between instance_eval and module_eval lies in where the methods defined will be put. If you use String.instance_eval and do a def foo inside, this will be available as String.foo, but if you do the same thing with module_eval you’ll get String.new.foo instead.
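The difference in where the methods end up is easiest to see side by side. The following is a small sketch; Greeter is a hypothetical empty class used only for this demonstration.

```ruby
class Greeter; end  # hypothetical class, used only for this demonstration

Greeter.instance_eval do
  # self is the Greeter class object, so def creates a singleton method
  def foo; "class-level foo"; end
end

Greeter.module_eval do
  # def inside module_eval creates an ordinary instance method
  def bar; "instance-level bar"; end
end

p Greeter.foo                    # => "class-level foo"
p Greeter.new.bar                # => "instance-level bar"
p Greeter.respond_to?(:bar)      # => false
p Greeter.new.respond_to?(:foo)  # => false
```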

Module_eval is almost always what you want. Avoid eval like the plague. Follow these simple rules and you’ll be OK.

9. Introspect on instance variables

A trick that Rails uses to make instance variables from the controller available in the view is to introspect on an object’s instance variables. This is a grave violation of encapsulation, of course, but can be really handy sometimes. It’s easy to do with instance_variables, instance_variable_get and instance_variable_set. To copy all instance variables from one object to another, you could do it like this:

from.instance_variables.each do |v|
  to.instance_variable_set v, from.instance_variable_get(v)
end
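Here is the same loop in a runnable setting. This is only a sketch of the idea, not how Rails actually does it: the Controller and View classes and the copy_instance_variables helper are hypothetical names.

```ruby
# Hypothetical stand-ins for a controller and a view.
class Controller
  def initialize
    @title = "Home"
    @items = [1, 2, 3]
  end
end

class View
  def render
    "#{@title}: #{@items.join(', ')}"
  end
end

# Copies every instance variable of from onto to, and returns to.
def copy_instance_variables(from, to)
  from.instance_variables.each do |v|
    to.instance_variable_set(v, from.instance_variable_get(v))
  end
  to
end

view = copy_instance_variables(Controller.new, View.new)
puts view.render  # Home: 1, 2, 3
```

Note that only the references are copied, so both objects end up sharing the same @items array.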

10. Create Procs from blocks and send them around

Materializing a Proc, saving it in a variable and sending it around makes many APIs very easy to use. This is one of the ways Markaby manages those CSS class definitions. As the Pickaxe details, it’s easy to turn a block into a Proc:

def create_proc(&p); p; end
create_proc do
  puts "hello"
end       # => #<Proc ...>

Calling it is just as easy:

p.call(*args)

If you want to use the proc for defining methods, you should use lambda to create it, so return and break will behave the way you expect:

p = lambda { puts "hoho"; return 1 }
define_method(:a, &p)
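A short sketch of what "the way you expect" means here. The first half shows how return differs between a lambda and a plain Proc when both are called inside a method. The second half puts the define_method call inside a class body, where define_method is available. The method names and the Speaker class are hypothetical.

```ruby
def lambda_return
  l = lambda { return 1 }
  l.call               # return only leaves the lambda
  :method_finished
end

def proc_return
  pr = Proc.new { return 1 }
  pr.call              # return leaves proc_return itself
  :method_finished
end

p lambda_return  # => :method_finished
p proc_return    # => 1

class Speaker  # hypothetical class
  body = lambda { return "hoho" }
  define_method(:a, &body)
end

p Speaker.new.a  # => "hoho"
```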

Remember that method_missing will provide a block if one is given:

def method_missing(name, *args, &block)
  return super unless block_given? # keep the normal NoMethodError otherwise
  block.call(*args)
end

thismethoddoesntexist("abc","cde") do |*args|
  p args
end  # => ["abc","cde"]

11. Use binding to control your evaluations

If you really do feel the need to use eval, you should know that you can control which variables are available when doing so. Use the Kernel method binding to get the Binding object at the current point. An example:

def get_b; binding; end
foo = 13
eval("puts foo",get_b) # => NameError: undefined local variable or method `foo' for main:Object

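The eval fails because the Binding returned by get_b captures the scope inside that method, where the local variable foo doesn’t exist. This technique is used in ERb and Rails, among others, to control which instance variables are available. As an example: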

class Holder
  def get_b; binding; end
end

h = Holder.new
h.instance_variable_set "@foo", 25
eval("@foo",h.get_b) # => 25
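The same mechanism can be seen with ERB from the standard library, which accepts a Binding when rendering. This is a minimal sketch; the Page class and the template string are made up for the example.

```ruby
require "erb"

# Hypothetical holder class; its instance variables become visible to
# the template through the binding it hands out.
class Page
  def initialize(title)
    @title = title
  end

  def get_b; binding; end
end

template = ERB.new("<h1><%= @title %></h1>")
puts template.result(Page.new("Hello").get_b)  # <h1>Hello</h1>
```

The template sees @title only because self inside the binding is the Page instance.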

Hopefully, some of these tips and techniques have clarified metaprogramming for you. I don’t claim to be an expert on either Ruby or Metaprogramming. These are just my humble thoughts on the matter.