RSpec matchers and regexp comments – a possibly useful hack


A few days back, I was sitting at my client and working on a hacky spec to validate some assumptions in a very dirty data set. I wanted to figure out some limits. The basic idea was that I needed to go through an entry for every day the last 8 years. Getting these entries is potentially expensive, and the validation was based on checking that a specific value never turns up. This was quite easy, and I ended up with something like this:

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not == "MAGIC TOKEN"
    end
  end
end

Here you can see that I just simply use a method to calculate the invariant, then use the “should_not ==” to find out if it’s true. Nothing fancy. The problem comes when I want to get information about a failure. Now, I could insert a print statement. That means I’d have to look at all the output until I get to the end, to see which one failed. I could also rescue all exceptions, print the offending information and then reraise. But the best solution would be to give RSpec a failure message. Now, you can definitely do this for RSpec in other matchers, but I couldn’t find a way of doing it with the == matcher. One thing I could have done, was to just write my own matcher.That also seemed inefficient. This was a throw away thing, run once and then delete.

What I ended up doing was actually quite elegant, in a very disgusting way. It works, and it might be useful for someone else, sometime. But don’t EVER do anything like this in code you will save.

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not =~ /\AMAGIC TOKEN(?#Invariant failed on: #{date})\Z/
    end
  end
end

So why does this work? Well, it turns out that you can have comments in regular expressions. And you can interpolate arbitrary values into regexps, just like with strings. So I can embed the failure information in a comment in the regexp. This will only be displayed when the match fails, since RSpec by default says something like “expected MAGIC TOKEN to not match /\AMAGIC TOKEN(?#Invariant failed on: 2010-06-04)\Z/”, so you get the information necessary. The comment does not contribute to the matching in anyway. There’s another subtle point here. I haven’t used ^ and $ for anchoring the pattern. Instead I use \A and \Z. The reason is that otherwise, my regexp wouldn’t have the same behavior as comparing against a string, since ^ and $ match the beginning and end of lines too, not only the beginning and end of buffer.

Anyway, I thought I’d share this. In basically all cases, don’t do this. But it’s still a bit funny.