Announcing JesCov – JavaScript code coverage


It seems the JavaScript tool space is not completely saturated yet. As I mentioned in my previous post I’ve had particular trouble finding a good solution to code coverage. So I decided to build my own version of it. The specific feature to notice is transparent translation of source code and support for branch coverage. It also has some limitations at the moment, of course. This is release 0.0.1 and as such is definitely a first release. If you happen to use the Jasmine JUnit runner it should be possible to drop in this directly and have something working immediately.

You can find information, examples and downloads here: http://jescov.olabini.com



JavaScript in the small


My most recent project was on a fairly typical Java Web project where we had a component that should be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kind of environments. So I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want to go with a full stack framework to help you out, so these things are less relevant.

Of course, most if not all things I’ll cover here can be gleaned from other sources, and probably better. And if you’re an experienced JavaScript developer, you are probably fine without this article.

I had to do two things to get efficient in using JavaScript. The first one was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way. But with the right habits (such as having a shortcut for function/lambda literals, and making sure to always put the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second thing is to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this-keyword. Both of these features can be used well, but they are also very clunky, and hard to get everyone on a team to understand the same way. I love prototype based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.

The module pattern

The basic idea of the module pattern is that you encapsulate all your code in anonymous functions that are then immediately evaluated to generate the actual top level object. Since JavaScript has some unfortunate problems with global variables (like, they are there), it’s safest to just put all your code inside of one or more of these modules. You can also make your modules take the dependencies you want to use. A simple module might look like this:

var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw "Can't withdraw more than exists in account";
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();
In this case the balance variable is completely hidden inside a lexical closure, and can only be accessed by the deposit and withdraw functions. These functions are also not in the global namespace so there is no risk for clobbering. It’s also possible to have lots and lots of helper functions that no one else can see. That makes it easier to make your functions smaller – and incidentally, the largest problem I’ve seen with JavaScript code quality is that functions tend to be very large. Don’t do that!
A useful variation of the module pattern is to extract the construction function and give it a name. Even though you might use it immediately, it makes it possible to create more than one of these, use different dependencies, or make it accessible from tests so you can inject collaborators:

var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});
var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");

RequireJS

The module pattern is good on its own, but there are some things that can be done by a loader that makes things even better. There are several variations of these module loaders, but my favorite so far is RequireJS. I have several reasons for this, but the main one is probably that it is very light weight, and is actually a net win even for very small web applications. There are lots of benefits with letting RequireJS handle your modules. The main ones is that it takes care of dependencies between modules, and loads them automatically. This means you can define one single entry point for your JavaScript, and RequireJS makes sure to load everything else. Another good aspect of RequireJS is that it allows you to avoid any global names at all. Everything is handled by callbacks inside of RequireJS. So how does it look? Well, a simple module with a dependency can look like this:

// in file foo.js
require(["bar", "quux"], function(bar, quux) {
  return {doSomething: function() { 
    return bar.something() + quux.something();
  }};
});
If you have something else that uses foo, then this file will be loaded, bar.js and quux.js will be loaded and the results of loading them (the return value from the module function) will be sent in as arguments to the function that creates the foo module. So RequireJS takes care of all this loading. But how do you kick it off? Well, you should have one single script tag in your HTML, that will point to require.js. You will also add an extra attribute to this script tag that points to the entry point to the JavaScript:

<script data-main="scripts/main" src="scripts/require.js"> </script>
This will do a number of things. It will load require.js. It will set the scripts directory as the base for all module references in your JavaScript. And it will load scripts/main.js as if it’s a RequireJS module. And if you want to use our foo-module earlier, you can create a main.js that looks like this:

// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});
This will make sure that foo.js and its dependencies bar.js and quux.js will be loaded before the function is invoked. However, one aspect of JavaScript that people sometimes gets wrong is that you have to wait until the DOM is ready to execute JavaScript. With RequireJS we use the ready function inside the require object to make sure we can do something when everything is ready. Your main module should always wait with doing something until the document is ready.
In general, RequireJS has helped a lot with structure and dependencies and it makes it very simple to break up JavaScript into much smaller pieces. I like it a lot. There are a few downsides, though. Main is that it doesn’t interact well with server side JavaScript (or at least it didn’t when I read up on it a month ago). Also, it doesn’t provide a clean way of getting access to the module functions without executing them, which becomes annoying when testing these things. I’ll talk a bit more about that in the section on testing.

No JavaScript in HTML

I don’t want any JavaScript whatsoever in the HTML, if I can avoid it. The only script tag should be the one that starts your module and loading framework – in my case RequireJS. We don’t have any event handlers embedded in the pages at all. We started out from a place where some of our pages had lots of event handlers and refactored to a much smaller code base that was much easier to work with by extracting all of these things into separate JavaScript modules. This has a side effect that anything you want to work with should be possible to semantically identify, either by using CSS classes or data attributes. Try to avoid convoluted paths to find elements. It’s OK to add some extra classes and attributes to make your JavaScript clean and simple.

Init functions on ready

In terms of how we structure modules in a real application, we don’t actually do much work on startup. Instead, most of the work involves setting up event handlers and so on. The way we are doing that is to have the top level modules expose an init method, that is expected to be called by the main module when it starts up. Imagine in a system where you have dojo as the main framework, and you have this code:

// foo.js
require(["bar"], function(bar) {
  function sayHello(node) {
    console.log("hello " + node);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});
This will make sure to set up all event handlers and put the application in the right state to be used.

Lots of callbacks

Once you’ve taught yourself to ignore the verbosity of anonymous lambdas in JavaScript, they become very handy tools for creating APIs and helper functions. In general, the code we write use a lot of callbacks and helper wrapper functions. I also use functions that generate new functions quite liberally, doing things like currying and similar aspects. A fairly typical example is something like this:

function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length() > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));
This kind of abstraction can lead to very readable and clean JavaScript if done well. It can also lead to code where very piece is as small as it can be. In fact, one of the ways we use to make the syntax a little bit more bearable is to extract creation of anonymous functions into factory functions like this.

Lots of anonymous objects

Anonymous objects are great for many things. They work as a substitute for named arguments, and can be very useful to return more than one value. In our code base we use anonymous objects a lot, and it definitely helps with code readability.

Testing

We use Jasmine for unit testing our JavaScript. This works quite well in general. Since this is a fairly typical Java web application we wanted to run it as part of our regular build process. This means we ended up using the JUnit Jasmine runner, which allow us to run these tests outside of browsers and format the results using all the available JUnit tools. Since we’ve tried to make the scripts as modular and small as possible, and also extracting most of the DOM behavior, we have avoided using HTML fixtures. This means our tests are leaning more towards traditional unit tests, rather than BDD style tests – which I’m not sure I’m comfortable with. But with the current size of the application, this is not really a problem.
Seeing as we wanted to test each module in isolation, we wanted to be able to instantiate the RequireJS module with our custom mock dependencies. This ended up not being very easy with RequireJS, so instead of trying to fit in to that model, we just don’t load RequireJS at all during testing, but instead have a top-level require function that just saves away the module function with a well defined name. This means we can instantiate the modules as many times as we want and inject different mocks for different purposes.
In general, Jasmine works well for us, but there are some features missing from the mocking/stubbing framework that makes certain things a bit complicated. One thing I miss a lot is the capability of having stubs returning different valueus depending on the arguments sent in. Some ugly code has been written to get around this.

Open questions

Our current JavaScript process works well for us, but there are still some open things we haven’t done yet. First among these is to integrate JSLint into our build process. I really think that should be there, so I have no excuse. We don’t have tests running inside of browsers. I’m actually OK with this, since we’re trying to do more unit level coverage with Jasmine. Hopefully our acceptance tests cover some of the browser based testing. We are not doing minification at all, and we probably won’t need it based on the current expected usage. For a different audience we would certainly minify everything – this is something RequireJS can do really well though. We don’t have any coverage tool running on our JavaScript either. This is something I’m also uncomfortable with, but I haven’t really found a good tool that allows us to run coverage as part of our CI process yet. I also care more about branch coverage than line coverage, and no tool seems to give you this at the moment.

Summary

JavaScript can be completely OK to work with, provided you treat it as a real language. It’s quite powerful, but we also have a lot of bad habits based on hacking together small things, or just doing what works. As we go forward with JavaScript, this needs to stop. But the good news is that if you’re a decent developer, you shouldn’t have any problem picking anything of this up.


Injecting loggers using Spring


On my current project we are using Spring MVC and we try to use autowiring as much as possible. I personally strongly prefer constructor injection, since this gives me the luxury of working with final fields. I also like being able to inject all things a class needs – including loggers. Most of the time I don’t really want to use custom loggers from tests, but sometimes I do want to make sure something gets logged correctly, and being able to inject a logger seems like a natural way of doing that. So, with that preamble out of the way, my problem was that this seemed quite hard to achieve in Spring. Specifically, I use SLF4J, and I want to inject the equivalent of doing LoggerFactory.getLogger(MyBusinessObject.class). Sadly, Spring doesn’t give access to the place where something is going to be injected in any of the hooks available. Most solutions I found to this problem relies on using a BeanPostProcessor to set a field on the object after it’s been created. This defeats three of my purposes/principles – I can’t use the logger in the constructor, the field will be mutable and I won’t get told by Spring if I’ve made a mistake in my wiring.

There was however one solution I found in a StackOverflow post – sadly it wasn’t complete. Specifically, I needed to use it in a Spring MVC setting and also from inside of tests. So this blog post is mainly to provide the complete solution for something like this. It’s a simple problem, but it was surprisingly tricky to get working correctly. But now that I have it, it will be very convenient. This code is for Spring 3.1, and I haven’t tested it on anything else.

The first part of this injection is to create our own custom BeanFactory – which is what Spring uses internally to manage beans and dependencies. The default one is called DefaultListableBeanFactory and we will just subclass it like this:

public class LoggerInjectingListableBeanFactory 
                extends DefaultListableBeanFactory {
    public LoggerInjectingListableBeanFactory() {
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
           new QualifierAnnotationAutowireCandidateResolver());
    }

    public LoggerInjectingListableBeanFactory(
              BeanFactory parentBeanFactory) {
        super(parentBeanFactory);
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    @Override
    public Object resolveDependency(
               DependencyDescriptor descriptor, String beanName, 
              Set<String> autowiredBeanNames, TypeConverter typeConverter) 
                     throws BeansException {
        Class<?> declaringClass = null;

        if(descriptor.getMethodParameter() != null) {
            declaringClass = descriptor.getMethodParameter()
                    .getDeclaringClass();
        } else if(descriptor.getField() != null) {
            declaringClass = descriptor.getField()
                    .getDeclaringClass();
        }

        if(Logger.class.isAssignableFrom(descriptor.getDependencyType())) {
            return LoggerFactory.getLogger(declaringClass);
        } else {
            return super.resolveDependency(descriptor, beanName, 
                    autowiredBeanNames, typeConverter);
        }
    }
}

The magic happens inside of resolveDependency where we can figure out the declaring class by checking either the method parameter or the field – and then see whether the thing asked for is a Logger. Otherwise we just delegate to the super implementation.

In order to use this from anything we need an actual ApplicationContext that uses it. I didn’t find any hook to set the BeanFactory after the application context was created, so I ended up creating two new ApplicationContext implementations – one for tests and one for the Spring MVC purpose. They are slightly different, but try to do so little as possible while retaining the behavior of the original. The application context for the tests look like this:

public class LoggerInjectingGenericApplicationContext 
                    extends GenericApplicationContext {
    public LoggerInjectingGenericApplicationContext() {
        super(new LoggerInjectingListableBeanFactory());
    }
}

This one just calls the super constructor with an instance of our custom bean factory. The application context for Spring MVC looks like this:

public class LoggerInjectingXmlWebApplicationContext 
                    extends XmlWebApplicationContext {
    @Override
    protected DefaultListableBeanFactory createBeanFactory() {
        return new LoggerInjectingListableBeanFactory(
                    getInternalParentBeanFactory());
    }
}

The XmlWebApplicationContext doesn’t have a constructor that takes a bean factory, so instead we override the createBeanFactory method to return our custom instance. In order to actually use these implementations some more things are needed. In order to get our tests to use it, a test.context.support.ContextLoader implementation is necessary. This code is mostly just copied from the default implementation – sadly it doesn’t provide any extension points and the place I want to override are in the middle of two final methods. It feels quite ugly to just copy the implementations, but there are no hooks for this…

public class LoggerInjectingApplicationContextLoader 
                        extends AbstractContextLoader {
    public final ApplicationContext loadContext(
     MergedContextConfiguration mergedContextConfiguration) 
                                  throws Exception {
        String[] locations = mergedContextConfiguration.getLocations();
        GenericApplicationContext context = 
                  new LoggerInjectingGenericApplicationContext();
        context.getEnvironment().setActiveProfiles(
               mergedContextConfiguration.getActiveProfiles());
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    public final ConfigurableApplicationContext 
            loadContext(String... locations) throws Exception {
        GenericApplicationContext context = 
              new LoggerInjectingGenericApplicationContext();
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    protected void loadBeanDefinitions(
            GenericApplicationContext context, String... locations) {
        createBeanDefinitionReader(context).
               loadBeanDefinitions(locations);
    }

    protected BeanDefinitionReader createBeanDefinitionReader(
                      final GenericApplicationContext context) {
        return new XmlBeanDefinitionReader(context);
    }

    @Override
    public String getResourceSuffix() {
        return "-context.xml";
    }
}

The final thing necessary to get your tests to use the custom Bean Factory is to specify the loader to use in the ContextConfiguration on your test class, like this:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(value = "file:our-app-config.xml",
          loader = LoggerInjectingApplicationContextLoader.class)
public class SomeTest {
}

In order to get Spring MVC to pick this up, you can edit your web.xml and add a new init-param for the DispatcherServlet, like this:

    <servlet>
        <servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
        <servlet-class>
           org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>WEB-INF/our-app-config.xml</param-value>
        </init-param>
        <init-param>
            <param-name>contextClass</param-name>
            <param-value>
               com.example.LoggerInjectingXmlWebApplicationContext
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

This approach seems to work well enough. Some of the code is slightly ugly and I would definitely love to have a better hook for injection points to know where it will get injected. Having factory methods be able to take the receiver object might be very convenient, for example. Being able to customize the bean factory seems like it also should be much easier than this.



Seph – A Hard Language to Compile


I have recently started work on Seph again. I preannounced it last summer (here), then promply became extremely busy at work. Busy enough that I didn’t really have any energy to work on this project for a while. Sadly, I’m still as busy, but I’ve still managed to find some small slivers of time to start working on the compiler parts of the implementation. This has been made much easier and more fun since JSR292 is getting near to completion, and an ASM 4 branch is available that makes it easier to compile Java bytecode with support for invoke dynamic built in.

So that means that the current code in the repository actually goes a fair bit to where I want it to be. Specifically, the compiler compiles most code except for abstractions that create abstractions, and calls that take keyword arguments. Assignments is not supported either right now. I don’t expect any of these features to be very tricky to implement, so I’m waiting with that and working on other more complicated things.

This blog post is meant to serve two purposes. The first one is to just tell the world that Seph as an idea and project actually is alive and being worked on – and what progress has been made. The other aspect of this post is to talk about some of the things that make Seph a quite tricky language to compile. I will also include some thoughts I have on how to solve these problems – and suggestions are very welcome if you know of a better approach.

To recap, the constraints Seph is working under is that it has to run on Java 7. It has to be fully compiled (in fact, I haven’t decided if I’ll keep the interpreter at all after the compiler is working). And it has to be fast. Ish. I’m aiming for Ruby 1.8-speed at least. I don’t think that’s unreasonable, considering the dimensions of flexibility Seph will have to allow.

So let’s dive in. These are the major pain points right now – and they are in some cases quite interconnected…

Tail recursion

All Seph code has to be tail recursive, which means a tail call should never grow the stack. In order to make this happen on the JVM you need to save information away somewhere about where to continue the call. Then anyone using a value has to check for a tail marker token, and if one that is found, that caller will have to do a repeated call on the current tail until a real value is produced. All the information necessary for the tail will also have to be saved away somewhere.

The approach I’m currently taking is fairly similar to Erjangs. I have a SThread object that all Seph calls will have to pass along – this will act as a thread context as soon as I add light weight threads to Seph. But this place also serves a good place to save away information on where to go next. My current encoding of the tail is simply a MethodHandle that takes no arguments. So the only thing you need to do to pump the tail call is to repeatedly check for the token and call the tail method handle. Still, doing this all over the place might not be that performant. At the moment, the code is not looking up a MethodHandle from scratch in the hot path, but it will have to bind several arguments in order to create the tail method handle. I’m unsure what the performance implications of that will be right now.

Argument evaluation from the callee

One aspect of Seph that works the same as in Ioke is that a method invocation will never evaluate the arguments. The responsibility of evaluating arguments will be in the receiving code, not the calling code. And since we don’t know whether something will do a regular evaluation or do something macro-like, it’s impossible to actually pre-evaluate the arguments and push them on the stack.

The approach Ioke and the Seph interpreter takes is to just send in the Message object and allow the callee to evaluate it. But that’s exactly what I want to avoid with Seph – everything should be possible to compile, and be running hot if that’s possible. So sending Messages around defeats the purpose.

I’ve found an approach to compile this that actually works quite well. It also reduces code bloat in most circumstances. Basically, every piece of code that is part of a message send will be compiled to a separate method. So if you have something like foo(bar baz, qux) that will compile into the main activation method and two argument methods. This approach is recursive, of course. What this gives me is a protocol where I can use method handles to the argument methods, push them on the stack, and then allow the callee to evaluate them however they want. I can provide a standard evaluation path that just calls each of the method handles in turn to generate the values. But it also becomes very easy for me to send them in unevaluated. As an example this is almost exactly what the current implementation of the built in “if” method looks like. (It’s not exactly like this right now, because of transitional interpreter details).

public final static SephObject _if(SThread thread, LexicalScope scope,
        MethodHandle condition, MethodHandle then, MethodHandle _else) {
    SephObject result = (SephObject)condition.invokeExact(thread, scope, 
                                                          true, true);
    
    if(result.isTrue()) {
        if(null != then) {
            return (SephObject)then.invokeExact(thread, scope, 
                                                true, true);
        } else {
            return Runtime.NIL;
        }
    } else {
        if(null != _else) {
            return (SephObject)_else.invokeExact(thread, scope, 
                                                 true, true);
        } else {
            return Runtime.NIL;
        }
    }
}

Of course, this approach is not perfect. It’s still a lot of code bloat, I can’t use the stack to pass things to the argument evaluation, and the code to bind the argument method handles take up most of the generated code at the moment. Still, it seems to work and gives a lot of flexibility. And compiling regular method evaluations will make it possible to bind these argument method handles straight in to an invoke dynamic call site, which could improve the performance substantially when evaluating arguments (something that will probably happen quite often in real world code… =).

Intrinsics are just regular messages

Many of the things that are syntax elements in other languages are just messages in Seph. Things like “nil”, “true”, “false”, “if” and many others work exactly the same way as a regular message send to something you have defined yourself. In many cases this is totally unnecessary though – and in most cases knowing the implementation at the call site allow you to improve things substantially in many cases. I think it’s going to be fairly uncommong to override any of those standard names. But I still want to make it possible to do it. And I’m fine with the programs that do this takng a performance hit from it. So the approach I’ve come up with (but not implemented yet) is this – I will special case the compilation of every place that has the same name as one of the intrinsics. This special casing will bind to a different bootstrap method than regular Seph methods. As a running example, let’s consider compiling a piece of code with “true” in it. This will generate a message send that will be taken care of by a sephTrueBootstrapMethod. We still have to send in all the regular method activation arguments, though. What this bootstrap method will do is to set up a call site that points to a very special method handle. This method handle will be a guardWithTest created through a SwitchPoint specific to the true value. The first path of that GWT (guardWithTest) will just return the true value directly without any checks whatsoever. The else path of the GWT will fallback to a regular Seph fallback method that does inline caching and regular lookup. The magic happens with the SwitchPoint – the places that create new bindings will check for these intrinsic names and if one of those names is used anywhere in the client code, the SwitchPoint will be changed over to the slow path.

In summary, I think a fast path can be possible for many of these things for most programs. The behaviour when you override “if” should still work as expected, but will make the global performance of that program slower for the rest of the execution.

When does lexical scopes escape?

Seph has mutable lexical scopes. But it’s impossible to know which names will escape and which won’t – so as far as I can see, I can’t use the Java stack to represent variables except for in some small amount of very degenerate cases. I’m not sure if it’s worth it to have that code path yet, so I haven’t thought much about it.

Class based PICs aren’t a good fit

One of the standard optimizations that object oriented languages use is something called a polymorphic inline cache. The basic idea is that looking up a method is the really slow operation. So if you can save away the result of doing that, guarded by a very cheap test, then you can streamline the most common cases. Now, that cheap test is usually a check against the class. As long as you send in an instance with the same class, then a new method lookup doesn’t have to happen. Doing a getClass and then a identity equality on that is usually fairly fast (a pointer comparison on most architectures) – so you can builds PICs that don’t actually spend much time in the guard.

But Seph is a prototype based language. So any object in the system can have different methods or values associated with a name, and there is no clear delineation on objects with new names and values in them. Especially, since Seph objects are immutable, every new object will most likely have a new set of values in them. And saving a way objects and dispatching on them becomes much less performant, since the call sites will basically never work on the same object. Now, there are solutions to this – but most of them are tailored for languages where you usually use a class based pattern. V8 uses an approach called hidden classes to figure out things like that. I’m considering implementing something similar, but I’m a bit worried that the usage pattern of Seph will be far enough away from the class based world that it might not work well.

Summary

So, Seph is not terribly easy to compile, and I don’t have a good feeling for how fast it can actually be made. I guess we’ll have to wait and see. But it’s also an interesting challenge, coming up with solutions to these problems. I think I might also have to go on a new research binge, investigating how Self and NewtonScript did things.



Safe(r) monkey patching


Ruby make it possible to pretty much change anything, anywhere. This is obviously very powerful, but it’s also something that can cause a lot of pain if it’s not done in a disciplined manner. The way this is handled on most Ruby projects is by heaving clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes if it is at all possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.

The one absolute rule that should never be violated in a Rails or Ruby project is to modify the original source code. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.

Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version make look like this:

class Object
  def presence
    return self if present?
  end
end

But if I open up IRb and get hold of this method, it’s not immediately obvious where it’s defined:

o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>

However, if I were to implement it using a module instead, like this:

module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence

If I look at the method now, the output is a bit changed:

p o.method(:presence)  #=> #<Method: Object(Presence)#presence>

We can now see that the method actually comes from the Presence module instead of the Object class. In most Ruby projects, these kind of extensions will be namespaced, using the word extensions or ext as part of the module name. When I add the presence method to code bases, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this to make it as easy to possible to find these extensions and changes.

There are many other benefits to putting an extension like this in a module. It makes your code cleaner, more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, including just adding it to selected objects if necessary.

Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.

There is a slight wrinkle in this scenario, specifically for adding extensions to modules. Sadly, the way the Ruby module system works, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. The general problem looks like this:

module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)

This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rules still applies — you should put the extensions in well named files that are easy to find. And if you can extract the functionality to a module and then delegate to that, that is preferrable.



Comparing times and dates in Ruby


In one of the Rails projects I’m involved with, we do most of the local development against SQLite and then deploy against Oracle. This is a bit annoying for many reasons, but by far the largest cause of trouble is the handling of dates. I haven’t exactly figured out the rules, but for some reason sometimes Oracle returns DateTime in situations where SQLite returns a Date. This usually causes quite subtle problems that have effects in other parts of the application. This brings me to the small piece of advice I wanted to talk about in this column. Always make sure that you know if you are working with a Date, a Time or a DateTime, since these all have slightly different behavior, especially when it comes to comparisons.

The rule is quite simple. If you think you can have a Time object, make sure to turn it into a DateTime object before trying to compare it to a Date object. What happens otherwise? Unfunny things:

Date.today < Time.now #ArgumentError: comparison of Date with Time failed

Time.now > Date.today #true

Time.now == Date.today #false

Date.today == Time.now #nil

Date.today <=> Time.now #nil

Date.today != Time.now #true

The first time I saw some of these results, I was a bit confused. Especially the last three. But they do make a twisted kind of sense. Namely, it’s OK for the <=> operator to return nil if it can’t do a comparison between two objects. And the != in Ruby is hardcoded to return the inverse of the value returned from ==, and the since nil is a falsey value, the inverse of that becomes true.

What I wanted to mention with these things is that you should always make sure you don’t have the Date on the left hand side of a comparison. Or if you want to do a comparison, explicitly call to_date to coerce them. Finally, if you want to do date and time comparisons, I find the best behavior usually comes from coercing both sides with to_datetime before doing the comparison.



Panel on Internet Freedom


Next week, ThoughtWorks and The Churchill Club is organizing a live panel about Internet freedom and the implications of the recent WikiLeaks events. This is going to be a world class event with very engaging speakers, and it will also be streamed live if you can’t attend in person. Daniel Ellsberg, Peter Thiel, Clay Shirky, Jonathan Zittrain and Roy Singham will discuss various important subjects touching on the WikiLeaks controversy:

WikiLeaks: Why it Matters. Why it Doesn’t?
Wednesday, January 19, 2011
5:30-8:30 PM Pacific Standard Time

The purpose of this discussion is to take an objective look at the WikiLeaks controversy and its potential threats to the future of the free Internet. Notwithstanding the varied personal opinions of WikiLeaks and Julian Assange, this issue reaches beyond the actions of one person or one website. Precedents that will determine the very future of the Internet are being set as the world grapples with new social and information models. This is a serious issue worthy of serious discussion and debate.

Paul Jay, CEO and senior editor of The Real News Network will moderate the panel, which includes:

  • Daniel Ellsberg, Former State and Defense Dept. Officialy prosecuted for releasing the Pentagon Papers
  • Clay Shirky, Independent Internet Professional; Adjunct Professor, Interactive Telecommunications Program, New York University
  • Neville Roy Singham, Founder and Chairman, ThoughtWorks
  • Peter Thiel, President, Clarium Capital; Managing Partner, Founder’s Fund
  • Jonathan Zittrain, Professor of Law and Professor of Computer Science, Harvard University; Co-founder, Berkman Center for Internet & Society

Please join us along with other executives from diverse industries and positions, all of whom will gather to listen and engage with these panelists who represent a rich cross section of the communities impacted by the WikiLeaks issue.

Attend in person!

Santa Clara Marriott: 2700 Mission College Boulevard · Santa Clara, California 95054 USA

A buffet dinner will be served.

To register to attend this event please visit the Churchill Club’s Website at : http://www.churchillclub.org/eventDetail.jsp?EVT_ID=892. There is a small charge for this event, however guests of ThoughtWorks may use a special discount, gtworks25, to receive the Churchill Club’s member rate.

View live-streamed event

The Real News Network: http://bit.ly/fdjoMr

Fora.tv: http://bit.ly/f4cuz6



Named Scopes


One of my favorite features of Rails is named scopes. Maybe it’s because I’ve seen so much Rails code with conditions and finders spread all over the code base, but I feel that named scopes should be used almost always where you would find yourself otherwise writing a find_by or even using the conditions argument directly in consumer code. The basic rule I follow is to always use named scopes outside of models to select a specific subset of a model.

So what is a named scope? It’s really just an easier way of creating a custom finder method on your model, but it gives you some extra benefits I’ll talk about later. So if we have code like:

class Foo < ActiveRecord::Base
  def self.find_all_fluxes
    find(:all, :conditions => {:fluxes => true})    
  end
end

p Foo.find_all_fluxes

we can easily replace that with

class Foo < ActiveRecord::Base
  named_scope :find_all_fluxes, :conditions => {:fluxes => :true}
end

p Foo.find_all_fluxes

You can give named_scope any argument that find can take. So you can create named scopes specifically for ordering, specifically for including an association, etc.

The above example is fixed to always have the same conditions. But named_scope can also take arguments. You do that by instead of fixing the arguments, send in a lambda that will return the arguments to use:

def self.ordered_inbetween(from, to)
  find(:all, :conditions => {:order_date => from..to})
end

Foo.ordered_inbetween(10.days.ago, Date.today)

can become:

named_scope :ordered_inbetween, lambda {|from, to|
  {:conditions => {:order_date => from..to}}
}  

Foo.ordered_inbetween(10.days.ago, Date.today)

It’s important that you use the curly braces form of block for the lambda — if you happen to use do-end instead, you might not get the right result, since the block will bind to named_scope. The block that binds to named_scope is used to add extensions to the collections returned, and thus will not contribute to the arguments given to find. It’s a mistake I’ve done several times, and it’s very easy to make. So make sure to test your named scopes too!

So what is it that makes named scopes so useful? Several things. First, they are composable. Second, they establish an explicit interface to the model functionality. Third, you can use them on associations. There are other benefits too, but these are the main ones. Simply put, they are a cleaner solution than the alternatives.

What do I mean by composable? Well, you can call one on the result of calling another. So say that we have these named scopes:

class Person < ActiveRecord::Base
  named_scope :by_name, :order => "name ASC"
  named_scope :by_age, :order => "age DESC"
  named_scope :top_ten, :limit => 10
  named_scope :from, lambda {|country|
    {:conditions => {:country => country}}
  }
end

Then you can say:

Person.top_ten.by_name
Person.top_ten.from("Germany")
Person.top_ten.by_age.from("Germany")

I dare you to do that in a clean way by using class method finders.

I hope you have already seen what I mean by offering a clean interface. You can hide some of the implementation details inside your model, and your controller and view code will read more cleanly because of it.

The third point I made was about associations. Simply, if you have an association, you can use a named scope on that association, just as if it was the model class itself. So if we have:

class ApartmentBuilding < ActiveRecord::Base
  has_many :tenants, :class_name => "Person"
end

a = ApartmentBuilding.first
a.tenants.from("Sweden").by_age

So named scopes are great, and you should use them. Whenever you sit down to write a finder, see if you can’t express it as a named scope instead.

(Note: some of the things I write about here only concerns Rails 2. Most of the work I do is still in the old world of 2.3.)



Use presence


One of the things you quite often see in Rails code bases is code like this:

do_something if !foo.blank?

or

unless foo.blank?
  do_something
end

Sometimes it’s useful to check for blankness, but in my experience it’s much more useful to check for presence. It reads better and doesn’t deal in negations. When using blank? it’s way too easy to use complicated negations.

For some reasons it seems to have escaped people that Rails already defines the opposite of blank? as present?:

do_something if foo.present?

There is also a very common pattern that you see when working with parameters. The first iteration of it looks like this:

name = params[:name] || "Unknown"

This is actually almost always wrong, since it will accept a blank string as a name. In most cases what you really want is something like this:

name = !params[:name].blank? ? params[:name] : "Unknown"

Using our newly learnt trick, it instead becomes:

name = params[:name].present? ? params[:name] : "Unknown"

Rails 3 introduces a new method to deal with this, and you should back port it to any Rails 2 application. It’s called presence, and the definition looks like this:

def presence
  self if present?
end

With this in place, we can finally say

name = params[:name].presence || "Unknown"

These kind of style things make a huge difference in the small. Once you have idiomatic patterns like these in place in your code base, it’s easier to refactor the larger parts. So any time you reach for blank?, make sure you don’t really mean present? or even presence.



Blog about the future of the Internet


I have decided I have several things related to Wikileaks, the future of the Internet and other related things that I want to write about. However, that content seems like it will be a bit far from what I usually write about in this blog (however seldom that actually is). So I’ve decided to create a new, parallel blog for these postings. I will not make any notice of new blog posts on that blog here and the posts to that blog will not be syndicated through the ThoughtWorks blog syndication. If you are interested in what I have to say on these issues, the new blog is can be found here. That’s all.