Bottom Types in Dynamic Languages


On the Friday of QCon, Sir Tony Hoare held a presentation about the null reference. This was interesting stuff, and it tied into something I’ve been thinking about for a while. Namely, the similarities and differences in the role of null/nil in static and dynamic languages.

At the surface, a null reference in a statically typed language seems to be a value of a bottom type. Since null is still a value, that cannot be true – since according to the formal definition, a bottom type can have no real value.

To take a typical example of where a bottom type is needed, take a look at what happens in Scala when you have a method that always throws an exception. Of course, that method can never return – so what is the type of the return value? That type is Nothing, the bottom type in Scala. There are no values of type Nothing in Scala, and can never be. The Nothing type is really useful in other circumstances too. Lists in Scala correspond to functional linked lists, where each cons cell is immutable. Every cons cell has a type – this type is the most generic type of the value the cons cell points to, and the type of cons cell it points to. Recursively, this means that the full list will have a generic type that is the least generic type that can encompass every element of the list. But how does this really work for the end element? Every list needs to have a final element that stops the list. This end element generally doesn’t contain a value. So what is the type of that end element? List[Nothing]. The reason being that Nothing can be viewed as the subtype of every other type in the system, which means it is always more specific than any other type. But when you create a new cons cell that points to this end value, the full list will always get the type of the new cons cell.

That explains bottom types, but we are no closer to understanding null references. In a statically typed language, a null reference acts like a value of a bottom type. The reason is that you can assign null to any other type – and this is generally only true for types that are subtypes of the place where it is being assigned to. What is interesting about null in C-like languages, is that there is no type that actually corresponds to null. In languages with null references like this, it seems that the domain of every other type in the system is extended to include null as a value. I haven’t found any reference to how this works from a type checking perspective, but it does look like the bottom type – except that there is no type associated to it. So from now on I will just call null a bottom value. One of the interesting aspects of including a bottom value in your language is that it opens up for runtime errors. If you don’t check every place that uses a reference for the bottom value before using it, you have the potential to get a runtime error.

The reason I started thinking about this was actually to contrast the way null works in a statically typed language, and how it works in a dynamically typed language. But the points above make it more and more obvious that null is actually a runtime feature even in static languages. The type system bypass nulls to allow more dynamism at runtime. But take a look at a dynamically typed language. Of course you won’t have any bottom types, since you don’t have a type system that need to check types. Another feature of most dynamically typed languages is that everything is an expression. In these languages, nil is usually used as a sentinel value to mark that the operation didn’t actually result in any real value. The semantics of how nil is interpreted can depend on both the language and the specific API of the operation in question.

The crucial difference in how nil is handled in a dynamically typed language, compared with how a null reference in a statically typed language works, is one of high level approach. In particular, in a dynamically typed language you will not know what kind of value you get back from an operation. In Ruby you might get back something that is of an expected class, but you can also get something that just looks like that class. Or something totally different, like a recorder object for example. All of these things make nils in dynamically typed languages much less of a problem, since the way you develop in these languages tend to be quite different. In contrast, a null reference in Java is a loop hole, that basically means that anything can return either null or something of the type specified. It’s not exactly a lie, but it is an implicit effect that generally comes as a surprise in a statically typed language.

There is a proposal to add a @NotNull annotation to Java, so you can mark references that shouldn’t allow null. This is not good enough, and will probably not help much at this stage. The reason is that you will be penalized for avoiding null – not the other way around. It also makes code that tries to be correct _more_ verbose, instead of the other way around. Because of backwards compatibility, there is really nothing to do about this though. But I recommend language designers to think carefully before adding a Java-style reference.

Some people asks what the alternative is. What stronger type systems generally allow you to do is have an Option (or Maybe) type, that is genericized on the expected output value.

Say you define a method in Java called get() that will either return a String or a null. If you instead had an Option class in Java, you could define the method as “Option<String> get()”. What is neat about Option is that it’s abstract and it has got two concrete subtypes. The first one is Some, the second is None. This makes it explicit when you can expect a return value to be not be there, and at the same time forces the user of the method to actively handle this case. The default case is references that will always be valid. This a good way to handle this problem in a type system.


12 Comments, Comment or Ping

  1. Interesting bit of trivia: null in Scala actually has type Null. Nothing is the bottom type and it does lack any values, but Null has many of the same properties without sacrificing its one value.

    March 15th, 2009

  2. Khalil

    Neal Gafter in the BGGA closure proposal did give null a type. In version 0.1 of the spec http://www.javac.info/closures-v01.html the “null” type was introduced and “null.class” was the class of null. Then in version 0.2 http://www.javac.info/closures-v02.html java.lang.Unrachable was introduced to cater for the cases where a closure would return abnormally by throwing an exception, it was then an error to assign null to Unreachible. However in version 0.4 http://www.javac.info/closures-v04.html null, the type, was made a subtype of Unreachible and finally Unreachible was renamed Nothing a la scala in version 0.5 of the spec. In http://gafter.blogspot.com/2006/11/closures-esoterica-completion.html Neal alluded to email exchnages with Remi Forax as to why null the type was made a subtype of Unreachable, it would be interesting to have it published.

    March 16th, 2009

  3. Jorge Ortiz

    “At the surface, a null reference in a statically typed language seems to be a value of a bottom type.”

    As Daniel pointed out, that’s exactly how Scala treats null. It’s the only instance of the Null type, which is a “bottom” type for AnyRef (i.e., any java.lang.Object), but not AnyVal (i.e., Java primitives). This is distinct from the true bottom type, Nothing, which is how expressions that only throw exceptions are typed.

    Another observation is that null subverts the type system just as much as throwing an exception does. If your code is not prepared to deal with an exception, your program will blow up. If your code is not prepared to deal with a null, your program will blow up. The problem with errors caused by null seems to come more from culture than from a subverted type system. People know that exceptions (especially unchecked RuntimeExceptions) should only be thrown in exceptional situations, but seem perfectly content to return null in quotidian situations like when a key isn’t found in a HashMap (e.g., http://java.sun.com/j2se/1.5.0/docs/api/java/util/HashMap.html#get(java.lang.Object) ). Subverting the type system in commonplace cases like these does result in surprises and evil NullPointerExceptions, but nothing worse than what would result from throwing unchecked exceptions for everyday error conditions (e.g., throw new HashKeyNotFoundUncheckedException).

    The cultural solution is to discredit the use of “null” as a value for anything other than exceptional situations. One good way to do this, as you suggest, is to use the Option type instead. Anecdotal evidence suggests that NPEs are much rarer in Scala, which has null as the sole value of a bottom-ish type but discourages it’s use for expected program states.

    March 16th, 2009

  4. Jorge Ortiz

    Addendum: It’s worth noting that, in dynamic languages, every value or object is an instance of the Bottom type. Dynamic languages usually only have a single type (i.e., DynamicObject), so that single type is the Top type, the Bottom type, and every type in between. In essence, every value subverts the type system; the type system can’t tell you anything useful about what types of values you might or might not be getting. As you point out, this forces programmers of dynamic languages to program defensively or risk runtime errors.

    March 16th, 2009

  5. Jorge: Thanks for the information about how Scala handles this stuff. Interesting variations.

    I don’t agree about your view on bottom types in dynamic languages – for the simple reason that there is no static type system in a dynamically typed language. If there were, not everything would be a bottom type except for in very pathological cases. A type inferencer would be able to assign bounds to most places.

    I don’t necessarily think it means coding defensively, it is just a different way of approaching it.

    March 16th, 2009

  6. Interestingly, C++ seems to encompass all of this. Firstly, C++ references cannot be NULL, so if you have a reference then you have a real object, whereas pointers can be NULL, and thus model your Option stuff.

    Secondly, the new C++0x standard has nullptr, which is the only possible value of type nullptr_t, and which can be assigned to any pointer to give it a NULL value. In current C++, the literal constant 0 has this property, but this is a compile-time feature rather than run-time — you cannot normally assign an integer to a pointer.

    March 16th, 2009

  7. Tracy Harms

    The language I tend to think in, Iverson’s J, is dynamically typed. Typing is strong, but ignored in the case of empty values. I think this means that empty arrays are effectively Bottom.

    You wrote:

    “So what is the type of that end element? List[Nothing]. The reason being that Nothing can be viewed as the subtype of every other type in the system, which means it is always more specific than any other type.”

    I agree. I’d like to add that in Iverson languages there is also a supertype, less specific than every other type: the array. This means that empty arrays actually have features (such as shape and type) in contrast with Bottom. “Framing” typing in this way, i.e. guaranteeing for all cases both the broadest and narrowest possibilities, smooths many things that would otherwise require special coding for exceptions.

    March 16th, 2009

  8. Jorge Ortiz

    Ola:

    The point I’m trying to get across is that dynamic languages do have a static type system: the type system with only one type. I’m not talking about classes (dynamic languages have lots of those), but about types: looking at this expression in source code form (before run-time) what can I tell about the possible (run-time) values this expression could take. In dynamic languages like Ruby and Python, the answer is always: DynamicObject. That is, the expression could be anything, because everything is an object. That’s fine, because dynamic languages provide affordances to work with DynamicObject. Method invocations (of the form obj.method(arg)) are resolved at run-time (transformed into something of the form obj.invoke(“method”, args), where “invoke” is a handy method defined on DynamicObject, which looks up “method” to see if it’s defined and invokes it with args).

    The above is an observation I’m borrowing from Wadler (for a fleeting reference to this, see: http://jjinux.blogspot.com/2008_01_01_archive.html). Saying “everything is a bottom type” is my cheeky way of saying that dynamic languages are statically typed, except there’s only one type that can be assigned. If Ruby’s or Python’s classes were promoted to types, then, yes, you’re right, not everything would be a bottom type, because there would be more than one type. But then they wouldn’t be dynamic languages either.

    March 16th, 2009

  9. Jorge: yeah, I get your meaning, although I don’t find it a useful way of reasoning. The type system is intrinsically linked to the type checking algorithm, and since there is no type checking in a dynamically typed languages, you can’t really state things about it.

    My point wasn’t about linking runtime classes with types, but rather that if you want to, you could imagine a type inferencer (a type oracle really) that could assign correct types to everything going on in a program, such that it has the same effect as the runtime behavior. That type system would then assign more specific types (actually more general types) than DynamicObject. And you don’t have to resort to an oracle to get this. You can assign types in many different ways to get something that isn’t dynamic object.

    In fact, in an extreme case you could just create a type checker that creates new synthetic existential types for every method invocation.

    March 17th, 2009

  10. Tracy Harms

    Ola Bini wrote: “since there is no type checking in a dynamically typed languages, you can’t really state things about it.”

    I’ve seen domain errors in J as resulting from type checking. How would you distinguish domain-error identification from type checking?

    (For a simple domain error example, apply addition to a number and a literal: 3+’4′ )

    March 17th, 2009

  11. In Java (and in C but the story is a lot more complicated), every reference to an object of class A is of the “option type” parametrized by A. `Maybe A` in Haskell, `’A option` in ocaml. To make the type system this way is obviously wrong and has drastic consequences.

    November 23rd, 2012

Reply to “Bottom Types in Dynamic Languages”