Separate blog about Privacy, Anonymity and Cryptography


It’s been a long while. But I have been writing a little bit during 2014 as well. I decided to switch venue a bit, once my writing became almost exclusively about privacy, anonymity and cryptography. Since my day-to-day job has been research in these areas for the last 2 years, it has become natural to write about what I think on these subjects.

So, if you are interested in following these kinds of thoughts, please follow along at https://reap.ec.



Update


It’s almost a year since my last blog post. I’ve had a busy and interesting year, and wanted to take the time to talk about a few different things. One of the reasons I haven’t blogged as much as I wanted is that I’ve been working on several things that aren’t exactly public at the moment. Moving to Ecuador has also involved a whole bunch of work and left less time for writing.

Computer and cloud

Some of my last blog posts had to do with me wanting to change what hardware and cloud providers I used. I went through a few different laptops until I settled on a Thinkpad X1 Carbon (2nd gen). It’s got the right weight-to-power ratio, a larger screen and a nice keyboard, and it works quite well with Linux. I ended up moving to Fedora as my main operating system, which has served me pretty well.

When it comes to password managers, I’ve ended up with KeePass2 for Linux. It works satisfactorily, even though it was a bit of a pain to get running, what with Mono dependencies and the like.

I have also been hosting my own email infrastructure for the last year or so. It’s worked out quite well.

For hosting my server infrastructure I ended up with a Swedish provider called moln.is – I am now happily far away from Rackspace and the other American hosting providers.

Programming languages

The last few years haven’t really seen much change in the programming language scene. There are interesting new experiments coming out now and again, while Clojure and Scala are gaining more ground. The biggest difference I’ve seen lately is that Go is becoming a really useful tool, especially for scalable systems where the low-level nature of Go works well, but the security properties of the language are also a plus. Many of the security-oriented systems I’ve seen in the last year are moving towards Go. And of course, being able to create a binary that runs anywhere is extremely powerful.

I have a new language rattling around in the back of my brain. I kinda wish I had 6 months to sit down and get it out. I think it could be quite nice. However, at the moment there doesn’t seem to be that much time for language experiments.

Security

Most of my work over the last 18 months has been focused on research around anonymity, privacy, security and cryptography. It’s an extremely interesting place to be, and ThoughtWorks is actively working on several different angles in this space. The security field is seeing a sea change after the Snowden revelations. Lots of snake-oil coming out, but also lots of interesting new approaches.

One of the things that worries me about all this is the continued focus on using browsers to implement security-aware applications. In my mind this is a dangerous trend – browsers are the largest attack surface out there, and you are also forced to use JavaScript. And JavaScript is not a language well suited to strong security.

E-mail is pretty high on the list of things a lot of people and organizations want to fix in one way or another. This is also a space where we are working to create something. We will open source what we have quite soon – it’s something pragmatic that we think can make real change. I’m excited about it.

Meanwhile, here in Ecuador I’m doing research and thinking about the problems with transport security, specifically DNS and TLS. Hopefully I’ll have something more substantial to write about that soon.

Conferences?

Some might wonder where I’ve disappeared to, since I haven’t been to many of the conferences I used to go to. I’ve missed the JVM Language Summit, QCon in many cities, StrangeLoop and others. I’m very sad about all that, but I haven’t been able to make it. However, if you happen to be at Goto Aarhus, do say hi.



How do you safely communicate over email?


Communicating safely over email is actually pretty complicated. I wanted to walk through the steps necessary in order to create a complete email identity online that should be reasonably safe against interception, network analysis and impersonation.

Before we begin, you need to have the Tor Browser Bundle installed. You also need to make sure that you never do anything related to your email account without having the connection going over Tor.

One important aspect of this is finding a good email provider where you don’t have to supply real personal information. If you ever have to supply your real information, you will always have to trust that the email provider does the right thing. The one thing you can never get away from is that network analysis can happen on your email if the provider can’t be trusted. If this happens, your only recourse is to be sure that the people you are talking to are using the same techniques, and that you are using several of these accounts for various activities.

The first step is to find a provider that matches your needs. For this I’m going to use RiseUp.net. I could also use Hushmail or other services, although none of these are completely safe. I will first generate the email address/username I want. In order to do this, you need a mechanism for generating randomness. I will use 1Password for this, and generate a completely random string. An alternative is to go to one of the random name generators available online (go there using Tor) and generate a random name there. Once you have a random name and a long, really random password, you can go ahead and start registering for an account.
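
If you prefer generating these locally rather than with 1Password or an online generator, here is a minimal sketch of the idea, assuming Node.js is available (the helper name is made up):

var crypto = require("crypto");

// Turn cryptographically random bytes into a string of letters and digits.
function randomString(byteCount) {
  return crypto.randomBytes(byteCount).toString("base64").replace(/[^a-zA-Z0-9]/g, "");
}

var username = "u" + randomString(9);   // nothing derived from your real identity
var password = randomString(30);        // long and actually random
console.log(username);
console.log(password);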

When signing up, use Tor for all the connections and make sure not to give any of the extra information asked for (such as time zone or country, for example). Once you have signed up, use Tor again and sign in to the web client to make sure everything works as it should.

The next step is to create a private key specifically for this email account. I will use the command line to do this, using gpg. Before you create this key, you should also create a new pass phrase for yourself. Use the XKCD Battery Staple method, with about 5-6 words. However, be very careful to choose these words randomly. If you don’t use a random (really random) process, you lose all the benefits of having a pass phrase and it becomes a very weak password. So, once you have the pass phrase, you can use it to create a new private key:

gpg --gen-key

The choices I will make are these: RSA and RSA for the kind of key. A keysize of 4096, and a validity of 2 years. I will specify the username for the email address as the name for the key. Finally you will be asked for the pass phrase, so enter it now. Make sure to never push this key to a keyserver using the gpg program.

Once you have created the key, you should use the Tor browser to add it to the keyservers. First export the public key into a file. Make sure to not export the private part of the key. Once you have Tor up and running you can go to http://sks-keyservers.net/i and submit it there.

In order to use this account you should probably use Thunderbird and TorBirdy. If you have non-anonymous accounts you need to figure out how to run multiple Thunderbird instances, since TorBirdy takes over the full installation. You need a new profile, and you should install Enigmail and TorBirdy once you have Thunderbird installed. Then you can go ahead and configure the mail account. It is important to install TorBirdy before you configure the mail account. Once you’ve configured the mail account, it’s a good idea to make sure Enigmail will encrypt and sign emails by default.

You are now ready to send safe and anonymous email. There are a few different things to keep in mind for this. First, make sure to never access this account over a regular connection. Second, never download keys automatically from the keyserver; instead, always download and import them manually. Finally, never send an email in the clear. Always encrypt it using the right key. If you ever send an email like this in clear text over the same connection, you have lost most of the potential security of the account.

In order for this to work you should give your account information and the fingerprint of the public key in person to the people who should have them.

Finally, even all of these things together cannot guarantee safety.

Comments and corrections to this writeup are very welcome.



Complexity and Brain Size


For the last year or so I’ve been leading a small team of developers. We’ve been working on a project that involves genomics and molecular biology, bioinformatics, oncology and computational biology. Saying that it’s hugely complex is an understatement. One of the interesting dynamics of this project was that I personally designed and implemented a large portion of it. A big part of the project was also to solve the actual problem – our client did not have a solution when we came to them, so really we ended up getting access to resources and experts, but no predefined solution.

The question that I’ve been pondering the last week or so is this – if the project had been even more complex, so complex that I wouldn’t have been able to fit it all in my head, could we still have solved it? If I had 50% of the information in my head and someone else had the other 50%, would it still be possible to come up with a working solution?

My intuition is that the answer to this is no. I don’t think a problem of this complexity level could have been solved in a good way by sharing the responsibility for coming up with the solution.

On the plus side, I have never encountered anything even close to this magnitude of complexity before. The projects I’ve been on have been a variety of different enterprise-style projects, and most of them don’t really have much in terms of domain complexity. So maybe this is not a problem in practice for most cases.

But on the other hand, we still have lots of unsolved problems in highly complex and scientific domains. In order to solve them, we need people who can understand both the science and the software aspects at the same time. And based on my experience over the last year, I suspect that there are real limits to what kinds of problems we can actually take on in this way. There has to be a better solution, but I don’t think we have one yet. Incremental development methodologies really don’t help with this.

Another interesting aspect of this project is that we did not have any BAs (business analysts). Most of our projects have BAs, and it’s highly unusual not to have them. In retrospect it was the right choice for us, and I can now verbalize why – when you have BAs working with the domain, you still have to take into account the communication with the developers and tech leads. If the domain is complex enough and the developers need to have that understanding themselves, having BAs would actually get in the way, and the communication surface area would be too large to work effectively. One of my colleagues and I ended up doing all the BA work together, in conjunction with our design and implementation work.

Working on a project like this has been a very different experience. It’s definitely clear to me that our standard ways of working with businesses don’t really apply.



A new server infrastructure


A month ago, Joyent told me that the machine I was hosting my server on was being end-of-lifed. This server has been running my blog, the Ioke and Seph home pages and various other small things for several years now. However, I’m ashamed to admit that the machine was always a Snowflake server. So now that I had to do something about it, I decided to do it the right way. I thought I would write up a little bit about what I ended up with in order to make this into a Phoenix server.

The first step was to decide on where to host my new server. At work I’ve been using AWS since January, and I like the experience. However, for various reasons I was not comfortable using Amazon for my personal things, so I looked at the capabilities of other providers. I ended up choosing Rackspace. In general I have liked it, and I would use them again for other things.

Once I had a provider, I had to find the right libraries to script provisioning of the server. We have been using Fabric and Boto for this at work – but we ended up having to write quite a lot of custom code, so I wanted something a bit better. I first looked at Pallet – from the feature list it looked exactly like what I wanted, but in practice, the classpath problems with getting JClouds working correctly with Rackspace were just too much, and I gave up after a while. After that I looked around for other libraries. There are several OpenStack libraries, both for Ruby and for Python. Since I’m more comfortable with Ruby, I ended up with the Ruby version (the gem is called openstack-compute). However, since it was tied heavily to OpenStack, I wrote a thin layer of generic provisioning functionality on top of it. Hopefully I’ll be able to clean it up and open source it soon.

When I had everything in place to provision servers, it was time to figure out how to apply server configurations. I’ve always preferred Puppet over Chef, and once again it’s something we use on my current project, so Puppet it is. I use a model where I pack up all the manifests, push them to the server and then run puppet there. That way I don’t have to deal with a centralized puppet server or anything else getting in the way.

And that is really all there is to it. I have all my sites in Git on GitHub, and I have a cron script that pulls regularly from them, making sure to inject passwords in the right places after pulling. All in all, it ended up being much less painful than I expected.

So what about Rackspace then? The good: the APIs work fine, the machines seem stable and performant, and they now have a CloudDB solution that gives you roughly the same kind of capability as RDS. However, there is currently no backup option for the cloud databases. The two things I missed the most from AWS were EBS volumes and elastic IPs. EBS volumes are also a pain to deal with sometimes – I wish there was a better solution. Elastic IPs seem like the right solution for fast deploys, stable DNS names and so on – but they also have some problems, and the more I’ve been thinking about it, the less I find I want them.

In order to get zero-downtime deploys and stable DNS names, I think the right solution is to use the Rackspace load balancer feature. You attach the DNS name to the LB, and then have scripts that point it to the real machines. The useful feature for what I’ve been doing is that you can bring up a new staging/pre-production server and add it to the load balancer, but while doing so, set the policy to only balance your personal home IP or the IP of your smoke test runner to the new box. That means you can try out the machine while everyone else gets routed to the old machine. Once you are satisfied, you can just send an atomic update to the load balancer and switch out the policy to only point to the new machine.

Going forward, I think this is the kind of model I actually want for our AWS deploys as well.

So all in all, I’ve been pretty happy with the move.



Passwords Are Terrible


I’ve been going through a rash of password resets and changes the last few days, and as such things always do, it set me thinking. If I’m lucky, most of this won’t really be much of a surprise for you. It certainly won’t contribute anything significant to the security world.

My thought is simply that passwords are terrible. I know, I know – not an original thought. Basically, passwords might have been the right trade-off between convenience and security a while back. I’m unsure when that stopped being the case, but I’m pretty sure it was more than a few years ago. However, we are still using passwords, and a lot of what we do with them doesn’t necessarily make our use any safer. Just as black hats use social engineering to crack systems, security experts should use social engineering and experience design to entice people to do the safest possible thing under the circumstances. Sadly, right now we are doing the exact opposite.

Let’s take a look at a laundry list of what you should be doing and what you are doing:

  • You should never use the same password in more than one place. REALITY: people basically always reuse passwords or password variants on different services. The proliferation of places that require passwords and logins means we have the choice of either remembering more than 50 passwords or reusing them. But if you reuse passwords, then within each set of services sharing the same password, the service with the most sensitive material will only be as well protected as the least secure service in that set. So as long as you use the same password, you have to think about all the services you’ve used that password on to have a realistic idea of how protected that password is.
  • You should never use words, numbers from your life or names from your life in your password – scrambled or not. REALITY: basically everyone does one of these things – most wifi-network passwords I know are combinations of the names of the children in the family. In order to remember passwords, we usually base them on existing words and then scramble them by switching out a few letters for digits or adding a few symbols. This practice is basically useless, unless your password is actually a pass phrase. And if your password is in reality a pass phrase, you don’t gain much by scrambling the words. So for a really secure password, use a pass phrase, but make sure the words in the phrase are randomly selected (a small sketch of how to pick such words follows after this list).
  • Security policies usually require you to change passwords every 2 or 3 months. REALITY: this means you are training people to choose insecure passwords. If you have to change passwords often you have a few choices – you can write them down or you can use variations on the same password. Note that remembering a new strong password every 2 months is not an option – people will simply not do it. Most people I know use a sequence of numbers added to a base password, and they change these numbers every time they are forced to change the password. All of these things come together to defeat the purpose of the security policy. It is security theatre, pure and simple. If your company has a policy that requires you to change passwords like this, it is basically guaranteed to be a company with no real security.
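
Picking the words really randomly is the part people get wrong, so here is a minimal sketch of a Diceware-style generator, assuming Node.js and a word list file of your own choosing:

var crypto = require("crypto");
var fs = require("fs");

// A word list file of your own choosing (the file name here is made up).
var wordList = fs.readFileSync("wordlist.txt", "utf8").split("\n").filter(Boolean);

function randomWord() {
  // crypto.randomBytes rather than Math.random: the point is that the words are
  // picked by a genuinely random process. (The modulo introduces a tiny bias,
  // which is ignored in this sketch.)
  var index = crypto.randomBytes(4).readUInt32BE(0) % wordList.length;
  return wordList[index];
}

var chosen = [];
for (var i = 0; i < 6; i++) { chosen.push(randomWord()); }
console.log(chosen.join(" "));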

What is the solution? I have decided to base my life around 1Password. For most purposes, the combination of the password generator, the browser integration, and the syncing between different devices means that it’s mostly hassle-free to have really strong passwords everywhere I want them. I think 1Password is really good at what it does, but it’s still a stop-gap measure. The basic idea of using passwords for authentication is an idea that should be relegated to history. We need a better alternative – something that is more secure and less brittle than passwords, but also more convenient than two-factor authentication. For most of us who log in to services all the time, two-factor is just too slow – unless we get to a point where we have a few central authentication providers with roaming authentication.

Is there a better alternative? HTTPS has supported TLS client certificates for a long time now, and in theory they provide all the things we would want. In practice, they turn out to be inconvenient for people to use, and expiration and other aspects of certificates complicate and frustrate things.

I guess what I would want is a few different things. The first would be to simply make it possible to have my browser automatically sign a challenge and send it back, instead of putting a password in the login box. That would require a little support from the browser, but could potentially be as easy to use as passwords.
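
To make the idea concrete, here is a rough sketch of such a challenge-response exchange, written in Node-style JavaScript purely for illustration; no browser exposes anything like this today, and all the names are made up:

var crypto = require("crypto");

// Client side: sign the server's challenge with the private key instead of sending a password.
function signChallenge(challenge, privateKeyPem) {
  return crypto.createSign("RSA-SHA256").update(challenge).sign(privateKeyPem, "base64");
}

// Server side: verify the signature against the public key registered for the account.
function verifyChallenge(challenge, signature, publicKeyPem) {
  return crypto.createVerify("RSA-SHA256").update(challenge).verify(publicKeyPem, signature, "base64");
}

// Server side: a fresh random challenge is issued for every login attempt.
var loginChallenge = crypto.randomBytes(32).toString("hex");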

Another thing that could make this possible is if 1Password had support for private keys as well as passwords. That would make syncing between devices substantially easier. Someone would have to think up a simple protocol for making it possible to use this system instead of passwords on a web page. This is a bit of a catch-22, since you need support from the browser or an extension for it to be worth putting into your service. I kinda wish Google had done something like this as the default for Google Accounts, instead of going all the way to two-factor.

In summary, I think we are ready for something better than passwords. I would love if we could come together and figure out something with better usability and security than passwords so we can finally get rid of this scourge.



6 months with Clojure


I have spent the last 6 months on a project where Clojure was the main technology in use. I can’t really say much about the project itself, except that it’s a fairly complicated thing with lots of analytics and different kinds of data involved. We ended up with an environment that had a lot of Ruby and JavaScript/CoffeeScript as well as Clojure. We are using Neo4J for most of our data storage.
In this blog post I basically wanted to talk about a few different things that have worked well or not so well with Clojure.

Being on 1.4

When the project started, Clojure 1.4 was in alpha. We still decided to run with it, so we were running Clojure 1.4 alpha for about one month, and two different betas for another month or so. I have to say I was pleasantly surprised – we only had one issue during this time (which had to do with toArray on records, when interacting with JRuby), and that bug had already been fixed in trunk. The alphas and betas were exceptionally stable, and upgrading to the final release of 1.4 didn’t really make any difference from a stack standpoint.

Compojure and Ring

We ended up using Compojure to build a fairly thin front end, with mostly JSON endpoints and serving up a few HTML pages that were the starting points for the JavaScript side of the app. In general, both Compojure and Ring work quite well – the ring server and the uberjar both worked with no major problems. I also like how clean and simple it is to create middleware for Ring. However, it was sometimes hard to find current documentation for Compojure – it seems it used to support many more things than it does right now, and most things people mention about it just aren’t true anymore.

Enlive

In order to get some dynamic things into our pages, we used Enlive. I really liked the model, and it was quite well suited for the restricted dynamicity we were after.

DSL with lots of data

One of my less bright ideas was to create an internal DSL for some of our data. The core part of the DSL was a bunch of macros that knew how to create domain objects from themselves. This ended up being very clean and a nice model to work with. However, since the data amounted to millions of entries, the slowness of actually evaluating that code (and compiling it, and dealing with the permgen issues) ended up being unbearable. We recently moved to a model that is quite similar, except we don’t evaluate the code; instead we use read-string on the individual entries to parse them.

Dense functions

Clojure makes it really easy to create quite dense functions. I sometimes find myself combining five or six data structure manipulation functions in one go, then taking a step back to look at the whole thing. It usually makes sense the first time, but coming back to it later, or trying to explain what it does to a pair, is usually quite complicated. Clojure has extraordinarily powerful functions for manipulating data structures, and that makes it very easy to just chain them together into one big mess.
So in order to be nice to my team mates (and myself), I force myself to break up those functions into smaller pieces.

Naming

One aspect of breaking up functions as described above is that the operations involved are usually highly abstract and sometimes not very coupled to the domain language. I find naming those kinds of functions very hard, and many times I spend a long time and still don’t come up with something I’m completely comfortable with. I don’t really have a solution to this problem right now.

Concurrency

For some reason, we haven’t used most of the concurrency aspects of Clojure at all. Maybe this is because our problems don’t lend themselves to concurrent processing, but I’m not sure that’s the root of it. Suffice to say, most of our app is currently quite sequential. We will see if that changes going forward.

Summary

I’ve been having a blast with Clojure. It’s clearly exactly the right technology for what I’m currently doing, and it’s got a lot of features that make it very convenient to use. I’m really looking forward to being able to use it more going forward.


Notes on syntax


For the last few years the expressiveness of programming languages has been on my mind. There are many things that come into consideration for expressiveness, no matter what definition you actually end up using. However, what I’ve been thinking about lately is syntax. There’s a lot of talk about syntax and many opinions. What made me start thinking more about it lately was a few blog posts I read that kind of annoyed me a bit. So I thought it was time to put out some of my thoughts on syntax here.

I guess the first question to answer is whether syntax matters for a programming language. The traditional computer science view is largely that syntax doesn’t matter. And in a reductionist, system level view of the world this is understandable. However, you also have the opposite view which comes strongly into effect especially when talking about learning a new language, but also for reading existing code. At that point many people are of the opinion that syntax is extremely important.

The way I approach the question is based on programming language design. What can I do when designing a language to make it more expressive for as many users as possible? To me, syntax plays a big part in this. I am not saying that a language should be designed with a focus on syntax or even with syntax first. But the language syntax is the user interface for a programmer, and as such there are many aspects of the syntax that should help a programmer. Help them with what? Well, understanding for one. Reading. Communicating. I suspect that writing is not something we’re very interested in optimizing for in syntax, but that’s OK. Typing fewer characters doesn’t actually optimize for writing either – the intuition behind that statement is quite easy: imagine you had to write a book, but instead of writing it in English, you just wrote the gzipped version of the book directly. You would definitely have to type much less – but would that in any way help you write the book? No, it would probably make it harder. So typing is definitely not what I want to optimize. However, I would like to make it easy for a programmer to express an idea as concisely as they can. To me, this is about mentioning all the things that are relevant, without mentioning irrelevant things. But incidentally, a syntax with that property is probably also going to be easier to communicate with, and easier to read, so I don’t think focusing on writing at all is the right thing to do.

Fundamentally, programming is about building abstractions. We are putting together extremely intricate mind castles and then try to express them in such a way that our computers will realize them. Concepts, abstractions – and manipulating and communicating them – are the pieces underlying programming languages, and it’s really what all languages must do in some way. A syntax that makes it easier to think about hard abstractions is a syntax that will make it easier to write good and robust programs. If we talk about the Sapir-Whorf hypothesis and linguistic relativity, I suspect that programmers have an easier time reasoning about a problem if their language choice makes those abstractions clearer. And syntax is one way of making that process easier. Simply put, the things we manipulate with programming languages are hard to think about, and good syntax can improve that.

Seeing as we are talking about reading – who is this person reading? It makes a huge difference if we’re trying to design something that should be easy to read for a novice or we’re trying to design a syntax that makes it easier for an expert to understand what’s going on. Optimally we would like to have both, I guess, but that doesn’t seem very realistic. The things that make syntax useful to an expert are different than what makes it easy to read for a novice.

At this point I need to make a request – Rich Hickey gave a talk at Strange Loop a few months ago. It’s called Simple made Easy and you can watch it here: http://www.infoq.com/presentations/Simple-Made-Easy – you should watch it now.

Simply put, if you had never learnt any German, should you really expect to be able to read it? Is it such a huge problem that someone who has never studied Prolog will have no idea what’s going on until they study it a bit? Doesn’t it make sense that people who understand German can express all the things they need to say in that language? Even worse, when it comes to programming languages, people expect them to be readable to people who have never programmed before! Why in the world would that ever be a useful goal? It would be like saying German is not readable (and is thus a bad language) because dolphins can’t read it.

A tangential aspect to the simple versus easy of programming languages is how our current syntactic choices echo what’s been done earlier. It’s quite uncommon for a syntax design to become wildly successful while looking completely different from previous languages. This seems to have more to do with how easy a language is to learn than with how good the syntax actually is by itself. As such, it’s suspect. Historical accidents seem to contribute much more to syntax design than I am comfortable with.

Summarizing: when we talk about reading programming languages, it doesn’t make much sense to optimize for someone who doesn’t know the language. In fact, we need to take as a given that a person knows a programming language. Then we can start talking about what aspects reduce complexity and improve communication for a programmer.

When we are talking about reading languages, one thing that sometimes comes up is the need for redundancy. Specifically, one of the blogs that inspired these thoughts basically claimed that the redundancy in the design of Java was a good thing, because it improved readability. Now, I find this quite interesting – I have never seen any research that explains why this would be the case. In fact, the only argument I’ve heard that backs up the idea is that natural languages have highly redundant elements, and thus programming languages should too. First, that’s not actually true for all natural languages – but we must also consider _why_ natural languages have so much redundancy built in. Natural languages are not designed (with a few exceptions) – they grow to have the features they have because they are useful. But reading, writing, speaking and listening in natural languages have such different evolutionary pressures from each other that they should be treated differently. The reason we need redundancy is simply that it’s very hard to speak and listen without it. For all intents and purposes, what is considered good and idiomatic in spoken language is very different from written language. I just don’t buy this argument for redundancy. Redundancy in programming language syntax might turn out to be a good thing, but so far I remain unconvinced.

It is sometimes educational to look at mathematical notation. However, mathematical notation is just that – notation. I’m not convinced we can have one single notation for programming languages, and I don’t think it’s something to aspire to. But the useful lesson from math notation is how terse it is. Even so, you still need to spend a long time to digest what it means. That’s because the ideas are deep. The thinking that went into them is deep. If we ever come to a point where programming languages can embody ideas that deep in notation that terse, I suspect we will have figured out how to design programming language syntax that is way better than what we have right now.

I think this covers most of the things I wanted to cover. At some point I would like to talk about why I think Smalltalk, Ruby, Lisp and some others have quite good syntax, and how that syntax is intimately related to why those languages are powerful and expressive. Some other random thoughts I wanted to cover were the evolvability of language syntax, whether a syntax should be designed to be easy to parse, and possibly also how much English specifically has impacted the design of programming languages. But these are thoughts for another time. Suffice to say, syntax matters.



Announcing JesCov – JavaScript code coverage


It seems the JavaScript tool space is not completely saturated yet. As I mentioned in my previous post, I’ve had particular trouble finding a good solution for code coverage. So I decided to build my own. The specific features to notice are transparent translation of source code and support for branch coverage. It also has some limitations at the moment, of course. This is release 0.0.1 and as such is definitely a first release. If you happen to use the Jasmine JUnit runner it should be possible to drop this in directly and have something working immediately.

You can find information, examples and downloads here: http://jescov.olabini.com



JavaScript in the small


My most recent project was a fairly typical Java web project where we had a component that had to be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kinds of environments. So I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want to go with a full stack framework to help you out, so these things are less relevant.

Of course, most if not all things I’ll cover here can be gleaned from other sources, and probably better. And if you’re an experienced JavaScript developer, you are probably fine without this article.

I had to do two things to get efficient in using JavaScript. The first one was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way. But with the right habits (such as having a shortcut for function/lambda literals, and making sure to always put the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second thing is to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this-keyword. Both of these features can be used well, but they are also very clunky, and hard to get everyone on a team to understand the same way. I love prototype based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.

The module pattern

The basic idea of the module pattern is that you encapsulate all your code in anonymous functions that are then immediately evaluated to generate the actual top level object. Since JavaScript has some unfortunate problems with global variables (like, they are there), it’s safest to just put all your code inside of one or more of these modules. You can also make your modules take the dependencies you want to use. A simple module might look like this:

var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw "Can't withdraw more than exists in account";
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();
In this case the balance variable is completely hidden inside a lexical closure, and can only be accessed by the deposit and withdraw functions. These functions are also not in the global namespace, so there is no risk of clobbering. It’s also possible to have lots and lots of helper functions that no one else can see. That makes it easier to make your functions smaller – and incidentally, the largest problem I’ve seen with JavaScript code quality is that functions tend to be very large. Don’t do that!
A useful variation of the module pattern is to extract the construction function and give it a name. Even though you might use it immediately, it makes it possible to create more than one of these, use different dependencies, or make it accessible from tests so you can inject collaborators:

var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});
var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");

RequireJS

The module pattern is good on its own, but there are some things a loader can do that make it even better. There are several variations of these module loaders, but my favorite so far is RequireJS. I have several reasons for this, but the main one is probably that it is very lightweight, and is actually a net win even for very small web applications. There are lots of benefits to letting RequireJS handle your modules. The main ones are that it takes care of dependencies between modules and loads them automatically. This means you can define one single entry point for your JavaScript, and RequireJS makes sure to load everything else. Another good aspect of RequireJS is that it allows you to avoid any global names at all. Everything is handled by callbacks inside of RequireJS. So how does it look? Well, a simple module with a dependency can look like this:

// in file foo.js
require(["bar", "quux"], function(bar, quux) {
  return {doSomething: function() { 
    return bar.something() + quux.something();
  }};
});
If you have something else that uses foo, then this file will be loaded, bar.js and quux.js will be loaded, and the results of loading them (the return value from the module function) will be sent in as arguments to the function that creates the foo module. So RequireJS takes care of all this loading. But how do you kick it off? Well, you should have one single script tag in your HTML that points to require.js. You will also add an extra attribute to this script tag that points to the entry point of the JavaScript:

<script data-main="scripts/main" src="scripts/require.js"> </script>
This will do a number of things. It will load require.js. It will set the scripts directory as the base for all module references in your JavaScript. And it will load scripts/main.js as if it’s a RequireJS module. If you want to use the foo module from earlier, you can create a main.js that looks like this:

// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});
This will make sure that foo.js and its dependencies bar.js and quux.js are loaded before the function is invoked. However, one aspect of JavaScript that people sometimes get wrong is that you have to wait until the DOM is ready to execute JavaScript. With RequireJS we use the ready function inside the require object to make sure we can do something when everything is ready. Your main module should always wait until the document is ready before doing anything.
In general, RequireJS has helped a lot with structure and dependencies, and it makes it very simple to break up JavaScript into much smaller pieces. I like it a lot. There are a few downsides, though. The main one is that it doesn’t interact well with server side JavaScript (or at least it didn’t when I read up on it a month ago). Also, it doesn’t provide a clean way of getting access to the module functions without executing them, which becomes annoying when testing these things. I’ll talk a bit more about that in the section on testing.

No JavaScript in HTML

I don’t want any JavaScript whatsoever in the HTML, if I can avoid it. The only script tag should be the one that starts your module and loading framework – in my case RequireJS. We don’t have any event handlers embedded in the pages at all. We started out from a place where some of our pages had lots of event handlers, and refactored to a much smaller code base that was much easier to work with by extracting all of these things into separate JavaScript modules. A side effect of this is that anything you want to work with should be possible to identify semantically, either by using CSS classes or data attributes. Try to avoid convoluted paths to find elements. It’s OK to add some extra classes and attributes to make your JavaScript clean and simple.
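
As a small sketch of the idea (the attribute name and handler are made up, using dojo.query as in the examples further down): the markup carries only a data attribute, and the behaviour is attached from a module.

// In the HTML there is only markup, no JavaScript:
//   <button data-action="save">Save</button>
// In a JavaScript module, the handler is attached by selecting on that attribute:
dojo.query("[data-action=save]").onclick(function() {
  console.log("saving...");
});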

Init functions on ready

In terms of how we structure modules in a real application, we don’t actually do much work on startup. Instead, most of the work involves setting up event handlers and so on. The way we are doing that is to have the top level modules expose an init method that is expected to be called by the main module when it starts up. Imagine a system where you have dojo as the main framework, and you have this code:

// foo.js
require(["bar"], function(bar) {
  function sayHello(node) {
    console.log("hello " + node);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});
This will make sure to set up all event handlers and put the application in the right state to be used.

Lots of callbacks

Once you’ve taught yourself to ignore the verbosity of anonymous lambdas in JavaScript, they become very handy tools for creating APIs and helper functions. In general, the code we write uses a lot of callbacks and helper wrapper functions. I also quite liberally use functions that generate new functions, doing things like currying and similar techniques. A fairly typical example is something like this:

function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));
This kind of abstraction can lead to very readable and clean JavaScript if done well. It can also lead to code where every piece is as small as it can be. In fact, one of the ways we make the syntax a little bit more bearable is to extract the creation of anonymous functions into factory functions like this.

Lots of anonymous objects

Anonymous objects are great for many things. They work as a substitute for named arguments, and can be very useful for returning more than one value. In our code base we use anonymous objects a lot, and it definitely helps with code readability.
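
A small illustration of both uses (all the names are made up):

// An anonymous object standing in for named arguments...
function createAccount(options) {
  return {name: options.name, balance: options.initialBalance};
}
var account = createAccount({name: "savings", initialBalance: 100});

// ...and an anonymous object used to return more than one value.
function divide(a, b) {
  return {quotient: Math.floor(a / b), remainder: a % b};
}
var result = divide(17, 5);
console.log(result.quotient + " remainder " + result.remainder); // 3 remainder 2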

Testing

We use Jasmine for unit testing our JavaScript. This works quite well in general. Since this is a fairly typical Java web application, we wanted to run it as part of our regular build process. This means we ended up using the JUnit Jasmine runner, which allows us to run these tests outside of browsers and format the results using all the available JUnit tools. Since we’ve tried to make the scripts as modular and small as possible, and have also extracted most of the DOM behavior, we have avoided using HTML fixtures. This means our tests lean more towards traditional unit tests than BDD style tests – which I’m not sure I’m comfortable with. But with the current size of the application, this is not really a problem.
Seeing as we wanted to test each module in isolation, we wanted to be able to instantiate the RequireJS module with our custom mock dependencies. This ended up not being very easy with RequireJS, so instead of trying to fit into that model, we just don’t load RequireJS at all during testing, but instead have a top-level require function that just saves away the module function under a well defined name. This means we can instantiate the modules as many times as we want and inject different mocks for different purposes.
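A sketch of what such a shim can look like (the details here are guesses rather than our exact code):

// Instead of loading require.js in the test environment, define a global
// require that only remembers the module factory and ignores the dependency list.
var savedModuleFactory;
function require(dependencies, moduleFactory) {
  savedModuleFactory = moduleFactory;
}

// A Jasmine spec can then instantiate the module with whatever mocks it wants:
//   var foo = savedModuleFactory(mockBar, mockQuux);
//   expect(foo.doSomething()).toEqual("barthing quuxthing");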
In general, Jasmine works well for us, but there are some features missing from the mocking/stubbing framework that make certain things a bit complicated. One thing I miss a lot is the capability of having stubs return different values depending on the arguments sent in. Some ugly code has been written to get around this.
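For what it’s worth, one shape such a workaround can take, assuming the andCallFake spy helper is available in the Jasmine version in use (the object and values are made up):

spyOn(backend, "fetch").andCallFake(function(id) {
  if (id === "flux") { return {level: 42}; }
  return {level: 0};
});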

Open questions

Our current JavaScript process works well for us, but there are still some open things we haven’t done yet. First among these is to integrate JSLint into our build process. I really think that should be there, so I have no excuse. We don’t have tests running inside of browsers. I’m actually OK with this, since we’re trying to do more unit level coverage with Jasmine. Hopefully our acceptance tests cover some of the browser based testing. We are not doing minification at all, and we probably won’t need it based on the current expected usage. For a different audience we would certainly minify everything – this is something RequireJS can do really well though. We don’t have any coverage tool running on our JavaScript either. This is something I’m also uncomfortable with, but I haven’t really found a good tool that allows us to run coverage as part of our CI process yet. I also care more about branch coverage than line coverage, and no tool seems to give you this at the moment.

Summary

JavaScript can be completely OK to work with, provided you treat it as a real language. It’s quite powerful, but we also have a lot of bad habits based on hacking together small things, or just doing whatever works. As we go forward with JavaScript, this needs to stop. But the good news is that if you’re a decent developer, you shouldn’t have any problem picking any of this up.