Separate blog about Privacy, Anonymity and Cryptography

It’s been a long while. But I have been writing a little bit during 2014 as well. I decided to switch venue a bit, once my writing became almost exclusively about privacy, anonymity and cryptography. Since my day-to-day job has been research in these areas for the last 2 years, it has become natural to write about what I think on these subjects.

So, if you are interested in following these kinds of thoughts, please follow along at


It’s almost a year since my last blog post. I’ve had a busy and interesting year, and wanted to take the time to talk about a few different things. One of the reasons I haven’t blogged as much as I wanted is that I’ve been working on several things that aren’t exactly public at the moment. Moving to Ecuador has also involved a whole bunch of work and led to less time.

Computer and cloud

Some of my last blog posts had to do with me wanting to change what hardware and cloud providers I used. I went through a few different laptops until I settled on a Thinkpad X1 Carbon (2nd gen). It’s got the right weight-to-power ratio, a larger screen, a nice keyboard, and it works quite well with Linux. I ended up moving to Fedora for my main operating system, which has served me pretty well.

When it comes to password managers, I’ve ended up with KeePass2 for Linux. It works satisfactorily even though it was a bit of a pain to get running, what with Mono dependencies etc.

I have also been hosting my own email infrastructure for the last year or so. It’s worked out quite well.

For hosting my server infrastructure I ended up with a Swedish provider called – I am now happily far away from Rackspace and the other American hosting providers.

Programming languages

The last few years haven’t really seen much change in the programming language scene. There are new interesting experiments coming out now and again, while Clojure and Scala are gaining more ground. The biggest difference I’ve seen lately is that Go is becoming a really useful tool, especially for scalable systems where the low-level nature of Go works well, but the security properties of the language are also a plus. Many of the security-oriented systems I’ve seen in the last year are moving towards Go. And of course, being able to create a binary that runs anywhere is extremely powerful.

I have a new language rattling around in the back of my brain. I kinda wish I had 6 months to sit down and get it out. I think it could be quite nice. However, at the moment there doesn’t seem to be that much time for language experiments.


Most of my work the last 18 months has been focused on research around anonymity, privacy, security and cryptography. It’s an extremely interesting place to be, and ThoughtWorks is actively working on several different angles in this space. The security field is seeing a sea change after the Snowden revelations. Lots of snake-oil coming out, but also lots of interesting new approaches.

One of the things that worry me about all this is the continued focus on using browsers to implement security aware applications. In my mind this is a dangerous trend – browsers are the largest attack surface out there, and you are also forced to use JavaScript. And JavaScript is not a well suited language for strong security.

E-mail is pretty high up on the list for a lot of people and organizations to fix in one way or another. This is also a space where we are working to create something. We will open source what we have quite soon – it’s something pragmatic that we think can make real change. I’m excited about it.

Meanwhile, here in Ecuador I’m doing research and thinking about the problems with transport security, specifically DNS and TLS. Hopefully I’ll have something more substantial to write about that soon.


Some might wonder where I’ve disappeared to, since I haven’t been to many of the conferences I used to go to. I’ve missed the JVM Language Summit, QCon in many cities, StrangeLoop and others. I’m very sad about all that, but I haven’t been able to attend. However, if you happen to be at Goto Aarhus, do say hi.

Switching My Life

This is a description of intent, with some rationale. Maybe some of this will be useful for you. If nothing else, some advice would also be appreciated. So. What is this about? I have decided to make some changes in my electronic life. I will not make all the changes immediately, and I don’t have any complete plans yet – this is the outline of a long term plan.

My current situation

I have used a MacBook as my main computer for the last 7 years. I do a lot of development in different environments, and when I switched, MacOS X coupled with the hardware allowed me to get more things done without dealing with stuff I didn’t want to deal with. But MacOS X also afforded me the possibility of tweaking and changing many of the parts of the OS when I needed to do that.

I have several email accounts, most of them are GMail in one variety or another. I also use several other pieces of the Google ecosystem.

iTunes is my main music player. I have lots of music and other things that I regularly sync with my iPad and iPhone – both of which I depend on a lot in my day-to-day life.

This blog and a few other services are hosted on Rackspace.

In addition to an iPhone, I also have a Galaxy Note 2. I use both phones and the iPad extensively.

I currently store most of my life inside of 1Password.

I use Dropbox for sharing files and information between different people and devices.

Why I want to change

Fundamentally I am a believer in free software. I believe that open ecosystems are better than closed ones, and I believe that monocultures are extremely bad in the long run. I am not a huge fan of centralization, and I don’t like the anglocentric focus of our industry. I am not a huge fan of having all my electronic life hosted under the auspices of US legislation, especially not in light of recent events. I am also getting more and more uncomfortable with closed services and software that I can’t inspect.

But looking at the various things that define my electronic life, it’s clear that my day-to-day actions speak a very different message from my beliefs. So I am going to change that. Of course I realize that this might be painful. There are many things that a monoculture does quite well. It’s a local optimum for certain problems. But as part of this effort I will have to take a hit in productivity to stand for what I believe in.

What I will change to

I have not completely decided all the particulars of the direction I’m going to take. Since it will be a long term effort, I can take it step by step. The first and probably biggest step is that I will migrate from an Apple laptop as my main programming device. I will instead run a System76 Gazelle with Debian 7.

Of course, switching back to Linux will also make several of the other changes easier, since I won’t be able to keep using some of my usual tools anyway.

Open questions

There are a whole slew of open questions in this quest. The biggest one is probably what to do about mobile phones. None of the smartphones out there are particularly open while being strong enough for daily use. Maybe the Ubuntu Edge will be that phone at some point, but for now I’m not sure.

A password manager is also a requirement. I really like 1Password, but since it is closed source I am uncomfortable keeping my credentials there much longer. The only viable alternative seems to be KeePassX. I haven’t tried it yet, but since it hasn’t seen updates for several years, that doesn’t strike me as very confidence inspiring.

I want to get out of GMail, but I have no idea where I will go. I might host for myself, but that comes with a significant burden.

I currently run my servers on Rackspace. I need to change that to something that is hosted in a better legal framework, but there are not that many good cloud providers out there.

Any recommendations and thoughts are welcome!

Technical Details from Snowden

This summer has given confirmation to many things that technologists only guessed before. We know much more about what the NSA, GCHQ and other intelligence services are doing around the world, how they are subverting privacy and security in the name of fighting terrorism. All of this is primarily thanks to Edward Snowden, Laura Poitras and Glenn Greenwald – with the help of many other courageous people. For the technically inclined, last week’s revelations about how the NSA is pursuing a broad program to subvert all kinds of encryption were probably among the most worrying releases. But right now we’re also seeing a strong backlash against Greenwald, claiming that he should be releasing the names of technologies broken, the companies involved and who specifically is complicit in all this. A lot of people are ascribing malicious intentions to Greenwald for keeping these things to himself. I would just like to add two things to the debate:

First, it is highly likely that Snowden did not in fact have access to what specific technologies were broken. It might not exist in the papers he gave to Greenwald and others. As far as we know, Snowden was not cleared for BULLRUN and related programs, and the fact that we know about them is because he managed to get access to protected documents he wasn’t supposed to be able to access. So I think it’s only fair to give Greenwald the benefit of the doubt – he might not be able to tell us the specific algorithms that are broken. Let’s not immediately jump to the conclusion that he is acting maliciously.

When it comes to what companies and people are complicit in these issues, in the short term it would be very useful for us to know. I suspect there are good reasons why this information hasn’t been released yet – but let’s not forget that many companies have been outed as cooperating in one way or another under the PRISM program.

The big problem is this – for us technologists to stop future BULLRUN programs from happening we need to build new organizational structures. We need to guard ourselves from compromised algorithms and hardware chips with backdoors. In order to do that we need to change how we do these things – and this will require long term cultural fixes. And even though it would be very satisfying in the short term to know what companies and people to be angry at, in the long run we need to build up an immune system that stops this from happening again.

This all said – I’m dying to know all these details myself. I think it’s pretty human. But let us not lose sight of the real battle.

How do you safely communicate over email?

Communicating safely over email is actually pretty complicated. I wanted to walk through the steps necessary in order to create a complete email identity online that should be reasonably safe against interception, network analysis and impersonation.

Before we begin, you need to have the Tor Browser Bundle installed. You also need to make sure that you never do anything related to your email account without having the connection going over Tor.

One important aspect of this is the ability to find a good email provider where you don’t have to supply real personal information. If you ever have to supply your real information you will also always have to trust that the email provider does the right thing. The one thing you can never get away from is that network analysis can happen on your email if the provider can’t be trusted. If this happens, your only recourse is to be sure that the people you are talking to are using the same techniques, and that you are using several of these accounts for various activities.

The first step is to find a provider that matches your needs. For this I’m going to use I could also use hushmail or other services, although none of these are completely safe. I will first generate the email address/username I want. In order to do this, you need a mechanism of generating randomness. I will use 1Password for this, and generate a completely random string. However, an alternative you can use is to go to one of the random name generators available online (go there using Tor), and then generate a random name there. Once you have a random name and a long, really random password, you can go ahead and start registering for an account.

When signing up, use Tor for all the connections and make sure to not give any extra information asked for (such as time zone or country, for example). Once you have been completely signed up, use Tor again and sign in to the web client to make sure everything works as it should.

The next step is to create a private key specifically for this email account. I will use the command line to do this, using gpg. Before you create this key, you should also create a new pass phrase for yourself. Use the XKCD Battery Staple method, with about 5-6 words. However, be very careful to choose these words randomly. If you don’t use a random (really random) process, you lose all the benefits of having a pass phrase and it becomes a very weak password. So, once you have the pass phrase, you can use it to create a new private key:

gpg --gen-key

The choices I will make are these: RSA and RSA for the kind of key. A keysize of 4096, and a validity of 2 years. I will specify the username for the email address as the name for the key. Finally you will be asked for the pass phrase, so enter it now. Make sure to never push this key to a keyserver using the gpg program.
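The pass phrase step above depends entirely on the words being chosen at random. A minimal sketch of one way to do that, assuming Python and its standard `secrets` module; the word list here is a tiny hypothetical stand-in, and a real list should contain thousands of words:

```python
import math
import secrets

# Hypothetical stand-in word list; a real one should have thousands of words,
# since the strength of the pass phrase depends on the size of the list.
WORDS = [
    "correct", "horse", "battery", "staple", "orbit", "lantern", "maple", "quartz",
    "velvet", "harbor", "cinder", "tundra", "walnut", "ember", "glacier", "pepper",
]


def make_passphrase(words, count=6):
    """Pick `count` words independently and uniformly using the OS random source."""
    return " ".join(secrets.choice(words) for _ in range(count))


def entropy_bits(wordlist_size, count):
    """Entropy in bits of `count` uniform, independent picks from the list."""
    return count * math.log2(wordlist_size)


if __name__ == "__main__":
    print(make_passphrase(WORDS))
    # A 16-word list gives only 4 bits per word, so 6 words is far too weak.
    print(entropy_bits(len(WORDS), 6))
    # A list of 7776 words (an assumed size) gives roughly 12.9 bits per word.
    print(round(entropy_bits(7776, 6), 1))
```

The point of the entropy function is that the strength comes from the size of the list and the number of words, not from how obscure the words look. Picking words yourself, instead of letting a random source pick them, removes that guarantee.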

Once you have created the key, you should use the Tor browser to add it to the keyservers. First export the public key into a file. Make sure to not export the private part of the key. Once you have Tor up and running you can go to and submit it there.

In order to use this account you should probably use Thunderbird and TorBirdy. If you have non-anonymous accounts you need to figure out how to run multiple Thunderbird instances, since TorBirdy takes over the full installation. You need a new profile and should install Enigmail and TorBirdy once you have Thunderbird installed. Then you can go ahead and configure the mail account. It is important to install TorBirdy before you configure the mail account. Once you’ve configured the mail account, it’s a good idea to make sure Enigmail will encrypt and sign emails by default.

You are now ready to send safe and anonymous email. There are a few different things to keep in mind for this. First, make sure to never access this account over a regular connection. Second, never download keys automatically from the keyserver; instead, always download and import them manually. Finally, never send an email in the clear. Always encrypt it using the right key. If you ever send an email like this in clear text over the same connection, you have lost most of the potential security of the account.

In order for this to work you should give your account information and fingerprint of the public key in person to the people who should have it.

Finally, none of these things can guarantee safety.

Comments and corrections to this writeup are very welcome.

Complexity and Brain Size

The last year or so I’ve been leading a small team of developers. We’ve been working on a project that involves genomics and molecular biology, bioinformatics, oncology and computational biology. Saying that it’s hugely complex is an understatement. One of the interesting dynamics of this project was that I personally designed and implemented a large portion of the project. A big part of the project was also to solve the actual problem – our client did not have a solution when we came to them, so really we ended up getting access to resources and experts, but no predefined solution.

The question that I’ve been pondering the last week or so is this – if the project had been even more complex; so complex that I wouldn’t have been able to fit it all in my head – could we still have solved it? If I had 50% of the information in my head and someone else had the other 50%, would it still be possible to come up with a working solution?

My intuition is that the answer to this is no. I don’t think a problem of this complexity level could have been solved in a good way by sharing the responsibility for coming up with the solution.

On the plus side, I have never encountered anything even close to this magnitude of complexity before. The projects I’ve been on have been a variety of different enterprise style projects and most of them don’t really have much in terms of domain complexity. So maybe this is not a problem in practice for most cases.

But on the other hand, we still have lots of unsolved problems in highly complex and scientific domains. In order to solve them, we need people that can understand both the science and the software aspects at the same time. And based on my experience the last year, I suspect that there are real limits to what kinds of problems we can actually take on in this way. There has to be a better solution. I don’t think we have a solution to this problem yet. Incremental development methodologies really don’t help with this.

Another interesting aspect of this project is that we did not have any BAs (business analysts). Most of our projects have BAs and it’s highly unusual to not have them. In retrospect it was the right choice for us and I can now verbalize why – when you have BAs working with the domain, you still have to take into account the communication with the developers and tech leads. If the domain is complex enough and the developers need to have that understanding, having BAs would actually get in the way and the communication surface area would be too large to work effectively. One of my colleagues and I ended up doing all the BA work together, in conjunction with our design and implementation work.

Working on a project like this has been a very different experience. It’s definitely clear to me that our standard ways of working with businesses don’t really apply.

A new server infrastructure

A month ago, Joyent told me that the machine I was hosting my server on was being end-of-lifed. This server has been running my blog, the Ioke and Seph home pages and various other small things for several years now. However, I’m ashamed to admit that the machine was always a Snowflake server. So now when I had to do something about it, I decided to do this the right way. So I thought I would write up a little bit about what I ended up with in order to make this into a Phoenix server.

The first step was to decide on where to host my new server. At work I’ve been using AWS since January, and I like the experience. However, for various reasons I was not comfortable using Amazon for my personal things, so I looked at the capabilities of other providers. I ended up choosing Rackspace. In general I have liked it, and I would use them again for other things.

Once I had a provider, I had to find the right libraries to script provisioning of the server. We have been using Fabric and Boto for this at work – but we ended up having to write quite a lot of custom code for this. So I wanted something a bit better. I first looked at Pallet – from the feature lists it looked exactly like what I wanted, but in practice, the classpath problems with getting JClouds working with Rackspace correctly were just too much, and I gave up after a while. After that I looked around for other libraries. There are several OpenStack libraries, both for Ruby and for Python. Since I’m more comfortable with Ruby I ended up with the Ruby version (the gem is called openstack-compute). However, since it was tied heavily to OpenStack I wrote a thin layer of generic provisioning functionality on top of it. Hopefully I’ll be able to clean it up and open source it soon.

When I had everything in place to provision servers, it was time to figure out how to apply server configurations. I’ve always preferred Puppet over Chef, and once again it’s something we use at my current project, so Puppet it is. I use a model where I pack up all the manifests and push them to the server and then run puppet there. That way I won’t have to deal with a centralized puppet server or anything else getting in the way.

And that is really all there is to it. I have all my sites in Git on Github, and I have a cron script that pulls regularly from them, making sure to inject passwords in the right places after pulling. All in all, it ended up being much less painful than I expected.

So what about Rackspace then? The good: the APIs work fine, the machines seem stable and performant, and they now have a CloudDB solution that gives you roughly the same kind of capability as RDS. However, there is currently no backup option for the cloud databases. The two things I missed the most from AWS were EBS volumes and elastic IPs. However, EBS volumes are also a pain to deal with sometimes – I wish there was a better solution. Elastic IPs seem like the right solution for fast deploys, stable DNS names and so on – but they also have some problems, and the more I’ve been thinking about it, the more I realize I don’t want them. In order to get zero-downtime deploys and stable DNS names I think the right solution is to use the Rackspace load balancer feature. You attach the DNS name to the LB, and then have scripts that point it to the real machines. The useful feature for what I’ve been doing is that you can bring up a new staging/pre-production server and add it to the load balancer, while setting the policy so that only your personal home IP or the IP of your smoke test runner is balanced to the new box. That means you can try out the machine while everyone else gets routed to the old machine. Once you are satisfied you can just send an atomic update to the load balancer and switch out the policy to only point to the new machine.
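The staging-then-switch flow described above can be sketched with a toy model. This is not the Rackspace load balancer API; the class, method names and IP addresses are all hypothetical and only illustrate the routing policy:

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Toy model of the routing policy; NOT a real load balancer API."""

    default_backend: str
    # Source IPs that are routed to a staging backend instead of the default.
    overrides: dict = field(default_factory=dict)

    def route(self, source_ip):
        """Return the backend that a request from `source_ip` would reach."""
        return self.overrides.get(source_ip, self.default_backend)

    def add_staging(self, backend, allowed_ips):
        """Send only the listed source IPs to the new backend."""
        for ip in allowed_ips:
            self.overrides[ip] = backend

    def promote(self, backend):
        """Switch everyone to the new backend and drop the staging policy.

        In this toy model it is a single tuple assignment; a real load
        balancer would need to apply the change as one update.
        """
        self.default_backend, self.overrides = backend, {}


if __name__ == "__main__":
    lb = LoadBalancer(default_backend="old-box")
    lb.add_staging("new-box", ["203.0.113.7"])  # hypothetical home IP
    print(lb.route("203.0.113.7"))    # the tester reaches the new box
    print(lb.route("198.51.100.20"))  # everyone else stays on the old box
    lb.promote("new-box")
    print(lb.route("198.51.100.20"))  # now everyone reaches the new box
```

The DNS name never changes in this model, which is the property that makes elastic IPs unnecessary for this kind of deploy.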

Going forward, I think this is the kind of model I actually want for our AWS deploys as well.

So all in all, I’ve been pretty happy with the move.

Passwords Are Terrible

I’ve been going through a rash of password resets and changes the last few days, and as such things always do, it set me thinking. If I’m lucky, most of this won’t really be much of a surprise for you. It certainly won’t contribute anything significant to the security world.

My thought is simply that passwords are terrible. I know, I know – not an original thought. Basically, passwords might have been the right trade off between convenience and security a while back. I’m unsure when that stopped being the case, but I’m pretty sure it’s been more than a few years. However, we are still using passwords, and there are a lot of things we do that don’t necessarily make our use better. Just as black hats use social engineering to crack systems, security experts should use social engineering and experience design to entice people to do the safest possible thing under the circumstances. Sadly, we are doing the exact opposite right now.

Let’s take a look at a laundry list of what you should be doing and what you are doing:

  • You should never use the same password in more than one place. REALITY: people basically always reuse passwords or password variants on different services. The proliferation of places that require passwords and logins means we have the choice of either having more than 50 passwords, or reusing them. But if you reuse passwords, then within each set of services with the same password, the service with the most sensitive material will be protected only as well as the least secure service. So as long as you use the same password, you have to think about all the services you’ve used that password on to have a realistic idea about how protected that password is.
  • You should never use words, numbers from your life or names from your life in your password – scrambled or not. REALITY: basically everyone does one of these things – most wifi-network passwords I know are combinations of the names of the children in the family. In order to remember passwords, we usually base them on existing words and then scramble them with a few letters switched out for digits or a few symbols added. This practice is basically completely useless, unless your password is actually a pass phrase. And if your password is in reality a pass phrase you don’t gain much by scrambling the words. So for a really secure password, use a pass phrase, but make sure that the words in the phrase are randomly selected.
  • Security policies usually require you to change passwords every 2 or 3 months. REALITY: This means you are training people to choose insecure passwords. If you have to change passwords often you have a few choices – you can write them down or you can use variations on the same password. Note that remembering a new strong password every 2 months is not an option – people will simply not do it. Most people I know use a sequence of numbers added to a base password, and they change these numbers every time they are forced to change the password. All of these things come together to defeat the purpose of the security policy. It is security theatre, simple and pure. If your company has a policy that requires you to change passwords like this, that is basically guaranteed to be a company with no real security.
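To see why the "base password plus a counter" habit from the last point adds so little, here is a small sketch of how someone who knows one old password could guess the next few. The function name and the example passwords are made up for illustration:

```python
import re


def rotation_candidates(old_password, span=3):
    """Guess likely successors of a password whose last digit run is a counter.

    Returns an empty list when the password contains no digits at all.
    """
    # Find the last run of digits, plus any non-digit suffix after it.
    match = re.search(r"(\d+)(\D*)$", old_password)
    if match is None:
        return []
    digits, suffix = match.group(1), match.group(2)
    prefix = old_password[: match.start(1)]
    current = int(digits)
    candidates = []
    for step in range(1, span + 1):
        # Keep the original width so "07" becomes "08", not "8".
        candidates.append(f"{prefix}{current + step:0{len(digits)}d}{suffix}")
    return candidates


if __name__ == "__main__":
    print(rotation_candidates("Hunter07!"))
    print(rotation_candidates("kids2013"))
```

With a forced change every 2 or 3 months, a handful of guesses like these covers a year or more of "new" passwords, so the rotation policy buys very little against anyone who has seen an earlier one.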

What is the solution? I have decided to base my life around 1Password. For most purposes, the combination of the password generator, the browser integration, and the syncing between different devices means that it’s mostly hassle-free to have really strong passwords in all the places I want them. I think 1Password is really good at what it does, but it’s still a stop-gap measure. The basic idea of using passwords for authentication is an idea that should be relegated to history. We need a better alternative. We need something that is more secure and less brittle than passwords. But we also need something that is more convenient than two-factor authentication. For most of us that log in to services all the time, two-factor is just too slow – unless we get to a point where we have a few central authentication providers with roaming authentication.

Is there a better alternative? HTTP has included support for TLS Client Certificates for a long time now, and in theory it provides all the things we would want. In practice, it turns out to be inconvenient for people to use, and expiration and other aspects of certificates complicate and frustrate things.

I guess what I would want is a few different things. The first would be to simply make it possible to have my browser automatically sign a challenge and send it back, instead of putting in a password in the login box. That would require a little support from the browser, but could potentially be as easy to use as passwords.
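The message flow of such a challenge-response login can be sketched as follows. One loud caveat: the idea above is about signing with a private key, but the Python standard library has no public-key signatures, so this sketch swaps in an HMAC over a shared secret purely to show the flow. A real design would use an asymmetric signature so that the server only ever stores a public key. All function names are hypothetical:

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Server side: a fresh random nonce for each login attempt."""
    return secrets.token_bytes(32)


def sign_challenge(key, challenge):
    """Client side: prove possession of the key without sending the key.

    Stand-in for a private-key signature; here it is an HMAC-SHA256 tag.
    """
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_response(key, challenge, response):
    """Server side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    key = secrets.token_bytes(32)        # established once, at registration
    challenge = new_challenge()          # sent by the server with the login page
    response = sign_challenge(key, challenge)
    print(verify_response(key, challenge, response))        # matching challenge
    print(verify_response(key, new_challenge(), response))  # replay against a new one
```

Because every login uses a new challenge, a captured response is useless for a later login, which is one of the things a typed password cannot offer.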

Another thing that could make this possible is if 1Password had support for private keys as well as passwords. This would mean syncing between devices would become substantially easier. Someone would have to think up a simple protocol for making it possible to use this system instead of passwords on a web page. This is a bit of a catch-22, since you need support from the browser or extension for it to be worth putting into your service. I kinda wish Google had done something like this as the default for Google Accounts, instead of going all the way to two-factor.

In summary, I think we are ready for something better than passwords. I would love it if we could come together and figure out something with better usability and security than passwords so we can finally get rid of this scourge.

6 months with Clojure

I have spent the last 6 months on a project where Clojure was the main technology in use. I can’t really say much about the project itself, except that it’s a fairly complicated thing with lots of analytics and different kinds of data involved. We ended up with an environment that had a lot of Ruby and JavaScript/CoffeeScript as well as Clojure. We are using Neo4J for most of our data storage.
In this blog post I wanted to talk about a few different things that have worked well or not so well with Clojure.

Being on 1.4

When the project started, Clojure 1.4 was in alpha. We still decided to run with it, so we were running Clojure 1.4alpha for about one month, and two different betas for another month or so. I have to say I was pleasantly surprised – we only had one issue during this time (which had to do with toArray of records, when interacting with JRuby) – and that bug had already been fixed in trunk. The alphas and betas were exceptionally stable and upgrading to the final release of 1.4 didn’t really make any difference from a stack standpoint.

Compojure and Ring

We ended up using Compojure to build a fairly thin front end, with mostly JSON endpoints and serving up a few HTML pages that were the starting points for the JavaScript side of the app. In general, both Compojure and Ring work quite well – the ring server and the uberjar both worked with no major problems. I also like how clean and simple it is to create middleware for Ring. However, it was sometimes hard to find current documentation for Compojure – it seems it used to support many more things than it does right now, and most things people mention about it just aren’t true anymore.


In order to get some dynamic things into our pages, we used Enlive. I really liked the model, and it was quite well suited for the restricted dynamicity we were after.

DSL with lots of data

One of my less bright ideas was to create an internal DSL for some of our data. The core part of the DSL was a bunch of macros that knew how to create domain objects of themselves. This ended up being very clean and a nice model to work with. However, since the data amounted to millions of entries, the slowness of actually evaluating that code (and compiling it, and dealing with the permgen issues) ended up getting unbearable. We recently moved to a model that is quite similar, except we don’t evaluate the code, instead using read-string on the individual entries to parse them.
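The difference between evaluating entries as code and reading them as data can be illustrated with a rough Python analog. This is not the project’s code: the entries are invented, and `ast.literal_eval` is only used as a stand-in for the role read-string plays in the description above:

```python
import ast

# Hypothetical entries; the real data and DSL are not shown in this post.
ENTRIES = [
    '{"kind": "item", "id": 1, "tags": ["a", "b"]}',
    '{"kind": "item", "id": 2, "tags": []}',
]


def parse_entries(lines):
    """Parse each line as a literal data structure, without executing it."""
    return [ast.literal_eval(line) for line in lines]


def is_plain_data(line):
    """True if the line is a pure literal, False if it contains code to run."""
    try:
        ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return False
    return True


if __name__ == "__main__":
    for entry in parse_entries(ENTRIES):
        print(entry["id"], entry["tags"])
    # A function call is code, not data, so the reader refuses it.
    print(is_plain_data("__import__('os').getcwd()"))
```

The entries look the same on disk either way. What changes is that each one is parsed into a data structure directly instead of going through the compiler, which is where the cost came from with millions of entries.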

Dense functions

Clojure makes it really easy to create quite dense functions. I sometimes find myself combining five or six data structure manipulation functions in one go, then taking a step back and looking at the whole thing. It usually makes sense the first time, but coming back to it later, or trying to explain what it does to a pair, is usually quite complicated. Clojure has extraordinarily powerful functions for manipulation of data structures, and that makes it very easy to just chain them together into one big mess.
So in order to be nice to my team mates (and myself) I force myself to break up those functions into smaller pieces.
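As an illustration of the same trade-off outside Clojure, here is a Python analog with invented data and function names. The dense version and the broken-up version compute the same thing:

```python
from collections import Counter

# Hypothetical data, only for illustration.
EVENTS = [
    {"user": "ann", "active": True, "tags": ["db", "web"]},
    {"user": "bob", "active": False, "tags": ["web"]},
    {"user": "cid", "active": True, "tags": ["web", "ops"]},
]


def top_tags_dense(events, n):
    """Everything in one expression: filter, flatten, count, rank, project."""
    return [t for t, _ in Counter(t for e in events if e["active"] for t in e["tags"]).most_common(n)]


def active_events(events):
    """Keep only the events from active users."""
    return [e for e in events if e["active"]]


def all_tags(events):
    """Flatten the tag lists of all events into one list."""
    return [tag for e in events for tag in e["tags"]]


def most_common_tags(tags, n):
    """Return the `n` most frequent tags, most frequent first."""
    return [tag for tag, _ in Counter(tags).most_common(n)]


def top_tags(events, n):
    """Same result as the dense version, but each step has a name."""
    return most_common_tags(all_tags(active_events(events)), n)


if __name__ == "__main__":
    print(top_tags_dense(EVENTS, 2))
    print(top_tags(EVENTS, 2))
```

The second form is longer, but each piece can be explained and tested on its own. It also shows the cost: every small step now needs a name.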


One aspect of breaking up functions as described above is that the operations involved are usually highly abstract and sometimes not very coupled to domain language. I find naming those kinds of functions very hard, and many times I spend a long time and still don’t come up with something I’m completely comfortable with. I don’t really have a solution to this problem right now.


For some reason, we haven’t used most of the concurrency aspects of Clojure at all. Maybe this is because our problems don’t lend themselves to concurrent processing, but I’m not sure this is the root of the reason. Suffice to say, most of our app is currently quite sequential. We will see if that changes going forward.


I’ve been having a blast with Clojure. It’s clearly exactly the right technology for what I’m currently doing, and it’s got a lot of features that make it very convenient to use. I’m really looking forward to being able to use it more going forward.

Notes on syntax

The last few years the expressiveness of programming languages has been on my mind. There are many things that come into consideration for expressiveness, no matter what definition you actually end up using. However, what I’ve been thinking about lately is syntax. There’s a lot of talk about syntax and many opinions. What made me start thinking more about it lately was a few blog posts I read that kind of annoyed me a bit. So I thought it was time to put out some of my thoughts on syntax here.

I guess the first question to answer is whether syntax matters for a programming language. The traditional computer science view is largely that syntax doesn’t matter. And in a reductionist, system-level view of the world this is understandable. However, you also have the opposite view, which comes strongly into effect when talking about learning a new language, but also about reading existing code. At that point many people are of the opinion that syntax is extremely important.

The way I approach the question is based on programming language design. What can I do when designing a language to make it more expressive for as many users as possible? To me, syntax plays a big part in this. I am not saying that a language should be designed with a focus on syntax or even with syntax first. But the language syntax is the user interface for a programmer, and as such there are many aspects of the syntax that should help a programmer. Help them with what? Well, understanding for one. Reading. Communicating. I suspect that writing is not something we’re very much interested in optimizing for in syntax, but that’s OK. Typing fewer characters doesn’t actually optimize for writing either – the intuition behind that statement is quite easy: imagine you had to write a book. However, instead of writing it in English, you just wrote the gzipped version of the book directly. You would definitely have to type much less – but would that in any way help you write the book? No, it would probably make it harder. So typing is definitely not something I want to optimize for. However, I would like to make it easy for a programmer to express an idea as concisely as they can. To me, this is about mentioning all things that are relevant, without mentioning irrelevant things. But incidentally, a syntax with that property is probably going to be easier to communicate with, and also to read, so I don’t think focusing on writing at all is the right thing to do.

Fundamentally, programming is about building abstractions. We are putting together extremely intricate mind castles and then trying to express them in such a way that our computers will realize them. Concepts, abstractions – and manipulating and communicating them – are the pieces underlying programming languages, and it’s really what all languages must do in some way. A syntax that makes it easier to think about hard abstractions is a syntax that will make it easier to write good and robust programs. If we talk about the Sapir-Whorf hypothesis and linguistic relativity, I suspect that programmers have an easier time reasoning about a problem if their language choice makes those abstractions clearer. And syntax is one way of making that process easier. Simply put, the things we manipulate with programming languages are hard to think about, and good syntax can make that easier.

Seeing as we are talking about reading – who is this person reading? It makes a huge difference whether we’re trying to design something that should be easy to read for a novice or a syntax that makes it easier for an expert to understand what’s going on. Optimally we would like to have both, I guess, but that doesn’t seem very realistic. The things that make syntax useful to an expert are different from what makes it easy to read for a novice.

At this point I need to make a request – Rich Hickey gave a talk at Strange Loop a few months ago. It’s called Simple Made Easy and you can watch it here: – you should watch it now.

Simply put, if you had never learnt any German, should you really expect to be able to read it? Is it such a huge problem that someone who has never studied Prolog will have no idea what’s going on until they study it a bit? Doesn’t it make sense that people who understand German can express all the things they need to say in that language? Even worse, when it comes to programming languages, people expect them to be readable to people who have never programmed before! Why in the world would that ever be a useful goal? It would be like saying German is not readable (and is thus a bad language) because dolphins can’t read it.

A tangential aspect of the simple versus easy distinction in programming languages is how our current syntactic choices echo what’s been done earlier. It’s quite uncommon for a syntax design to become wildly successful while looking completely different from previous languages. This seems to have more to do with how easy a language is to learn than with how good the syntax actually is by itself. As such, it’s suspect. Historical accidents seem to contribute much more to syntax design than I am comfortable with.

Summarizing: when we talk about reading programming languages, it doesn’t make much sense to optimize for someone who doesn’t know the language. In fact, we need to take as a given that a person knows a programming language. Then we can start talking about what aspects reduce complexity and improve communication for a programmer.

When we are talking about reading languages, one thing that sometimes comes up is the need for redundancy. Specifically, one of the blogs that inspired these thoughts basically claimed that the redundancy in the design of Java was a good thing, because it improved readability. Now, I find this quite interesting – I have never seen any research that explains why this would be the case. In fact, the only argument I’ve heard in support of the idea is that natural languages have highly redundant elements, and thus programming languages should too. First, that’s not actually true for all natural languages – but we must also consider _why_ natural languages have so much redundancy built in. Natural languages are not designed (with a few exceptions) – they grow to have the features they have because those features are useful. But reading, writing, speaking and listening in natural languages are subject to such different evolutionary pressures that they should be treated differently. The reason we need redundancy is simply that it’s very hard to speak and listen without it. For all intents and purposes, what is considered good and idiomatic in spoken language is very different from written language. I just don’t buy this argument for redundancy. Redundancy in programming language syntax might be a good thing, but so far I remain unconvinced.

It is sometimes educational to look at mathematical notation. However, mathematical notation is just that – notation. I’m not convinced we can have one single notation for programming languages, and I don’t think it’s something to aspire to. But the useful lesson from math notation is how terse it is. However, you still need to spend a long time to digest what it means. That’s because the ideas are deep. The thinking that went into them is deep. If we ever come to a point where programming languages can embody as deep ideas in as terse a notation, I suspect we will have figured out how to design programming language syntax that is way better than what we have right now.

I think this covers most of the things I wanted to cover. At some point I would like to talk about why I think Smalltalk, Ruby, Lisp and some others have quite good syntax, and how that syntax is intimately related to why those languages are powerful and expressive. Some other random thoughts I wanted to cover were the evolvability of language syntax, whether a syntax should be designed to be easy to parse, and possibly also how much English specifically has impacted the design of programming languages. But these are thoughts for another time. Suffice it to say, syntax matters.