Should languages be multi-lingual?


I’m currently sitting in the Beijing ThoughtWorks office, and for some reason language is on my mind… =)

One of the DDD-related discussions that has come up several times at conferences over the last few months is how you handle ubiquitous language when your domain is not in English. Since most programming languages are based on English, you end up mixing English and, say, Swedish if you are working with a Swedish domain. Of course, the benefits of working with these concepts in Swedish are very hard to argue against. But the dichotomy between the programming language and the domain language is definitely something that hurts my eyes, so I’m generally not very fond of that approach.

In fact, I haven’t heard anyone come up with a good solution to this problem, and this post is not really a solution either.

One of the things I’ve proposed to make this situation better is to create an external DSL that is fully in the domain language. The DSL itself can then be implemented in English. The main benefit is that there is a clear separation between the domain language and the programming language. On the other hand, the overhead of creating the DSL, and the complexity of translating the domain concepts into programming language concepts, can become problematic too.
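As a rough illustration of that separation, here is a minimal Python sketch (Python is used only for illustration; it is not what I've proposed anywhere in particular). The Swedish sentence shape "överför ... från ... till ..." and the `run` helper are assumptions made up for this example. The domain script is written entirely in Swedish, while the model behind it stays in English.

```python
import re
from dataclasses import dataclass


@dataclass
class Account:
    """English-language implementation that the DSL drives."""
    owner: str
    balance: int

    def transfer(self, amount, to):
        self.balance -= amount
        to.balance += amount


# Hypothetical domain sentence: "överför <amount> från <account> till <account>"
TRANSFER = re.compile(r"överför (\d+) från (\w+) till (\w+)")


def run(script, accounts):
    """Interpret each non-empty line of a Swedish script against the accounts."""
    for line in script.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRANSFER.fullmatch(line)
        if match is None:
            raise ValueError(f"unrecognised sentence: {line}")
        amount, source, target = match.groups()
        accounts[source].transfer(int(amount), to=accounts[target])


accounts = {"Anna": Account("Anna", 100), "Björn": Account("Björn", 50)}
run("överför 30 från Anna till Björn", accounts)
```

Even in this toy version the cost is visible: every new domain sentence needs its own pattern and its own mapping onto the English model.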

One interesting idea in Cucumber is that you can easily add new natural languages to write the features in. When it comes to user stories at the level of testing that Cucumber provides, it’s really important to use the right language. So it got me thinking, could you use the same kind of approach in a general programming language too?

As an experiment I took a small example program for Ioke, and translated it into Mandarin, with simplified Chinese characters. Of course I used Google Translate for this, so the translation is probably not very good, but the end result is still interesting. I’m not going to try to get this into my blog, so take a look at the file on GitHub instead: http://github.com/olabini/ioke/blob/master/examples/chinese/account.ik. As you can see there is nothing in there that even reeks of English. If you don’t understand Chinese characters it is probably hard to see what’s happening here. Basically an Account object is created, with a “transfer” method and a “print” method. Further down, two instances of this Account object are created, some transfers are made, and then the objects are printed. But provided my translation is not too crappy, this code should make sense to someone reading Chinese.

Now, this is actually extremely simple to implement in Ioke, since it relies on several features that Ioke handles very easily. That everything is a message really helps, and having everything be first class means I can alias methods and things like that without any worry. Obviously your language also needs to handle non-ASCII identifiers correctly, but that should be standard in this day and age.
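The aliasing part can be approximated in other languages too. Here is a small Python sketch, not the Ioke code from the linked file: the Chinese names are my own illustrative guesses (账户 for account, 转账 for transfer, 描述 for describe), and the `Account` class is a stand-in. It only shows that once classes and methods are first-class values, a translated name is just another binding to the same object.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def transfer(self, amount, to):
        self.balance -= amount
        to.balance += amount

    def describe(self):
        return f"<Account balance: {self.balance}>"


# Aliases: each Chinese name is bound to the very same class or function object.
账户 = Account
Account.转账 = Account.transfer
Account.描述 = Account.describe

甲 = 账户(100)
乙 = 账户(50)
甲.转账(30, 乙)
```

Note that `class`, `def` and `return` are still English here, because in Python they are keywords rather than ordinary names.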

When thinking about it, something similar can be created in languages like Lisp, Smalltalk, Factor, Io and Haskell – but most other languages would struggle. If you have keywords in your language, it’s really a killer – you would need to branch your parser to make it happen.
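To make the keyword problem concrete, here is a rough Python sketch of the kind of workaround you end up with when keywords can't be aliased: a source-to-source pass that rewrites translated keywords before the real parser ever sees them. The Swedish spellings in `KEYWORDS` and the `translate` function are assumptions invented for this example; only the standard `tokenize` module is used.

```python
import io
import tokenize

# Hypothetical Swedish spellings for a handful of Python keywords.
KEYWORDS = {
    "definiera": "def",
    "om": "if",
    "annars": "else",
    "returnera": "return",
}


def translate(source):
    """Rewrite translated keywords into real Python keywords, token by token."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # The tokenizer reports keywords and identifiers alike as NAME tokens,
        # so any identifier that collides with a translated keyword would be
        # rewritten as well.
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            tok = tok._replace(string=KEYWORDS[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


source = (
    "definiera tecken(x):\n"
    "    om x < 0:\n"
    "        returnera -1\n"
    "    annars:\n"
    "        returnera 1\n"
)
namespace = {}
exec(translate(source), namespace)
```

This is effectively a second front end bolted onto the language, which is exactly the overhead that aliasing avoids when everything is a message.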

Of course, this approach only works when you can simply translate from one word to another. If the writing system is right to left, or top to bottom, it’s much more tricky to create a good translation.

I’m also not sure if this is actually a really good idea or not. It might be. The other thing I’ve been thinking about is how to handle multilingual editing. What if you want to be able to switch back and forth between languages? How can you handle identifiers with more than one name? Would you want to?
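One possible shape for an answer, sketched in Python under heavy assumptions: keep a single neutral key per identifier and treat each natural language as just a way of displaying it. The `NAMES` table, the token lists and both helper functions are hypothetical and exist only for this example.

```python
# Hypothetical name table: one neutral key per identifier,
# one display name per natural language.
NAMES = {
    "account": {"en": "Account", "sv": "Konto"},
    "transfer": {"en": "transfer", "sv": "överför"},
    "balance": {"en": "balance", "sv": "saldo"},
}


def to_neutral(tokens, lang):
    """Replace known identifiers in `lang` with their neutral keys."""
    lookup = {names[lang]: key for key, names in NAMES.items()}
    return [("id", lookup[t]) if t in lookup else ("raw", t) for t in tokens]


def render(neutral, lang):
    """Display a neutral token list using the names of the given language."""
    return [NAMES[value][lang] if kind == "id" else value
            for kind, value in neutral]


swedish = ["Konto", ".", "överför", "(", "100", ")"]
neutral = to_neutral(swedish, "sv")
english = render(neutral, "en")
```

It leaves the hard questions open, such as what to do when a name exists in only one language or when two languages pick the same word for different things.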

Lots of unanswered questions here. But it’s still fun to think about. Communication is the main goal, as usual.


14 Comments

  1. I like this idea a lot.

    I’m guessing that most people will NOT like it, because most non-English-speaking programmers are used to programming in English, and you know how people love to stick with what they’re used to! But even if that’s true, and most people don’t like it, “most” is not “all”. It’s about time there were good tools for people who want to program in Chinese, or whatever.

    October 21st, 2009

  2. Ola, have you looked at the history of AppleScript?

    I’ve read that it used to support multiple languages by storing the script in a language-neutral format. You could write a script in Japanese and then open it in English to make some changes.

    One of the supported languages was ‘programmer’!

    I believe they dropped support for all but English some time back, but it’s interesting to know that it’s possible.

    October 21st, 2009

  3. Of course you have this functionality in whatever-they-call-the-language-for-Excel-now. SUM(…) is called SUMME(…) in German and so on.

    This is a constant source of weirdness for me, as my Mac OS thinks I’m German, but Google Apps commonly thinks I’m English, and so I’m always using the wrong function name (and I do spreadsheets rarely enough not to get used to it).

    I think this only works with “automatic translation” in Excel because there are of course no user-introduced names as all “variables” are spread sheet cells a la A1.

    October 21st, 2009

  4. I think it’s important to be able to write DSLs in the user’s natural language (especially for non-programmers).

    But regarding the main programming, I wouldn’t like it to be available in multiple languages. My mother tongue is Portuguese and I really hate when people program in Portuguese. It just doesn’t make sense. And even if we were to translate the identifiers, it would still not make sense, since the way we write sentences with them works differently.

    Another important point is if you write a small program in Swedish for your local company, but then there is a German company interested in the program, would you have to translate all the program? Or would you just compile it to machine code and decompile to German?

    I believe having a universal language for programming is very important nowadays, and if you were to restrict yourself to using, reading and writing code in your own language, we would still be programming only in C and COBOL. I’ve noticed that non-English-speaking programmers tend to be lousy ones, since they don’t use anything but the reference book for that programming language in their own natural language. And we all know how bad a programmer is if he doesn’t learn anything else.

    A possible exception to this would be Chinese. At some point in the near future China will have so many people that they won’t need the rest of the world, and may create a Chinese-only language that may become mainstream there. But it would be the same problem, with Chinese instead of English.

    October 21st, 2009

  5. The idea of multi-lingual support in a computer language is novel but not practical. After all, we’re already programmatically translating computer language into human-readable language. There are already enough abstractions in place to build higher-level languages; adding yet another abstraction layer so that users can build in their native languages means that code gets translated down into a core interpreted language, which is then translated down and down until finally a compiler turns this meta-language into machine code. The only universal language in computers is zero and one after all.
    Consider that with .Net alone you have your C# or VB code base that is translated down into IL, and IL is converted down by the CLR. From there it goes down to the JIT compiler, and finally down to machine code. Adding ‘languages’ to C# and VB opens up a whole new can of worms about exactly how human language syntax would boil down into .Net syntax that would then need to be compiled down into something the CLR can understand. By having human-readable regional languages other than English you’re opening up new bugs that are only fixable by those able to translate your particular language’s syntax. What we’re talking about is a new bug type. Call it the hyper language syntax bug, which could be caused by an improperly translated syntactical markup.
    You might as well expand it to new HTML markup, XML definitions would need multi-language support too, and exactly where do you stop? At what point do you accommodate individual languages so much that you spend more development time supporting the languages than the programs? There is already a startling number of programming languages to support in so many environments that creating anything other than a standard language for language syntax would cause an ungodly amount of unnecessary complexity. English became the standard for programming languages and I endorse that. If you want to program in something non-English try Perl. Perl is pretty much a universal language that is alien to almost everyone equally.

    October 21st, 2009

  6. This could actually make programming harder for developers with less-common languages. They’d have to mentally switch between local code and documentation in their native language, and code/docs from elsewhere in English or other languages.

    Using English as a lingua franca for software isn’t very fair, but it does allow programmers who learn it to communicate with a global community.

    October 21st, 2009

  7. Kumar

    Diarmuid Pigott has been building Protium with such multilingual programming in mind for, probably, a decade now. Yet to ripen, but http://www.protiumblue.com/ has some info.

    October 21st, 2009

    When you think about it, programmers who are not native English speakers know words/concepts/models/objects from multiple languages. Sometimes these words/concepts don’t overlap easily. If you have to stick to one language (as a policy), everybody actually has to translate. Sometimes painfully and always inefficiently, because you’d be translating a word/concept that everybody in the team would understand untranslated.

    I think the issue here is the scripts supported by the language and the platform. There is no need to make an LSL (that’s Language Specific Language), you *write* whatever *words* everybody in the team understands. Like you’ve always been doing.

    October 22nd, 2009

  9. Jurgen

    I think we are still in the early days where everything related to programming is done in English. I don’t believe this is necessarily going to be the case in the future. I don’t see the benefit of teaching a Chinese child how to program by only using English words. So I would argue that this is indeed a worthwhile issue when you create a new language.

    I would perhaps take issue with the reference to DDD and ubiquitous language (UL) which I don’t think is the best of reasons to worry about this. In my experience, the UL really is very useful in the discussions within the team and the domain experts, use cases/stories etc. etc. Maybe it’s just me, but in general the UL already doesn’t shine through that well in most implementations, ie it already gets watered down quite a lot… unfortunately most code already doesn’t read like it’s written in a DSL (English or not), even after doing a lot of thinking about the UL. It may just be that we’re just not very good of course ;-)

    October 22nd, 2009

  10. Sean W

    Making language-specific DSLs is an idea worth thinking about, but as Matt pointed out, there is just no point in making languages for people who don’t know what to say.
    I imagine programs like this (translated into english)

    >>
    hi program!
    gimme some FAT spreadsheet for the file my boss sent me
    y’know with pie diagrams and stuff.
    and make it in 3d.
    oh yea, and fast
    ENTER!
    <<

    Problem is not even the language, but the lack of any well-defined concepts. I certainly do want to make a reference to “convention over code” here. Most programmers, even mediocrely experienced ones, would be able to communicate, using the word “substr” and be pretty sure that their partner understands the intent behind substr’ing something. And that communication need not be email; looking at the other’s source code is also a form of communication.
    If someone would send to me source code in German, I’d be able to read it, but I’d have only a smaller community consenting over what a certain e.g. function name should mean, reducing the chance for mutual agreement over the intent of a function call (while still thinking that they understood it correctly according to their own interpretation) and an increased chance of using them the wrong way.

    I remember that at least SQL and Lingo (Macromedia Director script) were intended to resemble natural English. As far as I can tell, they failed at that the very moment they introduced their own linguistic concepts.

    October 23rd, 2009

  11. Are you sure this would be possible in Haskell? I’m pretty sure you’d have a hard time aliasing keywords like data, type, class, instance, where, etc.

    November 23rd, 2009

  12. Emptist 行者悟空

    Hi
    I have been a fan of NLP (Native human Language Programming) for years :) and have tried to program in Chinese in TCL/Visualworks Smalltalk and have now just started to check out Io. Mainly I work in Smalltalk.
    I would like an NLP language to support something like the Linux/Unix BASH alias facility. Then code in different languages can be automatically translated.
    In terms of Model-Control-View, we can look at human native language support as a layer of Code-View or PUI (Programmer User Interface).

    Human language can have influence on programming language. Classic Chinese is very close to a clean programming language syntax.

    陰陽者 天地之道也
    者 is like setSlot() and 也 is like clone; what comes after 之 is a slot of what comes before 之.

    YinYang := Universe Tao clone.

    December 8th, 2009

  13. Roland Tepp

    Hi Ola,

    I just heard your episode on Ioke at SE-Radio and found this blog as I was browsing for more information on Ioke…

    While I am sort of ambivalent about natural language support in programming languages, I quite like the approach you’ve taken.

    As many commenters already have pointed out, there are notable precedents of AppleScript and Excel macros that are multi-lingual, but from what I’ve heard, the multi-lingual feature of these is a thin veneer of presentation over the core language itself, just short of simple string replacement.

    I like your approach a fair bit more, because it is a more stable experience for the native language user and it defines an explicit dependency on the native language DSL the code is written in.

    In this way, while it is surely difficult to read the code for anyone outside the cultural domain of that particular natural language, on the plus side the natural language of the variables at least matches the operators and keywords. I can’t say how much it pains me to see French, German or Swedish identifiers intermixed with English code and keywords…

    February 23rd, 2010
