The Maintenance myth


Update: I’ve used the words “static” and “dynamic” a bit loosely with regard to languages and typing in this post. If this upsets you, feel free to read “static” as “Java-like” and “dynamic” as “Ruby-like”. And yes, I know that this is not entirely correct, but just as mangling the language to remove all gender bias makes it highly inconvenient to write, I find it easier to write in this language when the post is aimed at people in these camps.

Being a language geek, I tend to get into lots of discussions about the differences between languages, what’s good and what’s bad. And being a Ruby guy who hangs out in Java crowds, I end up having the static-vs-dynamic conversation way too often. Interestingly, the number one question everyone from the static “camp” has, the one thing that worries them the most, is maintenance.

The question is basically this: without types at compile time, won’t it be really hard to maintain your system when it grows to a few million lines of code? Don’t you need the static type hierarchy to organize your project? Don’t you need an IDE that can use the static information to give you intellisense? All of these questions, and many more, boil down to the same basic idea: that dynamic languages aren’t as maintainable as static ones.

And what’s even more curious, in these kinds of discussions I find that people in the dynamic camp generally agree that yes, maintenance can be a problem. I’ve found myself doing the same thing, because it’s such a well-established fact that maintenance suffers in a dynamic system. Or wait… Is it that well established?

I’ve asked some people about this lately, and most of the answers begin with “but obviously it’s harder to maintain a dynamic system”. Things that are “obvious” like that really worry me.

Now, Java systems can be hard to maintain. We know that. There is lots of documentation and talk about hard-to-maintain systems with millions of lines of code. But I really can’t come up with anything I’ve read about people in dynamic languages talking about what a maintenance nightmare their projects are. I know several people who are responsible for quite large code bases written in Ruby and Python (very large code bases are 50K-100K lines of code in these languages). And they are not talking about how they wish they had static typing. Not at all. Of course, this is totally anecdotal, and maybe these guys are above your average developer. But in that case, shouldn’t we hear these rumblings from all those Java developers who switched to Ruby? I haven’t heard anyone say they wish they had static typing in Ruby. And not all of those who migrated could have been better than average.

So where does that leave us? With a big “I don’t know”. Thinking about this issue some more, I came up with two examples where I’ve heard about someone leaving a dynamic language because of issues like this. I’m not sure how closely tied they are to maintenance problems, but these were the only ones I came up with: Reddit and CDBaby. Reddit switched from Lisp to Python, and CDBaby switched from Ruby to PHP. Funny, they switched away from a dynamic language, but not to a static language. Instead they switched to another dynamic language, so the problem was probably not something static typing would have solved (at least not in the eyes of the teams responsible for these switches).

I’m not saying I know this is true, because I have no real, hard evidence one way or another, but to me the “obvious” claim that dynamic languages are harder to maintain smells a bit fishy. I’m going to work under the hypothesis that this claim is mostly myth. And if it’s not a myth, it’s still a red herring – it takes the focus away from more important concerns with regard to the difference between static and dynamic typing.

I did a quick round of shouted questions to some of my colleagues at ThoughtWorks whom I know and respect, and who were online on IM at the time. The general message was that it depends on the team. The people writing the code, and how they are writing it, are much more important than static or dynamic typing. If you make the assumption that the team is good and the code is treated well from day 0, static or dynamic typing doesn’t make a difference for maintainability.

Rebecca Parsons, our CTO, said this:

I think right now the tooling is still better in static languages. I think the code is shorter generally speaking in dynamic languages which makes it easier to support.

I think maintenance is improved when the cognitive distance between the language and the app is reduced, which is often easier in dynamic languages.

In the end, I’m just worried that everyone seems to take the maintainability story as fact. Has there been any research done in this area? Smalltalk and Lisp have been around forever, so there should be something out there about how good or bad maintenance of these systems has been. There are three possible reasons I haven’t seen it:

  • It’s out there, but I haven’t looked in the right places.
  • There are maintenance problems in all of these languages, but people using dynamic languages aren’t as keen on whining as Java developers.
  • There are no real maintenance problems with dynamic languages.

There is a distinct possibility I’ll get lots of anecdotal evidence in the comments on this post. I would definitely prefer fact, if there is any to get.


39 Comments

  1. Seo Sanghyeon

    Code written in dynamic languages tends to be shorter than code written in static languages doing the same thing, and I think code size is the most important factor in maintenance.

    It may be true that a one-million-line Python codebase is harder to maintain than a one-million-line Java codebase, but you don’t need one million lines of Python, so that’s irrelevant.

    October 14th, 2008

  2. Paul K

    The lack of types certainly is a problem, and we can tell by the amount of time a newbie spends learning an API. In Java, learning a new API is moderately easy: the types make sense and it all fits together obviously. However, in Ruby I have found that I spend more time looking at the code to determine what sorts of things I can do with the return value, whether I have a set or some pseudo hash of properties. That never goes away. After two years I know more about what to expect, but the lack of knowledge is always there unless I look directly into the code and follow the call chain to find something more concrete (or just run a test to determine the type).

    The general feeling of not knowing until you run it is made worse when it comes to maintaining that API. Changing a return type produces only runtime errors, and I have been caught by this numerous times due to lack of 100% code coverage by the unit tests.
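A minimal sketch of the failure mode described above, written in Python rather than Ruby. All names here (`find_users_v1`, `find_users_v2`, `first_user_upper`) are hypothetical and exist only for illustration. The caller was written against a list-returning version of an API; once the return type changes to a dict, nothing complains until the call actually runs.

```python
# Hypothetical API, version 1: returns a list of user names.
def find_users_v1():
    return ["ada", "grace"]


# Hypothetical API, version 2: the return type silently changes to a dict
# keyed by user id. No compiler flags the callers that still expect a list.
def find_users_v2():
    return {1: "ada", 2: "grace"}


def first_user_upper(find_users):
    """Caller written against version 1: assumes an indexable list of strings."""
    users = find_users()
    return users[0].upper()


def call_safely(find_users):
    """Run the caller and return its result, or the name of the error raised."""
    try:
        return first_user_upper(find_users)
    except (KeyError, AttributeError, TypeError) as exc:
        return type(exc).__name__


result_v1 = call_safely(find_users_v1)  # the caller works against version 1
result_v2 = call_safely(find_users_v2)  # the caller fails, but only when executed
print(result_v1)
print(result_v2)
```

Unless a test exercises `first_user_upper` against the new version, the mismatch stays hidden, which is the coverage gap the comment points to.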

    Many of the libraries now out there support a dizzying array of possible inputs which aren’t at all obvious from the method signature; you have to read the docs. You don’t have to do that with a well-designed Java function, but a “well-designed” Ruby one requires it. It might do more, but there is more to learn. If everyone wrote programs in the same style as Rails etc., then the maintenance problem is obvious. Of course, as mentioned, there is normally less to maintain, but really the lack of types puts a very real limit on how large a dynamic language program can be before reading code becomes a big activity. Of course that happens in static languages too, and I would counter-argue myself with the fact that most methods are badly designed and you need to read the code regardless anyway!

    October 14th, 2008

  3. I think there are several issues shared by both with respect to maintenance – team quality, coherence, reuse, documentation. The two major differences I’ve noticed are 1) obvious: code size, and 2) overuse of meta. While (1) is in favor of dynamic languages, (2) is not. But thankfully (2) is not as common yet and (1) will always hold true.

    October 14th, 2008

  4. Satya Prakash

    The story is a bit more complicated than that. Although Java is a static language (meaning type inferencing at compile time), it is not a very good example of that.

    Static type inferencing does aid tooling quite a lot. It is much easier to come up with a good IDE for a static language. Hence this is one up for static languages.

    Static languages need not be as verbose as Java is to aid the tooling. If Python code is terse, so are Haskell/OCaml/Scala.

    IMO what decides the maintenance cost is the ecosystem of the language and not the dynamic/static nature. Since Java has such a rich ecosystem, people feel more comfortable maintaining code in Java.

    That said, good static languages like Haskell/Scala/OCaml need a better ecosystem. Scala inherits the Java ecosystem, but needs better IDE support. Haskell and OCaml have started well on the library front with their ‘batteries included’ efforts. But a good IDE will push both of them into the super league. MS F# is very similar to OCaml, so soon it will have a great IDE in Visual Studio.

    October 14th, 2008

  5. I work on code that is nearly 20 years old; it is statically typed in so far as it is written in C, but dynamically typed in so far as discriminated unions give rise to dynamic typing behaviour (it’s a compiler). Oh, and all memory comes from pools, so no futzing with malloc/free pairing.

    When you have a language like Python, or Ruby, and you can be reasonably – not certain, but reasonably – sure that instances correspond roughly to the shape of the class definitions (i.e. not too much monkey-patching), examining the class definitions gets maybe a good 40% of the way to the information good static typing can deliver. If the classes are very well documented, you can get another 40%, in the best case, perhaps.

    The remaining 20% bothers me some, but more for certain scenarios than others. I habitually lean on the compiler for certain things. I change an identifier in a declaration, and I rely on the compiler to tell me about all the use points.

    However, if the language I have to use is as poverty-stricken as Java or C, I’ll get a productivity bonus simply from taking advantage of more dynamic and more expressive abstractions, that will outweigh the 20% downside. On the other hand, nothing I can do easily will get the C performance back, and compile-time speed is a major feature for our product (Delphi).

    However, if we go into something like your suggested Ioke, where new objects are cloned from old objects, there’s a lot less to go on for getting a picture of how everything fits together. That bothers me, and it was probably the main thing that prompted me to comment on your earlier post.

    October 14th, 2008

  6. I think Cedric Beust summarized my concerns a long time ago, here:

    /weblog/archives/000414.html

    Mats

    October 14th, 2008

  7. dirty

    Whilst I agree with your main points, the world just ain’t that nice. All code is *not* written by TW-grade devs – it seems to be mostly written by inbred monkeys using ancient latin keyboards!

    From my perspective, the issue is one of damage limitation – which is a depressing reason by anyone’s standards – the current tooling support for dynamic languages just isn’t capable of unraveling the nonsense code that makes up 99% of all software.

    Static languages can still get you into trouble, of course – just recently I was told of a banking application which reflectively instantiates objects by looking their class names up in a database table, based upon a key which was passed to the method in an ungenerified map with over 80 entries in it.

    Still don’t think that anyone could f*ck up a Ruby project *that* badly? In about 3-5 years, we’ll know for sure…. :)

    October 14th, 2008

  8. Gabe

    The thing that makes dynamic languages harder to maintain in general is cleverness. This isn’t really a dynamic vs static thing so much as a Java vs Ruby thing. Java just makes it hard to be too clever for your own good. With dynamic languages the temptation to over-engineer or do really obtuse abstractions is always there because it’s so easy.

    One of the things I don’t hear a lot about is the tension between elegance and maintenance. With a language like Ruby it’s possible to come up with extremely elegant solutions to certain problems. The problem is that elegant does not correlate to flexible or maintainable. In theory a more powerful language will always be more flexible, so it sounds ridiculous to say that a Ruby solution to a problem could be less flexible than a Java solution. However, in practice a verbose Java solution may be easier to refactor (especially with tool support) than a dynamic solution that is full of subtle assumptions that are not apparent from looking at the code or even grokking it.

    I am still a firm believer and would never willingly use Java where a more powerful language was available, but I think proponents of dynamic languages need to recognize the dangers and realize that there really is a tangible maintenance benefit to keeping programs a little dumber and a little more verbose in many cases.

    October 14th, 2008

  9. Tommy

    I have to agree with Seo — you save so much code that a lack of types doesn’t matter if you are even halfway competent.

    But you are overlooking a few things here: Lisp has been around forever, but it has generally been used only by “smart” people, so there are fewer stupid bugs. The same goes for Smalltalk.

    October 14th, 2008

  10. Mats:

    Yeah, I considered touching the refactoring issue, but decided not to. David Rice (one of the main smart guys behind Mingle, which is quite a large Rails application; David has worked a lot in Java and C#) told me earlier today that he specifically has NEVER felt the need for refactoring when working on the Mingle code base. I found that interesting to hear.

    October 14th, 2008

  11. “The Maintenance myth”

    [snip snip snip]

    “Has there been any research done in this area?”

    Nice blog post, if you cut out the middle. Interesting calling something a myth and then asking about research in the end.

    “(very large code bases is 50K-100K lines of code in these languages).”

    100K is very large? I wrote some projects in a two-person team and reached 50K lines. This is rather small. We did 50K Python programs in the 90s in a small development shop (more thought points = more complexity & more effort).

    @Seo: “Code written in dynamic languages tends to be shorter than code written in static languages doing the same thing, and I think code size is the most important factor in maintenance.”

    I don’t think Scala is much larger in LOC than Ruby.

    Peace
    -stephan

    October 14th, 2008

  12. Ruby/Java is probably the most awful way to have the “static-vs-dynamic conversation”. You have a dynamically-typed language that is quite lacking and a statically-typed language that is significantly lacking in almost all areas, particularly in the type system. What would such a conversation bring to fruition? Only amateurish nonsense – it’s not worth it.

    If you are a language geek, why don’t you learn a high-level statically-typed language? Haskell or even Agda for example. In particular, the many type system features.

    Then have the discussion on those well-informed grounds. I expect it would be much more interesting. Just a suggestion.

    October 14th, 2008

  13. Refactoring tools, further to Rebecca’s message, seem to be more advanced.

    In this specific area, do you think something that may be limiting some dynamic languages is the lack of an (exhaustive) formal spec? From a position of mostly ignorance, I would expect it would make real-time AST use in IDEs a bigger challenge than with some statically typed languages.

    October 14th, 2008

  14. @Dirty:

    The common misconception that static typing is “safe” and that compilers will detect issues is really the danger. You rightly point out reflection and second-class interfaces. Casting is another.

    Something I see that destroys maintainability in class-oriented systems is the lovely String. There are few domains where a String instance should be used as a collaborator – hardware inventory and footwear retail are two I can think of ;-)

    October 14th, 2008

  15. Stephan:
    I would consider 50k-100k in Ruby to be very large, yes, definitely. I know of Python code bases between 100k and 200k, but that’s about the largest I’ve heard of. You obviously have more experience with _really_ large code bases than me.

    I call it a myth because most people seem to argue as if it’s fact. So taking the stance that it’s a myth provides a starting point for my discussion.

    October 14th, 2008

  16. Tony:
    Notice in the beginning, where I base this blog post on the fact that this discussion happens very often in the intersection of Java and Ruby communities? That’s why I’m arguing from those languages. And the main reason Java stands in for static typing is because it’s one of the most common languages in the static camp.

    Suffice to say, I’m enough of a language geek to have read some type theory, and I have experience with OCaml, Haskell and Scala. That’s not the point. I’m not interested in discussing and comparing maintenance between Haskell and Lisp, or any other pairing of interesting languages. In this context it’s totally irrelevant.

    October 14th, 2008

  17. I’ve been looking into this issue quite a bit lately from the perspective of someone tasked with learning a very large set of Python projects. I have found that the less code the easier it is to troll through the code finding an answer, although, in many cases the path leads to dead ends more often. Cleverness can be an issue, but it is not the only issue. In the case of Python, tools such as list comprehensions and the itertools module provide not only more concise code, but actual performance boosts. Since the code I’ve been learning is rather large and deals with a large amount of data, performance tweaks have been performed, which results in some required obfuscation.

    Also, in my case, I’m involved in learning a set of DSLs. This means understanding not only the actual functionality but also the compiler that was built to support the language. This can get very hairy and is difficult to follow. This might be the case no matter what the language is, though.

    I don’t believe the “good developer” argument is valid either. A good developer is always compared somewhat to their previous domain. Also, understanding code is not a one way street. You are reading the code someone else wrote, so it is not necessarily the case that your misunderstanding is entirely your fault. Commenting, design documents, tests and documentation all play a factor and yet are seen as pointless by many “good” developers, so it is far from cut and dry.

    I think one of the best ways to learn a dynamic language environment is to perform as many code reviews as possible and consider a refactoring time for a new developer. Reviewing the new code should bring up issues with the old code, at which point things can be refactored to make things easier for the next person. This might seem like overkill in terms of reducing progress of a team, but it does get the new developer up to speed quickly as well as allowing a time to document obvious problem areas that would otherwise be hidden through the view of a single developer.

    October 14th, 2008

  18. Talk to Smalltalk veterans about maintainability.

    The ones I’ve spoken to point out that while it is possible to write great code in a dynamically typed language, it is also possible to go very far without getting the sort of feedback about bad design that we get in static languages: things like insane build time and difficult change characteristics. Apparently, you can do good work but you have to be very alert.

    The other pain point that they talk about is the hand-off to other developers (particularly less able ones). It’s easy to go meta and write guru level reflection-charged code that is completely opaque to newcomers. When they encounter code like that, they patch on top and find alternative paths, muddying things further. The Rails community seems to be getting a bit of this now, but it did happen before in the Smalltalk community.

    October 14th, 2008

  19. Hi Ola,
    I understand that you may choose “what is popular”, but you have also chosen “what misrepresents static typing to an extreme magnitude”. Your discussion is not about static typing at all, but about Java’s incredible failure at representing such a notion.

    There are lots of claims that such a discussion occurs at the intersection of the Java and Ruby community, but this doesn’t make it true. In fact, it is false. No such discussion occurs. The title is something different. Perhaps if the proponents of such silliness were more enlightened it would be something like, “amateur type theorists using bad examples to support a bad argument that derives nothing useful” (not intending to be cynical, but making the point with exaggeration).

    I only suggested Haskell et al. to help make it clear that you are not at all having a “static-vs-dynamic conversation” and neither are others who claim to be. If you are aware of this fact, then great (can we be a bit more honest then?); if not, then I press the issue.

    October 14th, 2008

  20. I think we should definitely consider timing too. For instance, Ruby has really only hit the mainstream via Rails over the last 24-36 months. The maintenance headaches could still be to come, when the people who originally wrote the code are long gone. Right now that may be a limited subset of projects, but it will only increase, and then we may hear more about it when maintainers are looking over others’ code.

    October 15th, 2008

  21. Seo Sanghyeon

    Some statistics on Python codebases which I consider large:

    Over 100K
    Twisted event-driven networking: 195K
    Freevo media center: 165K
    SCons build tool: 136K
    Gramps genealogy: 128K

    Over 50K
    Django web framework: 78K
    Flumotion streaming server: 60K
    SpamBayes spam filter: 55K
    PyMOL molecular viewer: 53K

    October 15th, 2008

  22. Chekke

    Hola Ola,

    Maybe you can consider Clojure. I’m still experimenting with it, and the LOC count is pretty low, maybe because it is a Lisp and dynamic. It’s really awesome, by the way, and has very nice concurrency features.

    October 15th, 2008

  23. Actually, you question the “hard to maintain” myth, but at the same time you’re repeating the “less code in dynamic languages” myth. Don’t get me wrong, I sure hate the bloated Javanese you’ll find in many places just like you.

    But I have the feeling that this is simply a matter of a bad infrastructure. What you seriously _need_ in Java is variable declarations, type annotations, and more code due to the lack of closures. Everything else is probably just an artifact of libraries.

    Of course you can save code due to meta programming, but I’d argue that some relatively static meta programming can be done in Java with ASM, AOP, and reflection, and what goes beyond that might not be such a great idea anyways, in particular in the light of maintenance (too clever).

    I think the Java world is actually not such a bad place. If Java was a bit more like C# 3.0, it might actually be really nice. What’s broken is IMHO mostly the infrastructure and partially the libraries, which are ten years old and show their age, and this leads to bloated code in many places.

    October 15th, 2008

  24. matthias

    It’s not like Ruby is on one side and Java and Haskell are on the other side. The difference between Java and Haskell is at least as big as the difference between Ruby and Java.
    So you could equally well ask the question: How can you maintain code in a dynamic language like Java without the typesafety of Haskell?

    October 15th, 2008

  25. Dynamic *techniques* (some of which are relatively independent of language) can trade code size for flexibility, but either way, all the underlying complexity is still there. When you get too much complexity in one spot, or too much crosscutting code, your brain explodes and you have maintenance problems. The more dynamic languages provide more dynamic techniques (like $$foo or extract() in PHP), but they’re subject to the same laws as more static code.

    In other words, the upper bound on maintainability is defined by your requirements, and the particular choice of language and techniques only reduce maintainability further. Good practice involves wisely choosing the path that reduces maintainability the least. As far as I can tell, anyway.
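A rough Python sketch of the code-size-for-flexibility trade described above. The PHP features named in the comment (`$$foo`, `extract()`) are not reproduced here; Python's built-in `getattr` stands in as an assumed analogue, and the `Report` class and both helper functions are hypothetical.

```python
class Report:
    """Hypothetical record type used only for this illustration."""

    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages


def field_static(report, field):
    """Explicit style: more code, but every supported field is visible here."""
    if field == "title":
        return report.title
    if field == "author":
        return report.author
    if field == "pages":
        return report.pages
    raise ValueError(f"unknown field: {field}")


def field_dynamic(report, field):
    """Dynamic style: one line, but the set of valid fields is no longer
    written down anywhere in this function."""
    return getattr(report, field)


report = Report("Maintenance", "Ola", 12)
static_value = field_static(report, "author")
dynamic_value = field_dynamic(report, "author")
print(static_value, dynamic_value)
```

Both functions answer the same question. The shorter one moves the knowledge of which fields exist out of the code the reader is looking at, which is where the underlying complexity goes.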

    October 15th, 2008

  26. Actually CDBaby was written in PHP, spent two years on a rewrite to Ruby that was never deployed, and reverted to an updated PHP version:

    October 15th, 2008

  27. John Shea

    I believe that Ola is right.

    I believe that, because it is my experience, having moved from Java to Ruby, that my code now requires less maintenance.

    I am willing to admit that my beliefs lack scientific rigour.

    However unlike many of those who responded (and responded to similar posts), I am not willing to use Intelligent Design like assertions to insist on some sort of tribal belief that “my is good because of , and please look at these links here .”

    I’d like to see some old fashioned scientific method applied here, and it occurs to me Ola that you are in the right position to do it.

    Send out a survey to many of your co workers asking them various questions on static vs dynamic (and perhaps don’t restrict it to that bogeyman “type” – but to other dynamic aspects like redefining classes, method calling etc). It might not give us “static blah blah is 30% more efficient than dynamic blah blee” – but it will perhaps give an industry consensus of productivity (and other characteristics) differences between static and dynamic.

    You could even publish.

    October 15th, 2008

  28. John Shea

    previous post didn’t like the angle brackets – so here it is again:

    I believe that Ola is right.
    I believe that, because it is my experience, having moved from Java to Ruby, that my code now requires less maintenance.

    I am willing to admit that my beliefs lack scientific rigour.

    However unlike many of those who responded I am not willing to use Intelligent Design like assertions to insist on some sort of tribal belief that “my [insert language here] is good because of [insert religious mantra here], and please look at these links here [where the gods have ordained that x language theoretically has y characteristics that are slightly germane to the discussion].”

    I’d like to see some old fashioned scientific method applied here, and it occurs to me Ola that you are in the right position to do it.

    Send out a survey to many of your co workers asking them various questions on static vs dynamic (and perhaps don’t restrict it to that bogeyman “type” – but to other dynamic aspects like redefining classes, method calling etc). It might not give us “static blah blah is 30% more efficient than dynamic blah blee” – but it will perhaps give an industry consensus of productivity (and other characteristics) differences between static and dynamic.

    You could even publish.

    October 15th, 2008

  29. Josh Weissman

    It’s simple… You trade flexibility for speed and stability. I think the golden area lies in between typed and untyped languages.

    ANSI Common Lisp has the ability to specify types, although does not *REQUIRE* that you do so. When you include type information, your code can even approach the speed of equivalent C.

    Google “How to make Lisp go faster than C” or go grab the paper here:

    /~didier/research/verna.06.imecs.pdf

    I think the problem is the “either or” mentality. Types are a tool that can (and should) be added at the programmer’s discretion.

    Prototype without them, and when the design gets figured out, lock it down with types to gain the stability.

    The “All Or Nothing” approach is, in my opinion, nonsense.
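A small sketch of the "prototype without types, then lock it down" workflow, shown with Python's optional type hints instead of Common Lisp declarations. This is an assumed analogue, not the commenter's setup: Python hints are not enforced at run time and do not make the code faster; they only help if an external checker such as mypy is run over the code. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


# Prototype stage: no type information. Items are plain dicts and the
# expected keys exist only in the author's head.
def total_price_untyped(items):
    return sum(item["price"] * item["qty"] for item in items)


# Later stage: the design has settled, so the shape of an item is written down.
@dataclass(frozen=True)
class LineItem:
    price: float
    qty: int


# Same logic, now with annotations that a static checker can verify.
def total_price_typed(items: List[LineItem]) -> float:
    return sum(item.price * item.qty for item in items)


untyped_total = total_price_untyped([{"price": 2.0, "qty": 3}])
typed_total = total_price_typed([LineItem(price=2.0, qty=3)])
print(untyped_total, typed_total)
```

The two versions can live side by side in one code base, so types can be added module by module once a design stops moving.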

    October 15th, 2008

  30. Isaac Gouy

    > If you make the assumption that the team is good and the code is treated well from day 0

    A very large Smalltalk application was developed at Cargill to support the operation of grain elevators and the associated commodity trading activities. The Smalltalk client application has 385 windows and over 5,000 classes. About 2,000 classes in this application interacted with an early (circa 1993) data access framework. The framework dynamically performed a mapping of object attributes to data table columns.

    Analysis showed that although dynamic look up consumed 40% of the client execution time, it was unnecessary.

    A new data layer interface was developed that required the business class to provide the object attribute to column mapping in an explicitly coded method. Testing showed that this interface was orders of magnitude faster. The issue was how to change the 2,100 business class users of the data layer.

    A large application under development cannot freeze code while a transformation of an interface is constructed and tested. We had to construct and test the transformations in a parallel branch of the code repository from the main development stream. When the transformation was fully tested, then it was applied to the main code stream in a single operation.

    Less than 35 bugs were found in the 17,100 changes. All of the bugs were quickly resolved in a three-week period.

    If the changes were done manually we estimate that it would have taken 8,500 hours, compared with 235 hours to develop the transformation rules.

    The task was completed in 3% of the expected time by using Rewrite Rules. This is an improvement by a factor of 36.

    from “Transformation of an application data layer” Will Loew-Blosser OOPSLA 2002
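The two headline figures in the quoted passage follow directly from the hours it reports. A quick check in Python, using only the 8,500 and 235 hour figures from the quote (the variable names are mine):

```python
# Figures quoted from the paper: estimated manual effort versus the effort
# spent developing the transformation rules.
manual_hours = 8500
rewrite_hours = 235

fraction_of_expected = rewrite_hours / manual_hours  # share of the expected time
improvement_factor = manual_hours / rewrite_hours    # how many times faster

print(f"{fraction_of_expected:.1%}")   # rounds to the quoted 3%
print(f"{improvement_factor:.1f}x")    # rounds to the quoted factor of 36
```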

    Tooling matters. Tooling matters a lot.

    October 18th, 2008

  31. Isaac Gouy

    Michael Feathers wrote:

    > The other pain point that they talk about is the hand-off to other developers (particularly less able ones). It’s easy to go meta and write guru level reflection-charged code that is completely opaque to newcomers.

    In the same way that we might ask if code is testable, surely we should ask if it is maintainable?

    We may be clever enough to write reflection-charged code, but are we clever enough to acknowledge that doing so would break traceability in the development tools, and clever enough to write some boring vanilla code when our guru level code really wasn’t necessary?

    For maintainability the trick is being clever enough to keep things simple.

    October 22nd, 2008

  32. You make a very interesting point. Looking back I just realised that for the projects I’ve been involved with, dynamic (Ruby, Rails etc) projects or parts of projects have *always* been easier to maintain, as compared with the static parts.

    This is purely personal statistics though – but your post made me realise it.

    October 29th, 2008

  33. I find that it is quite difficult to get hard data regarding productivity, time estimates, maintenance cost, etc. with respect to software.

    I guess this is primarily due to the fact that it is really difficult to measure these, and you need to do it across many teams/platforms/domains/… (depending on what exactly is the variable that you want to measure). Indeed CodeComplete-II provides such numbers, but most of the research that is cited there is from the mid-90s (if not earlier), which means that you cannot 100% count on it since many things have changed since then (I am not saying that the results are completely irrelevant. It’s just that you’ll always have doubts regarding their applicability today).

    November 13th, 2008

  1. Why Ruby? | Logical Decay - March 13, 2009
