Yesterday’s post ranking the “expressiveness” of programming languages was quite popular. It got more than 30,000 readers in the first 24 hours; it’s at 31,302 as I write this. For this blog, that qualifies as a great audience. After a day’s worth of feedback, thought, and discussion on Twitter, Hacker News, and the post’s comments, I wanted to sum up some of my thoughts, others’ contributions, and things I left out of the initial post.
What are we really measuring here?
As I mentioned as a major caveat in the initial post, lines of code (LOC) per commit is an imperfect window into expressiveness. It’s measuring something, but what does it mean? My take is that it’s a useful metric when painting with broad strokes, and the results seem to generally bear that out. It’s more helpful in comparing large-scale trends than in arguing over whether Ruby should be #27 or #22, a difference that is likely below the noise level. I think the reason some placements seem so weird is that it’s measuring expressiveness in practice rather than in theory. That brings in factors like:
- The standard library and library ecosystem. Is there a weak standard library? Is there a small or nonexistent community of add-on library developers? In both cases, constructing a commit-worthy chunk of code could require additional lines.
- The developer population using it. Especially for third-tier languages, the number of developers is small enough that these results could reflect those developers more than the properties of the language itself. Some of the least-popular third-tier languages have fewer than 10 developers committing during a given month. I would generally disregard anything but the largest differences between third-tier languages, and treat even those with skepticism. Some languages are also more popular for beginning programmers, which could influence the results if the beginners make up a significant chunk of the language’s total userbase.
- Differences in project types by language. If a language is more likely to be used in larger, enterprise projects, this could influence the types of commits it receives. For example, it could get more small bugfixes than new features because it’s a long-lived codebase and requires additional stability. It could also see a different level of refactoring.
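To make the metric itself concrete, here is a minimal sketch of how lines changed per commit might be summarized per language. The numbers are invented, and the original analysis’s exact methodology may differ; a real version would pull per-commit line counts from repository histories (e.g. git log’s `--numstat` output).

```python
# Hypothetical sketch of the metric, with invented numbers.
# For each language, summarize lines changed per commit with the
# median and the interquartile range (IQR).
from statistics import median, quantiles

# Made-up per-commit line counts for two fictional languages.
commits_by_language = {
    "LanguageA": [4, 7, 12, 15, 22, 30, 41],
    "LanguageB": [9, 18, 25, 33, 48, 60, 95],
}

for lang, lines_per_commit in commits_by_language.items():
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(lines_per_commit, n=4)
    print(f"{lang}: median={median(lines_per_commit)}, IQR={q3 - q1}")
```

The median captures the typical commit size, while the IQR captures how spread out commit sizes are; a language would rank as more “expressive” when both are low.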
So … what should you get out of the results, then?
Frankly, given all the possible variables involved, the biggest surprise here is that the results look as reasonable as they do, at the level of broad, multi-language or cross-tier trends. Here’s what I would tend to believe, and what I would be skeptical about.
- Believe: multi-language trends
- Believe: cross-tier trends
- Believe: large differences between individual languages, but investigate why
- Believe: highly-ranked languages
- Be skeptical: anything involving third-tier languages
- Be skeptical: small differences between individual languages
- Be skeptical: individual languages that don’t fit into a group of similar ones
- Be skeptical: low-ranked languages, until investigated
Why do I suggest believing high ranks but not low ones? It’s the Anna Karenina principle, as Tolstoy wrote:
Happy families are all alike; every unhappy family is unhappy in its own way.
There are a large number of ways to end up with a high median or a high IQR: any one of many problems can inflate them. Low values of both, on the other hand, suggest a combination of good development practices and a good language, because nearly everything has to go right at once.
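A toy illustration of that asymmetry (invented numbers): a single bad habit, such as occasionally committing a vendored library wholesale, is enough to inflate both statistics, while keeping both low requires avoiding every such habit at once.

```python
# Invented numbers illustrating how one pathology inflates both
# the median and the IQR of lines changed per commit.
from statistics import median, quantiles

def summarize(lines_per_commit):
    """Return (median, IQR) of lines changed per commit."""
    q1, _, q3 = quantiles(lines_per_commit, n=4)
    return median(lines_per_commit), q3 - q1

# A disciplined history: small, focused commits.
clean = [5, 8, 10, 12, 15, 18, 20]

# The same history plus two wholesale vendored-library commits.
with_dumps = clean + [4000, 5200]

print(summarize(clean))       # low median, low IQR
print(summarize(with_dumps))  # both statistics jump
```

Just two outsized commits push the IQR up by orders of magnitude, which is why a low rank deserves investigation before being read as a verdict on the language.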
To wrap things up, I think this is measuring, with a fair amount of noise, a form of expressiveness in practice rather than in theory — a form that includes all the ways code is incorporated into a repository. That makes it an interesting window into a number of potential problems with how specific languages as well as language classes are typically used.