{"id":1708,"date":"2013-03-26T13:48:37","date_gmt":"2013-03-26T18:48:37","guid":{"rendered":"http:\/\/redmonk.com\/dberkholz\/?p=1708"},"modified":"2013-03-26T21:28:35","modified_gmt":"2013-03-27T02:28:35","slug":"what-does-expressiveness-via-loc-per-commit-measure-in-practice","status":"publish","type":"post","link":"https:\/\/redmonk.com\/dberkholz\/2013\/03\/26\/what-does-expressiveness-via-loc-per-commit-measure-in-practice\/","title":{"rendered":"What does &#8220;expressiveness&#8221; via LOC per commit measure in practice?"},"content":{"rendered":"<p>Yesterday&#8217;s post ranking the <a href=\"http:\/\/redmonk.com\/dberkholz\/2013\/03\/25\/programming-languages-ranked-by-expressiveness\/\">&#8220;expressiveness&#8221; of programming languages<\/a> was quite popular. It got more than 30,000 readers in the first 24 hours; it&#8217;s at 31,302 as I write this. For this blog, that qualifies as a great audience. After a day&#8217;s worth of feedback, thought, and discussion on <a href=\"https:\/\/twitter.com\/search\/realtime?q=http%3A%2F%2Fredmonk.com%2Fdberkholz%2F2013%2F03%2F25%2Fprogramming-languages-ranked-by-expressiveness%2F&amp;src=typd\">Twitter<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=5438755\">Hacker News<\/a>, and the <a href=\"http:\/\/redmonk.com\/dberkholz\/2013\/03\/25\/programming-languages-ranked-by-expressiveness\/\">post&#8217;s comments<\/a>, I wanted to sum up some of my thoughts, others&#8217; contributions, and things I left out of the initial post.<\/p>\n<h2>What are we really measuring here?<\/h2>\n<p>As I noted in a major caveat in the initial post, lines of code (LOC) per commit is an imperfect window into expressiveness. It&#8217;s measuring <strong>something<\/strong>, but what does it mean? My take on these results is that <strong>it&#8217;s a useful metric when painting with broad strokes, and the results generally seem to bear that out<\/strong>. 
It&#8217;s more helpful for comparing large-scale trends than for arguing over whether Ruby should be #27 or #22, a difference that is likely below the noise level. I think some placements seem so weird because <strong>it&#8217;s measuring expressiveness in practice rather than in theory<\/strong>. That brings in factors such as:<\/p>\n<ul>\n<li><strong>The standard library and library ecosystem.<\/strong> Is the standard library weak? Is the community of add-on library developers small or nonexistent? In either case, constructing a commit-worthy chunk of code could require additional lines.<\/li>\n<li><strong>The development culture and its norms.<\/strong> Is copy-and-paste common for this language? Are imported libraries often committed to the project repository (JavaScript is a prime candidate here)? Are autogenerated files committed (e.g., minified JavaScript, autotools configure scripts)?<\/li>\n<li><strong>The developer population using it.<\/strong> Especially for <strong>third-tier languages<\/strong>, the number of developers is small enough that these results could reflect those developers more than the properties of the language itself. Some of the least-popular third-tier languages have fewer than 10 developers committing in a given month. I would generally disregard all but the largest differences between third-tier languages, and treat even those with skepticism. Some languages are also more popular with <strong>beginning programmers<\/strong>, which could influence the results if beginners make up a significant share of the language&#8217;s total userbase.<\/li>\n<li><strong>The time frame of its initial popularity.<\/strong> The era in which a language caught on shapes the tools and methodologies its developers use. For example, newer languages popularized in the <strong>agile<\/strong> and <strong>GitHub<\/strong> eras may be biased toward smaller, more frequent commits. 
Languages that grew up alongside <strong>waterfall<\/strong> development and slower, <strong>centralized version control<\/strong> may instead be biased toward larger, monolithic commits. The effect even extends to <strong>line length<\/strong>: wide-screen monitors are common today, and many developers no longer limit their lines to 80 columns or fewer. This could have a language-specific impact, because older languages with a great deal of inertia adopt a new &#8220;standard&#8221; of development more slowly. For example, perhaps fixed-format Fortran typically wasn&#8217;t maintained in version control at all, and full files were simply committed wholesale? That could explain its similarity to JavaScript.<\/li>\n<li><strong>Differences in project types by language.<\/strong> If a language is more likely to be used in <strong>large enterprise<\/strong> projects, this could influence the types of commits it receives. For example, it could get more small bugfixes than new features because its codebases are long-lived and require additional stability. It could also see a different amount of refactoring.<\/li>\n<\/ul>\n<h2>So &#8230; what should you get out of the results, then?<\/h2>\n<p>Frankly, given all the possible variables involved, <strong>the biggest surprise here is that the results look as reasonable as they do<\/strong>, at the level of broad, multi-language or cross-tier trends. 
Here&#8217;s what I would tend to believe, and what I would be skeptical about.<\/p>\n<ul>\n<li><strong>Believe<\/strong>: multi-language trends<\/li>\n<li><strong>Believe<\/strong>: cross-tier trends<\/li>\n<li><strong>Believe<\/strong>: large differences between individual languages, but <strong>investigate<\/strong> why<\/li>\n<li><strong>Believe<\/strong>: highly ranked languages<\/li>\n<li><strong>Be skeptical<\/strong>: anything involving third-tier languages<\/li>\n<li><strong>Be skeptical<\/strong>: small differences between individual languages<\/li>\n<li><strong>Be skeptical<\/strong>: individual languages that don&#8217;t fit into a group of similar ones<\/li>\n<li><strong>Be skeptical<\/strong>: low-ranked languages, until <strong>investigated<\/strong><\/li>\n<\/ul>\n<p>Why do I suggest believing high ranks but not low ones? It&#8217;s the Anna Karenina principle, named for the opening line of Tolstoy&#8217;s novel:<\/p>\n<blockquote><p><i>Happy families are all alike; every unhappy family is unhappy in its own way.<\/i><\/p><\/blockquote>\n<p><strong>There are many ways to end up with a high median or a high IQR (interquartile range) of LOC per commit, but it seems to me that low values of both would indicate good development practices as well as a good language.<\/strong><\/p>\n<p>To wrap things up, I think this is measuring, with a fair amount of noise, a form of expressiveness in practice rather than in theory, one that includes all the ways code is incorporated into a repository. 
That makes it an interesting window into potential problems with how both individual languages and classes of languages are typically used.<\/p>\n<div class=\"acc_license\"><a href=\"http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/\"><img decoding=\"async\" src=\"http:\/\/i.creativecommons.org\/l\/by-sa\/3.0\/88x31.png\" alt=\"by-sa\" \/><\/a><\/div><!--<rdf:RDF xmlns=\"http:\/\/creativecommons.org\/ns#\" xmlns:dc=\"http:\/\/purl.org\/dc\/elements\/1.1\/\" xmlns:rdf=\"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#\"><Work rdf:about=\"\"><license rdf:resource=\"http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/\" \/><\/Work><License rdf:about=\"http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/\"><requires rdf:resource=\"http:\/\/creativecommons.org\/ns#Attribution\" \/><permits rdf:resource=\"http:\/\/creativecommons.org\/ns#Reproduction\" \/><permits rdf:resource=\"http:\/\/creativecommons.org\/ns#Distribution\" \/><permits rdf:resource=\"http:\/\/creativecommons.org\/ns#DerivativeWorks\" \/><requires rdf:resource=\"http:\/\/creativecommons.org\/ns#ShareAlike\" \/><requires rdf:resource=\"http:\/\/creativecommons.org\/ns#Notice\" \/><\/License><\/rdf:RDF>-->","protected":false},"excerpt":{"rendered":"<p>Yesterday&#8217;s post ranking the &#8220;expressiveness&#8221; of programming languages was quite popular. It got more than 30,000 readers in the first 24 hours; it&#8217;s at 31,302 as I write this. For this blog, that qualifies as a great audience. 
After a day&#8217;s worth of feedback, thought, and discussion on Twitter, Hacker News, and the post&#8217;s comments,<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[3,7,21],"tags":[],"class_list":["post-1708","post","type-post","status-publish","format-standard","hentry","category-adoption","category-data-science","category-employment"],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p23Tsn-ry","_links":{"self":[{"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/posts\/1708","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/comments?post=1708"}],"version-history":[{"count":0,"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/posts\/1708\/revisions"}],"wp:attachment":[{"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/media?parent=1708"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/categories?post=1708"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/redmonk.com\/dberkholz\/wp-json\/wp\/v2\/tags?post=1708"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}