Alt + E S V

What’s Going On With Language Rankings?

As the inbound DMs and emails attest, at least a few of you out there have been waiting for RedMonk to drop our latest version of the programming language rankings. Our apologies for the delay: here’s our update on the process.

Normally we run these rankings twice a year, with the goal of using publicly available data sets to track if there are any meaningful changes in how people are using programming languages. We correlate cumulative language usage as seen through non-forked PRs on public GitHub repos against questions asked on Stack Overflow.
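As a rough illustration of what combining the two sources can look like, here is a minimal Python sketch. It ranks languages separately by pull request count and by question count, then orders them by the sum of the two ranks. The function names, the equal weighting, the alphabetical tie-breaking, and the counts are all assumptions made for this example; they are not RedMonk's actual queries, data, or method.

```python
def rank_by_count(counts):
    """Map each language to its 1-based rank, highest count first.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    ordered = sorted(counts, key=lambda lang: (-counts[lang], lang))
    return {lang: position for position, lang in enumerate(ordered, start=1)}


def combined_ranking(pr_counts, question_counts):
    """Order languages by the sum of their ranks in both datasets.

    Only languages present in both datasets are ranked.
    """
    shared = set(pr_counts) & set(question_counts)
    pr_rank = rank_by_count({lang: pr_counts[lang] for lang in shared})
    q_rank = rank_by_count({lang: question_counts[lang] for lang in shared})
    return sorted(shared, key=lambda lang: (pr_rank[lang] + q_rank[lang], lang))


# Placeholder numbers for illustration only; not real GitHub or Stack Overflow data.
pr_counts = {"JavaScript": 900, "Python": 800, "Java": 500, "Rust": 200}
question_counts = {"JavaScript": 700, "Python": 750, "Java": 600, "Rust": 50}

ranking = combined_ranking(pr_counts, question_counts)
print(ranking)
```

A rank-based combination like this only depends on relative order within each dataset, so it reflects which languages lead each source and not the absolute volume of activity.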

These metrics have always told an incomplete story at best. There are languages that were under- or over-represented by using public GitHub repos; there are communities that were under- or over-represented by looking at discussions happening on Stack Overflow. These metrics were not perfect by any means, but the two large, public-facing datasets created an interesting decade-plus-long trend for RedMonk to track.

Our last set of queries, however, provided us with data that required further investigation.

Stack Overflow

We’ve been internally discussing how we’re going to address the impact of AI-based code assistants on our language rankings since GitHub released Copilot in October 2021. However, it was when ChatGPT hit the market on November 30, 2022 and went from 0 to 100M users in two months that we started seeing undeniable impacts on our source data.

We’d already seen questions asked falling off from their peak. However, when we pulled data for our June 2023 rankings the collective decline was notable.

The chart below takes questions asked on Stack Overflow about the top 20 programming languages (as determined from RedMonk’s analysis dated January 2023) and looks back at historical semi-annual trends.

Bar chart showing number of questions tagged on Stack Overflow for our top 20 languages. Questions peak in 2016-17 and begin to fall sharply in last year

As you can see, the number of questions asked using these 20 Stack Overflow tags* declined almost 20% from the prior period. And this is just our first full period running these numbers post-ChatGPT. A cursory query of year-to-date numbers indicates an even starker change.
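For readers who want to reproduce this kind of period-over-period comparison on their own data, the arithmetic can be sketched as follows. The helper names and the sample dates are hypothetical; the dates are placeholders chosen so the example works out to a 20% drop, not real Stack Overflow data.

```python
from collections import Counter
from datetime import date


def half_year_label(day):
    """Return a label such as '1H2023' for the half-year containing `day`."""
    half = 1 if day.month <= 6 else 2
    return f"{half}H{day.year}"


def questions_per_half(question_dates):
    """Count questions in each semi-annual period."""
    return Counter(half_year_label(day) for day in question_dates)


def percent_change(previous, current):
    """Percentage change from `previous` to `current` (negative means decline)."""
    if previous == 0:
        raise ValueError("previous period has no questions to compare against")
    return (current - previous) * 100.0 / previous


# Placeholder creation dates for illustration only.
sample_dates = [
    date(2022, 7, 7), date(2022, 8, 1), date(2022, 9, 15),
    date(2022, 11, 3), date(2022, 12, 20),
    date(2023, 1, 10), date(2023, 2, 2), date(2023, 3, 3), date(2023, 6, 30),
]

counts = questions_per_half(sample_dates)
change = percent_change(counts["2H2022"], counts["1H2023"])
print(f"2H2022 -> 1H2023: {change:.1f}%")
```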

It is undeniable that developers’ ability to instantaneously ask questions of a non-judgmental AI assistant that will give answers in context is going to have a marked negative impact on the usefulness of the public datasets provided by Stack Overflow.

GitHub

While we expected a change in Stack Overflow, we were not expecting significant anomalies in our data from GitHub. To the extent that we were anticipating changes, we expected them to be upward: the narrative about AI code assistants aiding development velocity and benefiting tinkerers has been prevalent.

However, the data we saw from GitHub Archive showed a roughly 25% decline in pull requests in 1H2023 compared to 2H2022, which we were not expecting.

The dataset we use is a public dataset on BigQuery, so we asked questions of both Google, in terms of how the data was pulled in, and of the GitHub team, to see if they had seen similar changes in their internal data.
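To make the idea of counting non-forked pull requests per language concrete, here is a small Python sketch that tallies events shaped like GitHub Archive records. It is not RedMonk's query. The nested field path used here (`payload.pull_request.base.repo` with `language` and `fork` keys) and the choice to count only `opened` actions are assumptions for this example, and the sample events are invented.

```python
from collections import Counter


def count_prs_by_language(events):
    """Count opened pull requests per language, skipping forked repos.

    `events` is an iterable of dicts shaped like GitHub Archive event records
    (the exact field layout is an assumption of this sketch).
    """
    counts = Counter()
    for event in events:
        if event.get("type") != "PullRequestEvent":
            continue
        payload = event.get("payload", {})
        if payload.get("action") != "opened":
            continue
        repo = payload.get("pull_request", {}).get("base", {}).get("repo", {})
        language = repo.get("language")
        # Skip forks and repos with no detected language.
        if repo.get("fork") or not language:
            continue
        counts[language] += 1
    return counts


def make_event(language, fork=False, action="opened", kind="PullRequestEvent"):
    """Build a minimal, made-up event for the example below."""
    repo = {"language": language, "fork": fork}
    return {"type": kind, "payload": {"action": action,
                                      "pull_request": {"base": {"repo": repo}}}}


# Invented sample events for illustration only.
sample_events = [
    make_event("Python"),
    make_event("Python"),
    make_event("Go"),
    make_event("Go", fork=True),          # forked repo: ignored
    make_event("Rust", action="closed"),  # not an opened PR: ignored
    make_event("Java", kind="PushEvent"), # not a pull request: ignored
    make_event(None),                     # no language detected: ignored
]

pr_counts = count_prs_by_language(sample_events)
print(dict(pr_counts))
```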

In the end, the change does not appear to be an error in the available data, and it largely lacks an explanation. The best guess thus far is that there was an overhang of increased activity from the pandemic and this is a return to expected activity, but we have no way of confirming whether that storyline is accurate.

As of now this is not a declining trend we expect to continue, but these are numbers we will continue to watch with interest.

What Next?

The advent and rise of AI-based code assistants are already impacting the data that populates RedMonk’s language rankings. As questions and knowledge sharing move from public forums to private tools, our ability to ascertain meaningful trends from that public data will be altered indefinitely.

We will continue to track these trends and make determinations about how this change in sample size will impact our ability to perform the rankings.

As of now, look for our next rankings in January 2024.


* Tags queried based on last Top 20: JavaScript, Python, Java, PHP, C#, CSS, TypeScript, C++, Ruby, C, Swift, Shell, R, Go, Scala, Objective-C, Kotlin, PowerShell, Rust, Dart

Disclaimer: GitHub and Google Cloud are RedMonk clients. Stack Overflow and OpenAI (ChatGPT) are not.

4 comments

  1. Have you factored in the devs and teams that have been moving away from git and GitHub, especially now that they’ve realized that Fossil SCM exists?

    Fossil SCM has been giving a lot of teams a much better experience, and that’s why it’s gaining traction. It’s what git should have been all along.

  2. Who is still using Stack Overflow these days?
    First, users get blamed that their questions are wrong.
    Other users cannot ask questions which were normal to ask 5 years ago.
    You had a good question? No luck either.
    You get downvoted because someone doesn’t like you or doesn’t understand your question.
    It’s a terrible site for finding people who had similar questions since it’s not an open forum.

    I’m really glad these days we can use LLMs like ChatGPT instead.
    Sad though it could have been a great platform if it was not run by autistics admins.

  3. How is it you can attribute this to “the advent and rise of AI-based code assistants”? Your own chart shows that questions for the top-20 languages have been falling for 6 or 7 years. ChatGPT is barely a year old. At most, it would affect only the last 2 bars in your chart.

    What I see is evidence (it’s not just me!) that StackOverflow has been going downhill as a place to get useful answers for several years. Or maybe all the obvious questions for popular languages have already been asked. Or maybe moderators have been stricter about closing down questions. Or maybe programmers (like me!) have shifted back to non-“top-20” languages. There are plenty of ways to explain these changes that have nothing to do with generative AI.

    For that matter, increased use of generative AI for programming would seem to me to cause an *increase* in the number of PRs, not a decrease. How do you explain this discrepancy with your hypothesis?

  4. This is very interesting research. I wonder if this observation will extend to the greater social (tech) environment. That is, are humans beginning to communicate with each other less and interacting more with AI?
