In the afterglow of GitHub Universe 2024, I’m revisiting the “10 Things Developers Want from AI Code Assistants” post I authored last year because in the past 12 months THINGS HAVE CHANGED. Sure, the technology has evolved and improved, but more importantly, developers are fundamentally shifting their workflows to accommodate these tools. Although the data on code impact is mixed, the 2024 DORA Report indicates that AI code assistants have become a common element of developer tooling. 76% of respondents report relying on AI for tasks like code writing, summarizing information, and code explanation, and 67% of respondents report that AI is helping them improve their code.
There is also a ton more competition in the marketplace. In addition to the options I listed last year of GitHub Copilot, Sourcegraph's Cody, Amazon CodeWhisperer (now Amazon Q Developer), CodiumAI, IBM's watsonx Code Assistant, Tabnine, MutableAI, AskCodi, Codiga, and Replit AI, I can now add Aider, Augment Code, Cline, CodeComplete, CodeGeeX, CodeGPT, OpenAI Codex, Continue.dev, Cursor, Snyk's DeepCode AI, Cognition's Devin, Google's Gemini Code Assist, Microsoft IntelliCode, JetBrains AI Assistant, Refact.ai, Sourcery, SQLAI, Qodo Gen (formerly Codiumate), and Void (phew). There is also a new breed of agents that accomplish specific tasks in the SDLC such as testing and QA (Copilot Autofix, Graphite Reviewer), as well as agents for bootstrapping entirely new apps like Replit Agent, Stackblitz Bolt, GitHub Spark, and Vercel v0. These additions (which I do not claim to be exhaustive) to my list are not necessarily new since last year; rather, they have appeared on my radar since then. Indeed, staying on top of AI code assistant offerings and features is a full-time job. Relatedly, if you're reading this, Forrest Brazeal, can I get this list put to the Major-General's Song, kthx?
Here is some of the evidence that developer sentiment around AI code assistants is evolving. First, when I attended DevNexus in the spring, I was struck by the fact that every demo I attended used an AI code assistant in some capacity. Second, developers are forming communities expressly to discuss AI code assistant tooling. While r/GithubCopilot (6.2K members) and r/tabnine (141 members) were founded in 2021, r/cursor (5.5K members) was formed only this February. Beyond providing space for developers to voice their enthusiasm, these and other dedicated communities (Hacker News, Dev.to, tech conferences and meetups, etc.) empower users to troubleshoot bugs, share tips, and negotiate best practices.
A few table-setting notes: as with last year's post, I am once again most interested in what developers are actually doing and saying. My research is qualitative. It is primarily derived from developer forums like Hacker News and Reddit, as well as private conversations I am having with practitioners and vendors. Also, this is not an exhaustive list of features, and the desired features I listed last year absolutely endure today. Illustratively, I spoke with a Copilot customer recently who complained that language support for older coding languages needs improvement (2023 #4). For the sake of avoiding redundancy while casting a wider net, I have for the most part endeavored to cover new territory rather than doubling down on last year's claims.
Here’s the list:
- Tab Completion: According to many developers, tab completion is the killer feature in AI code assistants. These developers call not only for the ability to accept a predicted change using tab, but also for the assistant to predict the next change after the current completion, enabling them to tab, tab, tab their way to happiness. Although vendors tend to make much of their chat capabilities, developers want to hit tab to accept autocomplete and move on—especially with new features like multi-file editing (see #9). Developers praise tab completion in Cursor. It is the first bullet point in Continue.dev's feature list. Tab completion is something developers have grown to appreciate in numerous scripting tools and workflows, particularly on the CLI, and they now demand it in their AI code assistants.
- Speed: Flow is essential to developers, and nothing pulls them out of this state more completely than lag. The issue of speed appears frequently in forums. For developers, a "sluggish" experience that forces them to wait is an absolute non-starter. Eliminating latency is the promise of smaller models, and I have heard several complaints that, despite the profound capabilities of OpenAI's o1, the slowness of this model makes it impractical for AI code assistant use cases. As Kevin Kernegger, Founder and CEO of Macherjek GmbH, explains: "o1 takes some time to think before it answers. So its not instant like we're used to."
- High-Level: Code assistants are not just for writing scripts and pushing pixels—they assist at the planning stage of app building. Tom Yedwab, Data Architect at Khan Academy, argues (via Reddit's r/ChatGPTCoding) that one benefit he gets from Cursor is the high-level perspective it offers of his projects:
this tool feels like it is reading my mind, guessing at my next action, and allowing me to think less about the code and more about the architecture … I am building.
As someone skeptical of AI's ability to insert itself in every step of the SDLC, I find this quote particularly exciting. Today, AI code assistants are helping developers shift left in their workflow in order to think big picture. Some developers report that this change has made them "feel more like a solution architect." Others complain that the technology isn't quite there yet and sometimes seems to be getting ahead of its skis ("I'm aware of Devin and a few other higher-level systems, but they seem (a) like they're still vaporware and (b) they're actually aiming for an even higher level of functioning that relies on more judgment, taste, and design skills — too ambitious, don't want to delegate *that* much."). Although use cases continue to center on autogenerating code rather than architecting it, this move to use code assistants to think high-level is deeply significant because it means that these assistants require full visibility into, and comprehension of, an entire code base.
- Superb Suggestions: Obvious? Maybe. Developers naturally want the first suggestion to be the right one, but this requirement usually has more to do with the technical capabilities of the model than with the assistant itself. In fact, ongoing debates around which model is better usually come down to which gives the best suggestions for a particular project (often this has to do with language support). What this means for vendors is that they must support multiple (all?) models so that developers are able to select the one that gives the strongest suggestions for their particular use cases. See #7 below for more on this.
- Context: Context is King. Developers frequently post questions to Reddit like "How to feed/provide documentations to Github Copilot for context?" and "Looking for an LLM Fully Aware of My Entire Project – Alternatives to GitHub Copilot?" Similarly, on Hacker News, Lucas Jans, VP of Product at Agency Revolution, explains:
I want to build my own agents so I can [have] my private domain specific awareness in the coding environment: PRDs, product docs, API docs, business goals, etc.
Today, this sort of context-aware AI code assistant is promised by many vendors in the space, including Augment Code ("It's the most context-aware Developer AI, so you won't just code faster, you'll build smarter.") and Cursor ("Reference code with @ symbols to be used as context for the AI."). Meanwhile, the promise of GitHub Copilot is that it can be customized to an organization's private code and processes in addition to context from public repos. Competition on the context front is fierce for very good reason.
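Stripped of vendor specifics, the mechanics are simple enough to sketch. Below is a minimal illustration of feeding PRDs and product docs to a model as context, using the OpenAI Python SDK. The model name and file paths are placeholders, and the `build_context` helper is hand-rolled here; vendors automate the same idea at repo scale with indexing and retrieval:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical project documents to inject as context (placeholder paths).
CONTEXT_FILES = ["docs/prd.md", "docs/api.md", "docs/business-goals.md"]

def build_context() -> str:
    """Concatenate project documents into a single context blob."""
    sections = []
    for path in CONTEXT_FILES:
        p = Path(path)
        if p.exists():
            sections.append(f"## {path}\n{p.read_text()}")
    return "\n\n".join(sections)

def ask(question: str) -> str:
    """Ask a coding question with the project docs prepended as context."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works
        messages=[
            {"role": "system",
             "content": "You are a coding assistant. Use the project "
                        "documents below when answering.\n\n" + build_context()},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Which API endpoints does the PRD require us to build first?"))
```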
- IDE Fork or No Fork?: There is no consensus on the decision by Cursor and Void to fork VS Code, except that forks add friction and some developers are mad about it. The folks at Cursor chose to fork rather than build an extension because:
VSCode extensions have very limited control over the UI of the editor. Our Command-K and Copilot++ features aren’t possible as extensions. Same for much of what we want to build in the future!
Although using a forked, closed-source version of VS Code is inconvenient, the benefits of Cursor have made the sacrifice worthwhile for some. Others opt to use Sourcegraph's Cody, Continue, and Copilot (among others) because they are VS Code extensions. As Francesco Belladonna, lead developer at Jane Technologies, writes: "Honestly, I'd rather have a buggish extension than having to change editor." Still other developers, wishing to avoid the drama altogether, are adopting Aider, which markets itself as "pair programming in your terminal." At the end of the day, like everything else in developer tooling, it comes down to developer experience, and the code assistant that offers the best experience will win out.
- Multiple LLM Support: I'm cheating a little here, as this point is a simplified version of points 9 and 10 that I made last year, but, hoo boy, has it become more relevant. Developers are opinionated about their preferred models. A lot of developers are raving about Claude in 2024; it has been a huge market impact story. For example, Wes Bos, host of the Syntax podcast, goes into raptures about it. Probably the biggest announcement at Universe was the integration into Copilot of new models: Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Pro, and OpenAI's o1-preview. Thomas Dohmke is right to frame this move as one of "developer choice," as it is very much something practitioners demand.
- Multiple LLMs Simultaneously: Developers want to use two or more models at the same time in order to leverage the strengths of each. According to one Redditor, who uses ChatGPT and Cursor for unit testing:
My ideal outcome is to have OpenAI specify the test plan, chat with Cursor who would execute the test, interpret the output and ask OpenAi any questions. This would be repeated until the unit test is passed.
Other developers in the space echo this desire and have had moderate success. My takeaway for vendors is that supporting multiple LLMs is now table stakes; supporting multiple LLMs simultaneously is the future.
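For the curious, here is a minimal sketch of the loop that Redditor describes, assuming the OpenAI Python SDK on the planner side. Cursor exposes no public API that I am aware of, so the executor is a hypothetical stub (`execute_with_cursor`) standing in for work done in the Cursor UI:

```python
import subprocess
from openai import OpenAI

client = OpenAI()  # planner model; assumes OPENAI_API_KEY is set

def plan_tests(source: str) -> str:
    """Ask the planner model to specify a test plan for the given code."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Write a pytest test plan for this code:\n" + source}],
    )
    return response.choices[0].message.content

def execute_with_cursor(test_plan: str) -> None:
    """Hypothetical stand-in for the executor (e.g. Cursor), which would
    turn the plan into test files in the working tree."""
    raise NotImplementedError("today this step is driven manually in the Cursor UI")

def run_tests() -> subprocess.CompletedProcess:
    """Run the generated tests and capture output for the next iteration."""
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

source = open("app.py").read()     # placeholder target module
plan = plan_tests(source)
for attempt in range(5):           # bounded "repeat until the unit test is passed"
    execute_with_cursor(plan)      # executor writes/updates the tests
    result = run_tests()
    if result.returncode == 0:
        break                      # tests pass; loop closed
    # Feed the failures back to the planner and ask for a revision.
    plan = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "These tests failed:\n" + result.stdout
                              + "\nRevise the test plan."}],
    ).choices[0].message.content
```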
- Multi-file Creation and Editing: The ability to create and edit files is table stakes. What makes some assistants stand out is how contextually aware these created files are (see point 5 above). According to one Cline user:
I’ve been using Cline and really like it, especially the way I can say “make a new function that does XYZ” and it can easily review all existing ones, and create as many files as necessary. Same with if something isn’t working, I can paste an error code and it goes through the files and comes back with “I see what the issue is…” and so on.
Interestingly, multi-file editing is still "evolving" and therefore buggy (see #10 below). Developers flock to forums to discuss how to optimize their workflows, and some have built tools to bridge the gap.
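To make concrete what those gap-bridging tools do mechanically, here is a toy multi-file applier. The `FILE:` marker convention is invented purely for this sketch, not taken from any real tool; production tools use sturdier edit formats (diffs, search-and-replace blocks, and so on):

```python
import re
from pathlib import Path

FENCE = "`" * 3  # a markdown code fence, built programmatically

# Matches "FILE: <path>" followed by a fenced code block. The marker
# convention is invented for this sketch.
BLOCK = re.compile(r"FILE:\s*(\S+)\s*" + FENCE + r"[^\n]*\n(.*?)" + FENCE,
                   re.DOTALL)

def apply_multi_file_response(response: str) -> list[str]:
    """Write every file named in a model response; return the paths touched."""
    written = []
    for path, body in BLOCK.findall(response):
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)  # create new dirs
        target.write_text(body)                           # create or overwrite
        written.append(path)
    return written

demo = "FILE: pkg/util.py\n" + FENCE + "python\nGREETING = 'hello'\n" + FENCE
print(apply_multi_file_response(demo))  # -> ['pkg/util.py']
```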
- Mitigate Unintended Deletions: Multi-file editing and creation is supported by more AI code assistants today, but with this capability has come a wave of disgruntled developers who are now grappling with unintended deletions. One Redditor complains: "Of the last 100 times that Cursor has deleted a file of mine, maybe 2 of those deletions were intended behavior that I actually wanted." This is undoubtedly user error, and the thread is full of helpful users pointing out ways to keep this from happening. However, as this is a post about developer experience, it is worthwhile to point out these areas of friction. The on-ramp to becoming a power user of these tools is steep, and the consequences of missteps can be serious. As more and more of the code creation process is automated and unreviewed, this will only increase in importance.
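The mitigations suggested in threads like that one mostly reduce to source-control hygiene. Here is a minimal, vendor-neutral sketch of one such habit: checkpoint with git before handing your working tree to an agent, then review and restore any deletions you did not intend. This is plain git driven from Python's subprocess; nothing in it is specific to any assistant:

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command in the current repo and return its stdout."""
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

# 1. Checkpoint everything before the agent session starts.
git("add", "-A")
git("commit", "-m", "checkpoint: before AI assistant session", "--allow-empty")

# ... the agent session happens here ...

# 2. Afterwards, list files the session deleted relative to the checkpoint.
deleted = [line.split("\t", 1)[1]
           for line in git("diff", "--name-status", "HEAD").splitlines()
           if line.startswith("D")]
print("Deleted since checkpoint:", deleted)

# 3. Restore any deletion you did not intend (keep the ones you did).
for path in deleted:
    git("checkout", "HEAD", "--", path)
```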
As I look ahead to 2025, I am reminded just how important developers will be in determining the winners and losers in this market. Indeed, if there's one thing my research made clear, it is the importance of developer experience. As I explained last year (and it is worth repeating):
Code assistant vendors are uniquely reliant on the success of their product’s developer experience. These tools exist to assist, which means they work in support of the developer and must cater to their needs and expectations. They are not an express train to a post-work future. No one is laying off their engineering teams and replacing them with robots. To excel in this competitive and potentially lucrative tooling space, the companies developing these tools ignore the wishes of their developer users at their peril.
I'm interested to hear from the community. Do you disagree? Did I miss anything? Let me know here in the comments, on Bluesky, or on LinkedIn.
Disclaimer: AWS, GitHub, Microsoft, Google, and IBM are RedMonk clients.