Software engineers have no dearth of things to say about AI, but AI code assistants have generated the most riotous responses. By now every developer has tinkered with these assistants at least a little, and most have cultivated detailed opinions about what is working and what could be improved.
With the code assistant space growing rapidly, it is worth pausing to take stock of what developers actually want from these products. In my research on the subject, gleaned from private conversations, blogs, visiting dev forums, and attending vendor briefings, I have noticed several trends. But before we dig into them, let’s first lay out the most popular assistants that developers have available to them: GitHub Copilot, Sourcegraph’s Cody, Amazon CodeWhisperer, CodiumAI, IBM’s watsonx Code Assistant, Tabnine, MutableAI, AskCodi, Codiga, and Replit AI. In addition, many developers just use more general purpose chatbots. This past week at the connect.tech conference in Atlanta, J.D. Hillen, a full stack developer told me: “I use [ChatGPT] to create outlines and fill in the rest.” Of course, the devil is in the details, and developers are keenly aware of the differences separating this range of assistant options.
Hillen wasn’t the only software engineer at connect.tech who was willing to bend my ear on the subject of AI code assistants. Ben Dechrai, a developer advocate, is bullish on the potential of these tools while acknowledging their issues: “as GPT as a general technology improves the assistants are going to become more robust and reliable.” The complaint that code assistants are impressive but unreliable is one I hear repeatedly. These assistants are supplemental, and cannot currently do the heavy lifting required for professional software development. Using an assistant is like pair programming with a junior developer: the AI writes a bunch of code that the human developer must sort through to determine what is correct, what is gibberish, and what needs to be refactored for style. According to Vincent Mayers, the conference organizer and a recent RedMonk Conversation guest: “it’s a productivity tool; you get the code and rearrange.”
Code assistant vendors are uniquely reliant on the success of their product’s developer experience. These tools exist to assist, which means they work in support of the developer and must cater to their needs and expectations. They are not an express train to a post-work future. No one is laying off their engineering teams and replacing them with robots. To excel in this competitive and potentially lucrative tooling space, the companies developing these tools ignore the wishes of their developer users at their peril.
To that end, here are 10 things developers want from their AI code assistants:
- Summarize: Code assistants should have a robust ability to condense and explain existing code blocks in order to empower developers to get up to speed quickly. This facilitates collaboration and the ability to move nimbly through even the most complex and mammoth codebase.
- Autocomplete: Gone are the days of keeping frequently used code blocks in a txt or Evernote file, or else trolling Stack Overflow for promising code blocks to appropriate. Today code assistants can automatically fill in gaps in the code such as log statements, error messages, and code comments. Copypasta is no longer the law of the land, and developers are eager to allow code assistants to give their thumbs a rest.
- Tests: Writing tests is a top requirement developers want from their code assistant. Unit tests are essential to TDD, but they are a slog to author. By automating away this often annoying, but needed task, assistants are making developers’ lives much easier.
- More languages: This is perhaps the most unwieldy requirement because it is extremely difficult and expensive for vendors, and personal to the developers (after all, there are thousands of programming languages). So let’s dig into this one with some examples. IBM has made the decision to focus on a single use case at the moment (app modernization from COBOL to Java), rather than a wide range of use cases, with their watsonx Code Assistant for Z, while RedHat’s Ansible Lightspeed with watsonx Code Assistant offers a promising stab at an LLM enabled Infrastructure as Code solution. This depth versus breadth approach has many significant tradeoffs, and runs counter to the grassroots demands I’m seeing in developer watering holes for breadth. Other assistants have taken the approach of being actively verbose. Copilot is currently outpacing its competitors in this respect. In fact, an often repeated complaint/ differentiator separating CodeWhisperer from Copilot is the former’s support of fewer languages. But these two players are not alone in grappling with the language coverage issue. Hacker News user wolfeidau complains, for instance, that Cody is not accurate in Markdown. The language coverage issue is shifting quickly, but my takeaway is that developers are going to flock to the most polyglot option.
- Editor agnostic: Developers want to use their preferred IDEs, and they want a code assistant that works well with it. If the assistant doesn’t work seamlessly with their favorite editor then developers are going to complain of this friction and possibly switch to a competitor. Bolted on solutions are bound to annoy developers and drive them to more integrated options. It is a mistake, therefore, for vendors to put too much stock into a single editor. Users of Rider, IntelliJ IDEA, AWS Cloud9, Atom, and SublimeText don’t want to get short shrift just because VSCode is a behemoth.
- Intuitive UI: This one comes up a lot. Developers are often confused by these tools’ flows. They complain about issues ranging from inconsistencies in button placement to how to add and remove code repositories. When developers spend more time trying to figure out how to use your product than they save in writing code, it is a nonstarter.
- LLM Transparency: In order to provide visibility into downstream impacts, developers are looking for transparency around their code assistant’s partner LLMs (OpenAI’s GPT-4, Anthropic’s Claude). A black box approach to code assistants is less likely to appeal to the more advanced mid, senior, and staff level engineers these tools will need to court in order to be successful.
- Where is it?: To ensure privacy, developers want to know if their code is being uploaded to a remote server or if it will remain local. For the privacy-conscious, which includes most business use cases, many prefer that their code assistant run locally. Samsung’s source code leak resulted in a company-wide ChatGPT ban for just this reason. It is the nature of AI chatbots to learn from interactions, meaning that any two-way communication must be contained to avoid the exposure of sensitive information. Companies want to be able to train their own models in order to mitigate data sovereignty issues. The commoditization of siloed hosting environments is only becoming more important.
- Support for self-hosted OSS LLMs: Modular architectures that allow for switching out and replacing LLMs appeals to the more tinkering set of developers. Cody uses Anthropic’s Claude API, but in order to cater to the subset of users interested in trying out other LLMs Claude can be replaced by the OpenAI API. Support for substitution is important to the OSS LLM community. Web UIs such as Oobabooga and FastChat support OpenAI compatible APIs, and Redditor tronathan numbers among the enthusiastic experimenters eager to use LocalAI, a “The free, Open Source OpenAI alternative,” responding:
“This is fantastic, thank you! I’ve been so frustrated that all the new github projects only work with OpenAI. Being able to run BabyAGI, AutoGPT, LangChain, etc against my local models without having to futz around is a huge win. Thank you!”
- Recent LLMs: The fast moving pace of libraries, environments, and runtimes makes out-of-date information both risky to use and frustratingly inaccurate. Developers want the most recent models which is why so many developers are saying they prefer Anthropic’s Claude, which “was trained on data up until December 2022,” over ChatGPT4, which cut off in 2021. While for some developers this difference may be trivial, vendors will have the greatest success using the most recent LLMs available.
It’s worth pointing out that there is a lot of variety in developer responses to code assistants. For instance, whereas some like the conversational style of Cody, others prefer a more terse approach when mimicking natural language. I have also focused on the needs of developers at mid-level and above, rather than junior developers: a demographic which is also keenly interested in and opinionated about code assistants. While skills acquisition is an argument for the adoption of assistants, with AI potentially aiding in skills development and augmentation, for this post I focus on established practitioners with a baseline of proficiency. Perhaps in a future post I will take up the thorny issue of using code assistants as a tutor—particularly in consideration of the depth versus breadth tradeoffs I mention in #4 above.
As the code assistant space expands and matures there will be a normalization of styles and UI, but my research suggests that vendors seeking to impose features and norms that run counter to the needs of developers are going to have trouble gaining traction. More so than other developer toolsets, developer experience can make or break a code assistant.
Disclaimer: IBM, Microsoft (GitHub), Red Hat, and AWS are RedMonk clients.
Illustration created using DALL-E