A RedMonk Conversation: Thomas Dohmke Chats AI Agents and Shower Coding

In this RedMonk conversation, Thomas Dohmke, CEO of GitHub, chats AI agents with Kate Holterhoff, Senior Analyst at RedMonk. They discuss the definition of AI agents, GitHub Copilot’s role as peer programmer, and AI as an abstraction layer in software engineering. Thomas highlights how these agents can assist developers by automating tasks such as test generation and code reviews, ultimately enhancing productivity, while underlining the importance of human interaction in AI-assisted programming. The discussion also touches on the future of AI in development, including the need for continuous improvement and adaptation to meet the demands of modern coding practices.

This was a RedMonk video, sponsored by GitHub.

Transcript

Kate Holterhoff (00:00)
Hello and welcome to this RedMonk conversation. My name is Kate Holterhoff, Senior Analyst at RedMonk. And today my guest is Thomas Dohmke, CEO of GitHub. Thomas, I appreciate you joining me to chat about AI agents. Thanks for coming on.

Thomas Dohmke (00:11)
Thank you, and I appreciate being on your podcast today.

Kate Holterhoff (00:14)
So I suspect our audience doesn’t know a whole lot about your background. And that’s really a shame because as a sometime academic myself, it is quite impressive. Specifically, you have a PhD from the University of Glasgow in mechanical engineering. So Dr. Dohmke, talk to me about your education, research and early career. Whatever you think would be relevant for our technical audiences to have a better sense that, you know, when you’re speaking about technology, you really know what you’re talking about.

Thomas Dohmke (00:41)
I stumbled into the PhD because at the time I was working for Mercedes, or DaimlerChrysler as it was called back then, and they didn’t have a full-time position. So the only position they could offer me was a PhD position. I was like, eh, do I want to do more academia or not? But it was kind of, you can work on the S-Class and then write your thesis in Glasgow. So I actually did test-driven development for control system design. So I took the ideas of Agile.

TDD, and applied it to mechanical engineering control system design, to develop a driver assistance system for the S-Class that brakes to avoid a collision. And even that, you know, was not my background. I taught myself coding in the early 90s. I grew up in East Berlin, the wall fell, and I finally was able to buy myself a Commodore 64 and lots of books and magazines

back in the day when you didn’t have the internet and Stack Overflow, Reddit and GitHub to learn coding, and YouTube, of course, and the MonkCast. You know, I learned coding the hard way. I remember many frustrated nights. I also remember many nights where I wanted to code, but played computer games instead. And so it was natural for me after high school to say, okay, I want to study computer engineering in Berlin.

I got into this job at Mercedes, wrote my thesis while working at Mercedes at the University of Glasgow. And then at some point I had enough of the automotive industry and decided to create a startup building iPhone apps. It was the time when the iPhone SDK and the App Store were out, and I built apps as a contractor. At some point a few friends and I realized there’s an opportunity for us to create a company that helps

mobile developers with their testing, the distribution of their beta builds, crash reporting, so collecting diagnostic information and all this kind of stuff. And so we created what today would probably be called mobile DevOps, but at the time nobody called it DevOps, at least not to my memory. And that company got acquired by Microsoft in late 2014. And that’s how I got here, effectively.

Kate Holterhoff (02:45)
Wow, I’m so interested in this junction of academia and the auto industry. Do you feel like any of those early lessons have extended into what you’re working on today?

Thomas Dohmke (02:56)
It has extended both ways, I think. You know, modern cars have become software-defined vehicles. Effectively, a car today is a computer that runs the software that you experience when you’re driving the car, when you’re using self-driving. And I think for most drivers, the driving experience is maybe equally as important as the software experience, the navigation system,

the settings you can set, whatever you have on your big central screen. And if you look at Teslas and more and more cars, also here in Germany and in China, they update as often as your iPhone or your Android phone does. And for me, the other way around, I think I learned two things. One is working for a big company. In 2014, I joined Microsoft, and then…

when we acquired GitHub in 2018, GitHub became part of Microsoft. So I kind of applied the learnings twice, once to design my life at Microsoft, and then also to integrate or not integrate GitHub into Microsoft and keep certain parts independent and make sure that this became a successful acquisition for Microsoft. I think the other part is that today, when we…

when I talk with customers about DevOps and AI and Copilot and all these topics, a lot of them are embedded software developers. A lot of them build embedded systems. And while that word also has evolved, during my time at Mercedes, the control units had really small processors. They were about the same as a Commodore 64, seriously limited in memory and bandwidth and those things.

A lot of that is still true today, and embedded developers live in a very different world than a web developer who can just spin up a thousand containers in a cluster. And if they need to scale out, they just add another thousand in another region. That doesn’t work when you do embedded software, whether it’s in your thermostat at home or your car or wherever else, where a lot of that, you know, C and C++ is running.

Kate Holterhoff (04:52)
Yeah, I think it’s just important to touch on your background there because when you talk about developers, you have a lot of empathy and you’ve been in the trenches. You’ve really seen how projects can look over time and you’ve seen it, you know, not only from an academic perspective, but from an industry one. So let’s talk about agents here because that has been such a big part of what you’ve been writing about on your blog. so you know, it’s a conversation that we’re following closely at RedMonk.

It seems like everybody’s talking about it right now. And I’m excited to get your two cents on what we should be following here. So I mean, just to begin with, how do you define agents? What is an agent at GitHub? Is this definition evolving? How should I be thinking about this?

Thomas Dohmke (05:33)
Yeah, you know, I actually have this book here, AI Engineering, and it has a marker where the definition is; every now and then I look it up again so I have a profound answer that is better than whatever I’ve made up in my head. And the book says something along the lines of, you know, the term agent has been used in many engineering contexts. And I think the most fun example that we often forget is that your browser is called the user agent. At least in every log file, you know, the string, the browser string,

whether it’s Safari or Chrome or Edge or whatever, leaves a fingerprint in the User-Agent header. So it’s clear we have used that term agent in many different scenarios in software engineering for 20 years. I think in general, you can kind of define it as something that is characterized by the environment it operates in, by the actions it takes, and by the output it creates. And as such, I think,

whether it’s your CI/CD process or your build process, you could call that an agent, because you have an environment, you know, your code or your pull request, you have a control flow, actions that it takes, and then it produces an output, a long log file. That is not too dissimilar from an AI agent, which also produces something that looks like a build log, right? Like it generates lots of lines of output that scroll through if you’re watching it, and then you have to figure out what is it actually doing? And at some point it stops and asks you something.

And so I think, you know, the term itself is just a different way of saying bot, or workflow, or pipeline; all these fit that same description. You know, even a batch script fundamentally is an agent. But an AI agent is different, because the model defines what the agent does, right? Like all these processes I’ve described, they have a predefined control flow. The developer, in a script or in some, you know, program, has defined what that bot does.

With the AI agent, we let the model define that flow. And we give it some input and some context. And that might actually be the same input and context, you know, code and descriptions, as for other agentic flows. But for AI, the model then, you know, figures out what to do itself. And in the most, you know, modern interpretation, the SWE agent, S-W-E agent, the model actually loops multiple times; it reasons and looks at the output, you know,

for example, of the npm command or the Maven command. And it sees there was an error message because you have a missing MySQL dependency or something like that. And then it debugs its own issue and basically can run another command to install MySQL before rerunning that npm command. So that’s, I think, where we are today with AI agents. They are taking some input in an environment. They have some tools they can use, you know, the browser, the command line, the internet, and then they create something for us

that then hopefully a human reviews as part of the code review process.
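The self-correcting loop Thomas describes (run a command, read the error, repair the environment, rerun) can be sketched in a few lines of Python. To be clear, this is an illustration, not GitHub's implementation: the error-to-fix rule table and the fake `npm` shell below are stand-ins for the open-ended reasoning a real SWE agent's model would do.

```python
# A minimal sketch of the agentic loop: run a build command, inspect the
# error output, repair the environment, and retry. In a real SWE agent the
# model decides the fix; here a hypothetical rule table stands in for that
# reasoning step (a deliberate simplification).

# Hypothetical rule table: error fragment -> repair command.
FIXES = {
    "Cannot find module 'mysql'": "npm install mysql",
}

def agent_loop(run_command, task_command, max_iterations=5):
    """Retry `task_command`, applying a known repair after each failure.

    `run_command(cmd)` returns (exit_code, output); it is injected so the
    loop can be driven by a real shell or, as in the demo below, a fake one.
    """
    history = []
    for _ in range(max_iterations):
        code, output = run_command(task_command)
        history.append(task_command)
        if code == 0:
            return True, history
        # "Reasoning" step: look up a repair for the observed error.
        fix = next((f for frag, f in FIXES.items() if frag in output), None)
        if fix is None:
            return False, history  # no known repair: give up
        fix_code, _ = run_command(fix)
        history.append(fix)
        if fix_code != 0:
            return False, history
    return False, history

# Demo with a fake environment: `npm start` fails until mysql is installed.
def make_fake_shell():
    state = {"mysql_installed": False}
    def run_command(cmd):
        if cmd == "npm install mysql":
            state["mysql_installed"] = True
            return 0, "added 1 package"
        if cmd == "npm start":
            if state["mysql_installed"]:
                return 0, "server listening on :3000"
            return 1, "Error: Cannot find module 'mysql'"
        return 1, "unknown command: " + cmd
    return run_command

ok, history = agent_loop(make_fake_shell(), "npm start")
print(ok, history)  # True ['npm start', 'npm install mysql', 'npm start']
```

Injecting `run_command` keeps the loop testable; a real agent would call an actual shell and let a model, not a lookup table, choose the repair.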

Kate Holterhoff (08:20)
Yeah, one would hope. Although I’ve been very interested in this idea of vibe coding. I always joke that at RedMonk, we’re vibe analysts. So the idea that Andrej Karpathy is talking about where we’re vibing is…

Thomas Dohmke (08:34)
Yeah, and you know

there was a user on X the other day who used the GitHub mobile app in the shower to merge a pull request. And so we might also evolve from vibe coding to shower coding, because you can use your mobile phone in the shower, as most modern smartphones are waterproof. And then with agents, you don’t really need to type code; you type instructions, and you can do code review on your phone and merge the pull request. And so

you might get into the world of shower coding at some point in the future.

Kate Holterhoff (09:05)
Oh man, that is the

natural evolution of vibe coding, definitely. Yeah, shower coding. Clearly I have some homework to do here on the shower coding front. Oh yeah, please.

Thomas Dohmke (09:13)
I’m sure we can add it to the show notes.

Kate Holterhoff (09:18)
So let’s talk about GitHub Copilot’s agent mode. What is that and how are folks using it?

Thomas Dohmke (09:25)
We always say agent a lot. So let me try to put that onto the spectrum of all the tools. So if you think about it, auto-completion and chat are very far on the left side. They’re still very manual. The developer does most of the coding, and they have an assistant. The SWE agents are pretty far on the other side of the spectrum. You assign them an issue. They build the code, they test the code, and they submit a pull request.

And as always, when we build these autonomous systems, and I like to compare that to self-driving cars, there’s a question of what’s in the middle, what’s on that journey to get to that full self-driving car. Now, we both have been in San Francisco, and there you have Waymos, and they’re ring-fenced within the city, in a well-defined, you know, scenario: they know exactly what the streets of San Francisco look like on a map and so on. And then you have…

driver assistance systems that work outside of San Francisco, whether it’s autopilot in your Tesla or the adaptive cruise control of a Mercedes or whatnot. They are not there yet, but they help you in that mode. So agent mode is kind of like a driver assistance system. It’s not the full scale agent that you can just assign something to and it runs off and it wakes you up when you reach your destination.

But it uses similar technology, models with a chain of thought. In VS Code, instead of being in chat mode, you can be in agent mode, and you can ask it something like, build a snake game. When you did that in chat, and I actually did that on stage in Rio de Janeiro in May 2023, the demo was 15 minutes, I think, in which I built a snake game and copied and pasted the code out of chat into files. And I had to know, okay,

I need to open a new file, I need to save that file under a certain name. Well, if you do that in agent mode, it creates all these files for you. So it knows, you know, the HTML goes into an index.html, and the JavaScript goes into a file, and the CSS goes into a file. And it knows, when you build maybe a backend application, you need to install a package. So it shows you the command that you have to run in the terminal. It doesn’t run the command for you, because that would be scary. Running commands on my machine coming out of a machine learning model, you know…

how long until one of those models deletes all my files in the project or something like that? And so you still have to click the Run button, but it can figure out, okay, you need to install an npm dependency or a Unicorn web server, something like that. And it shows you how to do that, and you click Run. And then at the end, you’ve finished the one flow, and now you can try it out and run that application. And if it all works, you can just go back and…

define a new task and tell the agent, now that we have that basic game, I want to add an AI, a non-player character, or I want to add more obstacles, or I want to make it 3D. And the agent mode understands the context of your project, your workspace in VS Code, and so it can make modifications to that code base. And that, bootstrapping something new, that’s a cool demo.

But that’s not what developers do in their day to day. 90% of the time, you’re modifying either your own code or other developers’ code. And so having something that is available for you there in an existing code base, to say, these are the 10 files I think you need to modify, here’s an idea of how you could modify these files, and you can still accept or reject the modifications, the diff, if you will, to those files, that’s incredibly liberating for software developers. That’s where this vibe coding

term comes in. You’re kind of like in the vibe, and you’re defining what you want to build, and the agent does things. And because you still understand code, and you have to still understand code and what you’re actually doing, it creates this feedback loop between you, the agent, and the programming language that it generates. And I think many developers really find that joyful, especially with these new models like Claude 3.7 Sonnet.

Kate Holterhoff (13:17)
Yeah, and I would say we are certainly in the agent hype sphere right now. I open LinkedIn and it is just everywhere, the amount of thought leadership, of folks talking about it. And it’s challenging to see the difference between what is just overblown vaporware and wishful thinking, and what is happening in reality.

Kate Holterhoff (13:39)
And so I like this example of being able to build something on stage. But I think what you’re pointing at is that, yeah, most developers are working with brownfield projects. And so that’s going to be the challenge: helping developers to modernize apps and do this sort of challenging work. And we’ve mentioned writing tests. I mean, if agents can help with that, I think that’s going to be the killer feature.

Thomas Dohmke (13:48)
Yep.

And most developers move from project to project, whether it’s just checking out an open source library, realizing it doesn’t do exactly what you want. Then you want to modify it, you fork it. You start figuring out how the maintainer of that open source library has structured the project. And so while you might be the expert in your own code base, all of a sudden you’re working in somebody else’s code base. And so having an agent, a planning agent, available to just show you which files you have to modify,

and maybe hint that you have to update the Selenium tests or something like that. And then you can actually submit the pull request, and you have a higher confidence level than if you did that without an agent, because it helped you make the modification within the context of that open source project. And then maybe the code review agent comes in. And so by the time the maintainer of that open source project that you’ve forked looks at your pull request, they’re like, well, Kate, this is great.

It might be your first pull request ever, but I’d love to merge this. Please, please send me more. And that behavior is not only true between you and an open source project, it’s actually true within any decently sized company, where teams of developers are building multiple things, and moving from one team to another is actually like joining a new company: a different programming language, a different approach to things.

Maybe even a different philosophy on microservices or monoliths. I mean, certainly at the size of a GitHub or Microsoft, we have almost everything in terms of project size and scale and architecture. There’s a huge difference between Xbox and Office and Azure and GitHub and LinkedIn. And that’s just

the big titles of all these teams. And then within each of them, you know, I think at GitHub we have over a thousand services in our service catalog. And so you can kind of imagine how much an agent can do to help you move from one project to another and ramp up on that project and be productive, you know, hopefully within a couple of hours.

Kate Holterhoff (16:00)
And you have been very explicit about keeping that human in the loop when it comes to AI code assistants

all your blog posts seem to mention it at some point. That is my feeling as well, that the humans do need to be involved. But I feel like agents have this interesting place when it comes to humans. So, let’s first talk about pair programming and what you have called peer programming. So.

Thomas Dohmke (16:16)
Mm-hmm.

Kate Holterhoff (16:24)
I was really interested in your post “From assistant to peer programmer” because you mentioned TDD here. And yeah, when it comes to Agile folks, they have resisted thinking of AI as a pair programmer. Folks on Hacker News will jump in talking about AI not being able to work in Agile or XP as a pair programmer.

So, how are you thinking through that? Do you feel like Agile is going to become less relevant as we move into a more AI-assisted future?

Thomas Dohmke (16:53)
You know, the great thing about the internet is that there’s never not a healthy argument about anything. And in fact, you know, if nobody criticizes your work, it probably wasn’t meaningful enough and maybe not even worth doing. So I think it’s a good discussion to have. And actually, if you look it up, you know, in the Agile Alliance glossary, the definition of pair programming starts with saying, you know, there’s a programmer at the keyboard that’s called the driver. And then there’s the other

programmer, you know, the other side of the pair, who is not involved in programming and is focusing on the direction, and that’s called the navigator. So, you know, pilot and Copilot, driver and navigator, I think that’s actually pretty close, definition-wise. Now the question is obviously at what level of abstraction you are. And if you look at the original Copilot with code completions, right, it’s very low on that abstraction ladder:

the driver, you know, the pilot, types in the editor. So it gives context. And then the navigator, the Copilot, the other side, is taking the input from the driver and what else you have in the file, above the cursor, below the cursor, adjacent tabs, maybe what you typed a minute ago, and then it creates a prediction of the next few lines. And so it does set the direction, but obviously at a lower

abstraction level than we as humans understand a pair programmer. Although, when I have done pair programming, depending on where you are in your career journey, it might as well be that the person next to you just dictates the proper syntax to you. Or when you do it in Objective-C or Swift, doing iPhone development, a lot of the method names are very long, and we didn’t always have auto-completions as smart as we have today.

But I think the concept actually works. I type something, I’m in control, and my Copilot gives me ideas of what to type next. And if I don’t like the direction it’s going, well, I can just keep typing and not accept the suggestion, not hit the tab key; or I can hit the tab key and then edit it. And because these models can fill in the middle, when I keep editing in the middle of that suggestion, they start directing me again into that next phase.

And then, you know, after code completions, which we launched almost four years ago, in June 2021, came chat, right? Like with ChatGPT in November 2022, the whole world changed in how we think about these things. And of course, chat is even closer to a human sitting next to me, because I can ask it to explain code to me, right? I can ask it dumb questions. And the nice thing actually is that I don’t have to worry about judgment, because the AI doesn’t judge me. It has infinite patience.

It just keeps answering, right? And so you can actually explore topics. One of the earliest demos I saw of Copilot Chat was it explaining a security vulnerability, a SQL injection. You can ask it, show me how an attacker would exploit that vulnerability. And that’s what you usually don’t get from documentation or from, you know, security training: somebody showing you, okay, this is the input string that I feed into your method to exploit that SQL injection.

And so I think, you know, as we go through this journey of AI-assisted programming, or AI-native software development, the Copilot does more and more become a pair or peer programmer that helps me in my day to day, while I stay in the middle of the process; I remain the driver or the pilot.
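The SQL injection demo Thomas mentions is easy to make concrete. The sketch below uses a throwaway SQLite table (the table, values, and payload are invented for illustration) to show the kind of input string an attacker feeds into a method that concatenates user input into SQL, and how a parameterized query defuses it:

```python
import sqlite3

# Toy users table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('kate', 's3cret'), ('thomas', 'hunter2')")

def find_user_vulnerable(name):
    # BAD: user input is concatenated straight into the SQL string.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # GOOD: a parameterized query treats the input as a value, not as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

# The kind of input string an attacker would feed into the method:
payload = "nobody' OR '1'='1"

print(find_user_vulnerable(payload))  # leaks every row: [('kate',), ('thomas',)]
print(find_user_safe(payload))        # matches nothing: []
```

The only difference between the two functions is that the safe one passes the input as a bound parameter, so the database driver never interprets it as SQL.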

Kate Holterhoff (20:06)
OK, that all makes sense. And I think what I’m hearing then is instead of getting rid of Agile programming, we’re actually doubling down on it. That as everyone is increasingly getting on board with these Copilots, and more and more folks are using them, everybody is going to be pairing in some capacity. And so maybe we just need to redefine it, or, I don’t know, I mean, you kind of have, because you didn’t call it.

Let’s see, so you call it peer programming rather than pair programming. Was that on purpose, can I ask?

Thomas Dohmke (20:36)
It was on purpose because we wanted to explain that now that we’re four years into this journey, the Copilot becomes more and more a member of your team and not just something that, you know, sits in your editor and helps you. But I want to come back to something you just said, which is I think, you know, the Copilot being available to anyone on this planet actually teaches everyone how pair programming works.

Because when kids use Copilot to learn coding, that is effectively learning the underlying principle of pair programming: exploring a topic, asking questions, processing the answer. Similar to a human, the AI doesn’t always have the right answer, right? When we talk about model hallucinations, let’s also be real that there’s lots of wrong information on the internet. When you find code snippets, more often than not they don’t work, and you have to modify them and make them work, right? Trial and error.

And then we discuss, between two developers, how things should be, whether it’s architecture or coding patterns, or do I decompose that, refactor that method into two smaller methods. There isn’t a right or wrong. There are different answers to that. And I think we will enable our kids to learn these behaviors at the earliest age. I think the other side to that is that in many ways,

Copilot and AI in general have been so successful with software developers because with Agile, with TDD, writing test cases, and with DevOps, code review processes, not pushing to main but pushing to a pull request, running all my CI/CD, running my unit tests, and then having a human reviewer who is the code owner of that file and who can give me feedback, these practices and these quality gates have enabled us

to use AI to its fullest potential, while it certainly had flaws, right? Like in 2021, when Copilot launched on the Codex model, at a time when GPT-3 was the best model, it certainly wasn’t perfect. That’s why we only launched code completions. And yet it quickly led to developers being more productive, based on some metrics that we can get into, but also based on what developers have been telling us. And I think the best…

measurement that you can have of any software development tool is to ask the developers how they feel about the tool. And whether it was internally here at GitHub, or whether it was once we had launched Copilot, everybody told us, I don’t want to work without Copilot anymore. And if I don’t like it for a minute, you know, because I’m currently in a mood, or it annoys me more than it helps me because I know what I’m doing, well, Command-Shift-P, disable Copilot. It’s easy enough; there are shortcuts to disable it when I need my

quiet time, in the same way that sometimes I don’t want to do pair programming and I just want to code something, and then I send it to pull request review and my team, my peers, will give me feedback.

Kate Holterhoff (23:20)
Yes, that all makes sense. And I like how you have brought up the fact that this can be the kind of thing where you spend some time in deep thought, more or less by yourself. And you can also engage with the AI assistant that can enable you and help you to get a task completed, maybe in a less painful way. So I like how we’re expanding the definition.

And you had mentioned abstractions. That is a word I’ve been thinking about a lot lately. You know, as a frontend engineer, abstractions were a big part of everything that I touched, right? As you move up the stack, that certainly is the case, whether it’s compiled languages, or like Vercel being a wrapper around AWS, right? So at all levels, abstraction is something that’s always top of mind.

Thomas Dohmke (23:52)
Yep.

Kate Holterhoff (24:02)
But something that I’ve been hearing more and more is that AI is going to just become another abstraction layer. And this is something that developers are comfortable with, this idea. But I don’t know. It feels different to me. Do you see an equivalence there? Would you say that it is just another layer of abstraction?

Thomas Dohmke (24:19)
The abstraction layer is human language. It’s not the AI that’s the abstraction layer; it’s the human language that’s the abstraction layer. And if we look at how software developers work, whether they work on their own or in a small team, startups, a team of 10 or so, they typically don’t have other roles; they just have engineers. And some of them might take on more product management roles or more designer roles, or one of them has to be the CEO. So we have a CEO.

Kate Holterhoff (24:22)
Right.

Thomas Dohmke (24:43)
But as the team grows, you have the product management role and the designer role, and what they output is the abstraction, which the AI can process. The PM outputs a specification, a description at the epic level, to come back to Agile, or at the user story level or at the task level. And the designer outputs a visual abstraction of the problem. But whether you have a PM and a designer, or whether you’re just on your own

working on your open source, on your hobby project, well, the ideas in your head, they’re all in human language. And often they’re much bigger than what you can actually implement in code. And so what we do, you know, is we take that big idea and we decompose it into smaller ideas. And that still happens in human language, whether it’s on paper or in a planning and tracking system or just a note sheet. So you typically still write that in human language. And at some point you get to the point, okay, now I know kind of like,

this should go into a file or a module, or model-view-controller, these kinds of things. And then you start converting the ideas that you have in human language into programming language. And that obviously is a, you know, imperfect conversion. If it were perfect, then we would always write code that has no bugs and that does exactly what I wanted it to do right away. But that’s not actually how software development works, right? A lot of it is exploring how to do it while we’re doing it. And we’re using

frameworks like React or Ruby on Rails to simplify that work, because then we know where things go and how we do that decomposition process. But that’s effectively why we often no longer talk about programmers; we talk about software engineers. Because it’s an engineering task. It’s a task that takes a very complex problem, applies the craft that you have built over the course of your career, and takes the experience that you have and

the constraints that your team might be giving you, whether it’s the language or the cloud or the cost that you can spend, or should it run on a mobile phone on the edge, or does it run on a backend and everybody can open it with a browser. It’s those decisions where you still need engineers in the process, and where the conversion from human language to programming language is simplified with the AI. AI might write code for you, but at the end of the day,

The real engineering process is not the code writing. The real engineering process is the design of the system and then converting that design into something real.

Kate Holterhoff (27:08)
That certainly resonates with what I’m hearing around English being the fastest growing coding language. And German, well, thank you. Okay, yes, language in general, human language. But I still feel like, if we think about developer tooling, there was this black-box, deterministic aspect to these SaaS products that

Thomas Dohmke (27:14)
And German.

Kate Holterhoff (27:30)
I had always thought of as being abstractions. And that’s not quite what is happening with these AI code assistants, or even the future that I’m hearing other executive leadership talk about, around the idea that developer tooling is going to fall off because everyone’s just going to build their own bespoke solution. It feels like there is something even more revolutionary happening, like there’s a break.

Thomas Dohmke (27:57)
Well, I think the part that we haven’t covered yet is that you’re going to get to a point, and you’ve asked me about agents, where an agent can take over a part of that engineering work, right? And I think that’s again going to be the role of the engineer, to figure out when I can hand it off to the agent. And with the code completions, you know, that’s a very simple example. You type and you see a prediction, and you can accept it or not.

And you’re making decisions very binarily, by pressing the tab key or by keeping typing. But it actually forces you more into a mode where you have to both write and read at the same time, to understand whether the suggestion, the prediction that you’re getting, is good enough for what you want to achieve. And is it faster for you to press the tab key and modify it? Or do you keep specifying what you actually want, to get the right prediction, or do you just write it all out by yourself?

If we look at agents, you know, we announced a few weeks ago something called Project Padawan. You know, like Jedi and Padawan, you need a lot of patience and it still needs to learn. It’s not perfect yet. You can assign a GitHub issue to Copilot, and then Copilot works behind the scenes. You don’t even see it doing its work. It spins up a compute environment, a codespace. It installs all the dependencies. It checks out the code.

It looks at your description, if you uploaded images, screens, Figmas, whatever, to the issue, it can also analyze that. And then it starts creating a pull request, a draft pull request, where the description outlines its plan, just like a human developer would do. And then it commits into the pull request. And in the meantime, you, as the pilot, you can actually do something else, because it runs in the background, and you can focus on another task.

maybe designing another component or reviewing code of the previous agent run. And at some point you get to the pull request and you can see the work of the Padawan, of the agent, just like a human developer would have done. And you review the code, and when you don’t like the code, you highlight those five lines of code and say, well, this isn’t working, or this has a bug, or refactor that into a separate method. And then

Copilot can pick that up again and reformulate the pull request. And of course, as with every pull request, you can also take the GitHub CLI, you know, gh, and check out the pull request on your local machine, open it in VS Code with Copilot, and keep working on that. And I think that’s a completely new way of working as a software developer. Or it’s exactly the same way we’ve always worked, because usually most developers work in teams.

And Copilot has joined as a member of your team that you can assign certain tasks to. Now, the critical thing on this is that you have to figure out, ideally before you assign the issue to Copilot, whether Copilot can actually solve the issue. Because if you assign random stuff, you know, not so well defined issues, too high of an abstraction. My classic example is, you know, you file an issue that says, build me GitHub.

Now, no SWE agent can do that, right? Like, it just doesn’t work. And we as humans intuitively know that. If I read that issue, I’d say, well, PM, go back and break that down into smaller chunks of work, and let’s talk about what GitHub even means in this context. But an agent can’t do that. An agent looks at this and looks at the code base and then tries to solve the task. And when it comes back to you and generates garbage,

Kate Holterhoff (30:55)
you

Thomas Dohmke (31:20)
It has burned compute cycles and it has cost you money that you don’t want to spend. And if you do that over and over again, where a SWE agent gets assigned to an issue and then cannot actually solve the issue, you get frustrated and you stop using it, as we do with all the tools in the developer tool chain. If the tools don’t do what we expect them to do, then we stop using them and move on to something else. So I think that’s the future we’re heading into, where we are able to assign some work

to a SWE agent and the SWE agent does the work for us while in other parts we still do the work with the help of a Copilot in our IDE.
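Thomas’s point about scoping work before handing it to an agent can be sketched as a tiny pre-flight check. Everything here is hypothetical: the function name, the keyword lists, and the thresholds are mine for illustration, not how GitHub Copilot actually triages issues.

```python
# Hypothetical pre-flight check before handing an issue to a coding agent.
# The heuristic and thresholds are illustrative only; they are not how
# GitHub Copilot actually decides what to work on.

def looks_agent_ready(title: str, body: str) -> bool:
    """Rough screen that rejects vague, unbounded asks like 'build me GitHub'."""
    vague_asks = ("build me", "rewrite everything", "make it better")
    if any(phrase in title.lower() for phrase in vague_asks):
        return False
    # A well-scoped issue tends to be specific: enough detail, plus
    # reproduction steps or expected behavior spelled out.
    detailed = len(body.split()) >= 30
    concrete = "reproduce" in body.lower() or "expected" in body.lower()
    return detailed and concrete

# 'Build me GitHub' is too high an abstraction for any SWE agent.
assert looks_agent_ready("Build me GitHub", "just do it") is False
```

A real triage step would of course look at the code base and the issue history, but even a crude screen like this captures the idea of burning a cheap check before burning expensive compute cycles.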

Kate Holterhoff (31:54)
I like that you brought up Project Padawan, because as I am trying to figure out what an agent even is, it’s useful for me to have these examples of things that folks have tried. Talking about GitHub Copilot’s agent mode, what I think really attracted me to the blog post that you wrote about that was that it had a lot of examples about what that might look like. So recognizing errors and fixing them automatically, that makes sense, right?

suggesting terminal commands and asking you to execute them, analyzing runtime errors, and self-healing capabilities. I really liked that one as well. So yeah, I’m collecting these. I feel like this is my new scrapbook of what an agent could even be here. Have you added anything to this list since the original launch?

Thomas Dohmke (32:37)
Test generation, I think, is the one that most developers will be excited about. You can kind of imagine you submit a pull request, the agent looks at it and figures out you don’t have enough test cases, or you’ve written some unit tests and they’re all passing, but you kind of got away with the minimum amount of tests to either TDD or validate your work. There’s never enough tests, or at the same time, it’s also true, there’s always too many tests, because the bigger test suites get, the more flaky they become,

the longer they take to run. And so having AI figure out what’s the right amount of tests to cover all the code and all the scenarios, I think is going to be something many developers are excited about and actually think about that as not AI is replacing their job, but more like the dishwasher that’s doing the dishes for you while you can watch your favorite show on Netflix.
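The tension between “not enough tests” and “too many flaky tests” can be made concrete with a toy sketch. The function and the cases below are hypothetical, just to show the kind of edge-case coverage a test-generation agent might add beyond a single happy-path assertion.

```python
# Illustrative only: the kind of edge-case coverage a test-generation agent
# might add for a function that a minimal suite tests with one happy-path case.
import math

def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / denominator

# A minimal suite might stop at: assert safe_ratio(6, 3) == 2.0
# An expanded suite also probes the boundaries:
cases = [
    ((6, 3), 2.0),    # happy path
    ((5, 0), 0.0),    # division-by-zero guard
    ((0, 7), 0.0),    # zero numerator
    ((-9, 3), -3.0),  # sign handling
]
for (num, den), expected in cases:
    assert math.isclose(safe_ratio(num, den), expected)
```

The agent’s value, in Thomas’s framing, would be picking a small set of cases like these that cover the scenarios, rather than piling on redundant tests that make the suite slower and flakier.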

I’ve barely met anyone that has complained that they have a dishwasher in their kitchen and that it took away the work of doing the dishes. And that’s the way I think about the right agents in our software development process. It’s those that do the work that we don’t want to do, so we can focus on the work we love doing, the work that actually keeps our motivation and energy high. You know, we’ve all had that moment as a developer

where it’s late at night, but you’re so in the flow, and it’s almost like a rush. You know, the dopamine is flowing and you can’t stop working because you just need to finish this one thing. And I think if AI can get you more into this flow state where developers are happy doing these things, then AI is actually on the right path. You know, one more example, as I mentioned pull requests, is just code review.

Kate Holterhoff (34:11)
Yes.

Thomas Dohmke (34:11)
We

at GitHub are a remote-first company. We have developers in many parts of the world, but even just in the United States, with its multiple time zones, the developer in New York might be waiting for his coworker on Oahu to review the code. That’s five or six hours away. If you have a Copilot, and we actually…

shipped that recently, if you have Copilot reviewing your code and giving you initial feedback, it saves you the embarrassment of the bugs, or the syntax errors, or the thing that you overlooked or fat-fingered in your commit. By the time you actually committed that and pushed it, you realize, oh, I copy-pasted something into that file that shouldn’t have been there. And Copilot points that out. And then by the time

that coworker wakes up and looks at your pull request, everything is already great and CI/CD is passing and you already moved on to the next task. I think that’s where we’re going to see the true value of these agents.

Kate Holterhoff (35:04)
Have you seen a code review agent done well yet? I hear that from everyone, that that’s what they want. And I just don’t know if that’s the future, or if folks are really doing this right now.

Thomas Dohmke (35:08)
Yeah.

Ours is getting better, I’d say. Nothing is perfect, but ours is getting better. These reasoning models like OpenAI’s o3-mini model or Claude 3.7 Sonnet thinking are getting better at doing this because they have these reasoning capabilities or the chain of thought where they can actually get to a better state, a better outcome in the pull request review. So I’d say we are getting there.

Kate Holterhoff (35:17)
Yeah?

Thomas Dohmke (35:37)
It’s definitely not at the level of a human reviewer, and from a security perspective, in decent-sized companies, we will never have a state where we want to merge a pull request without a human reviewing the pull request. I think that, from a security, from a zero trust perspective, will continue to be crucial. And so the Copilot doesn’t have to be perfect in reviewing the code. It just needs to make it a little bit easier.

Right? Like, if it makes it 10% faster from the time I submit a pull request to the pull request being deployed, well, I take that trade-off any day and have Copilot in all my pull requests. And in the worst case, it says it has nothing to say, which is sometimes funny, because that’s the hard part about user interface design. It can either not say anything, and then you don’t know if it finished or if it’s still running or stuck, or it can tell you that it found nothing, and you’re like,

Well, why are you telling me that? Why are you sending me a GitHub notification? So I think we are going through the evolution of these capabilities. We’re pretty happy where the current version is, but there’s certainly more to come as the models become more powerful and we learn more and more of how to build these AI systems.

Kate Holterhoff (36:45)
And I think that that particularly applies when we think about the bugs that are being introduced by AI.

Thomas Dohmke (36:51)
We recently did a quality study. I’m not sure I remember all the data, but I think the gist of it was that the code written with AI versus code written without AI was about on par in terms of quality, and had higher test coverage and better documentation, as you would expect. And so the argument you could make is that, if the AI code is about as good as the code written without AI,

but has higher test coverage, better documentation, higher developer satisfaction, and is written faster, because that’s what most of the studies show: developers are, in real life, something in the range of 25 to 30% faster, and in case studies up to 55%. But obviously those case studies are exactly that, artificial scenarios that don’t easily convert into reality. But let’s say 25% faster. Well, if I’m 25% faster at

the same level of quality, that’s a great deal.

Kate Holterhoff (37:45)
Right, yes, those are the statistics you’d want.

All right, I think we should start wrapping this up, but I wanna end on a nice high science fiction note here. So what are you thinking about in terms of a roadmap? What should developers be looking for? How are agents going to change every aspect of their day-to-day lives?

Thomas Dohmke (38:04)
We definitely will have more agents, totally in the context of this very agentic podcast. The cool thing about the VS Code project is that the VS Code team, I think ever since they started 10 years ago, has been on monthly iterations. And some of the plans are public and others are private, but there’s going to be a new version of VS Code every month.

What’s currently being worked on is in VS Code Insiders. If you want to follow along with what’s happening with GitHub Copilot in VS Code, you can just download VS Code Insiders and try out new features before we even announce them, as they’re exposed through Insiders. We’re going to see more models. We’re going to see, with these models, the agent mode becoming better and being able to cover more scenarios. I think we’re going to significantly

rethink what that means from a user interface perspective when you’re constantly working between the text, the files you’re editing and the agent itself. And then we are going to evolve the Padawan hopefully into a Jedi that can actually be assigned to your issues and you can predict what it does and you can steer it.

You can tolerate it, it doesn’t annoy you the whole time with lots of notifications or errors or breaking your CI/CD. And then you verify the output and you keep that loop going. And I think during 2025 we’re going to see more and more evolutions, not only from us. You know, on SWE-bench you can see many, many companies in many fields actually now building their own

SWE agents and competing on the SWE benchmarks. And so I think we’re going to see a really good competitive environment that is all focused on what we love doing at GitHub, which is making developers happy and productive. So yeah, I think that 2025 is a big year for GitHub, another version of what

the founders started almost 18 years ago when they launched the first version of GitHub, which is create something that developers love using every single day.

Kate Holterhoff (39:58)
I think that will get everyone excited. All right, so talk to me about how folks can keep up with all these new and exciting things happening at GitHub, and your meditations on them.

Thomas Dohmke (40:10)
My meditations on them. Well, the GitHub blog, the GitHub social channels on X, on LinkedIn, and other social networks. Myself, I’m ashtom on GitHub, on X, on LinkedIn. And you see me at various conferences. We’re not too far away from Microsoft Build in May, May 19 to 22 in Seattle. I will be speaking at WeAreDevelopers in Berlin in

July and yeah, more announcements coming soon on my social channels.

Kate Holterhoff (40:41)
Fantastic. And I will include those in the show notes. I have really enjoyed speaking with you today, Thomas. Again, my name is Kate Holterhoff, senior analyst at Redmonk. If you enjoyed this conversation, please like, subscribe, and review the MonkCast on your podcast platform of choice. If you are watching us on Redmonk’s YouTube channel, please like, subscribe, and engage with us in the comments.

Thomas Dohmke (41:01)
Bye.
