Context Engineering in Practice: How Atlassian Is Building AI for Real Developer Work with Kun Chen


In this RedMonk Conversation, James Governor sits down with Kun Chen, Lead Principal Engineer at Atlassian, to explore what “context engineering” really means in practice—and why it matters far more than one-shot “vibe coding” demos for real software teams. Drawing on Atlassian’s internal experience building Rovo Dev, Kun explains how AI agents become genuinely useful when they are grounded in the lived context of professional development: Jira history, pull requests, CI/CD pipelines, incidents, and organizational standards. The discussion covers how Rovo Dev reduces developer toil, integrates directly into existing workflows and leverages emerging standards such as MCP to stay extensible. Along the way, they dig into hard engineering tradeoffs around models, tokens, cost, and signal-to-noise in code review, and look ahead to a future where AI agents take on more autonomy—shifting the bottleneck from writing code to deciding what’s worth building.

This RedMonk conversation is sponsored by Atlassian.


Transcript

James Governor (00:12)
This is James from RedMonk. I’m here with Kun Chen, lead architect at Atlassian. And we’re here today to talk about, well, what I call context engineering, and particularly practical AI for the developer. So welcome, Kun.

Kun Chen (00:26)
Hi, thanks James. Thanks for having me here.

James Governor (00:29)
Yeah, it’s awesome to have you here. I recently attended Atlassian’s conference here in Europe. I’m based in London, and it was interesting to me at least. One of the things that was really clear was the primacy of AI as an interface for Atlassian tools across the board going forward. And obviously at the heart of that is this platform, Rovo. So before we kick off, why don’t you tell me what Rovo is and why it’s going to be useful in terms of those practical developer workflows.

Kun Chen (01:06)
Yeah, yeah, totally. So Atlassian has Rovo as our AI agent platform and product. And we actually have a specialized version called Rovo Dev, which is for developers to use, and I think that’s probably the relevant piece for today. We started building Rovo Dev about two years ago, first for Atlassian ourselves, for Atlassian engineering teams to use. We were thinking about how we could adopt AI

to improve our productivity and just help everyone get more done and help us avoid some of the toil and boring work, right? So we started building Rovo Dev for ourselves, and we did quite a bunch of iterations to get to a place where we heard a lot of people in Atlassian say that they really love it. And that’s when we decided that we should make it a product for our customers as well, because we know a lot of Atlassian customers are actually software teams.

So we thought Rovo Dev could be a really useful tool for them and a valuable product. The angle that we look at this from is that at Atlassian, we know that a lot of engineering teams operate in a professional setting where we work as a big team. And in big software teams, we often have very unique pain points and problems. Some of the problems are repetitive work, tech debt, right? These kinds of things exist in professional software projects.

We also have security tickets. We have feature flags to be cleaned up. There are so many different things in our day-to-day work that developers don’t necessarily find very joyful, right? So we want to get AI to do that boring work for us so that we can focus more on the creative and joyful work. That is the angle, and we think there is a lot of room for AI to play a role in this kind of value prop. Yeah, that’s the main angle we’re going after.

James Governor (03:06)
Awesome. Do you help with, like, filling in Jira tickets? Is that part of the use cases?

Kun Chen (03:12)
At Atlassian we definitely have Rovo helping with that. If you go to Jira and you look at a Jira ticket, there is actually quite a bunch of tools that can help you improve the description of the Jira ticket and enrich the content. It can actually retrieve context from surrounding projects. So it can look at relevant Confluence pages and other documents to find what some of the acronyms mean,

what some related projects are, how they talk about certain things and pull those contexts into the Jira ticket so that the Jira ticket itself has richer requirements already written in. So whether it’s a human or an AI agent that’s taking on that ticket, they will have more information ready to go.

James Governor (04:01)
Okay, so as you say, reducing the toil. The reason I think it was interesting coming out of this, thinking about it as context engineering, is that obviously there’s a lot of hype about vibe coding and this idea that you’re gonna turn up with a green field, one-shot a prompt, and end up with this sort of amazing application. I mean, that doesn’t really reflect the lives of a team that’s actually building software at scale, does it?

Kun Chen (04:27)
Right. So vibe coding and building professional software are very, very different. That’s what we are seeing in the market right now as an interesting phenomenon, which is that a lot of the coding tools are optimized for individual use cases. So if you look at how people demo a lot of the other AI coding tools, they one-shot a Minecraft clone or something like that, a vibe coding app, which is interesting and makes really good demos, but it’s just very, very different from how a professional software team builds software, right?

So when we look at professional teams, we see so many pain points that are just different from how individual developers and hobbyists experience development. And that’s why we thought there is a big void and a big gap to fill.

And Atlassian is the company that should be doing this. We are the teamwork company. And we have a lot of the context about your projects, your organization, how to do things in your company, things like that. So that’s the context that we think will really help address some of the unique problems that exist in teams, but not for individuals.

James Governor (05:40)
Let’s start a bit with Atlassian itself, how you use the tool internally. What are the different data sources that you actually are bringing context in from to help a developer understand the task or the code base they’re working on at any particular time?

Kun Chen (05:53)
Yeah, that’s a really interesting question. So I think the obvious ones are from Atlassian products themselves. Jira, for example, is a really widely adopted tool for tracking projects. One interesting thing that we have observed being useful from Jira is looking at similar issues. So when we assign a Jira ticket to Rovo Dev to solve, for example,

one of the first things we do is look at your Jira project as a whole and see whether there are similar issues that have been solved in the past. And because Jira has connections with GitHub and with whatever source control you’re working with, we can actually look at the pull request that developers already merged to solve those past issues. And when we find that, we use it as context

to inform how we should solve this new problem, because especially for repetitive problems, this is extremely useful, right? When you know how it’s been solved in the past. And that’s actually mirroring how a lot of our human developers work as well, right? When we are just onboarding to a code base, or we’re solving a bug that we’re not very familiar with, we will look at how this has been solved in the past. So that’s one source of context that is very, very interesting.
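The retrieval idea Kun describes, find similar resolved issues and pull their merged fixes into context, can be sketched roughly like this. This is an illustrative toy, not Atlassian’s implementation: a real system would use embeddings and Jira’s development-status links to pull requests, while this sketch uses naive keyword overlap, and all names (`ResolvedIssue`, `similar_issues`, `build_context`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ResolvedIssue:
    key: str
    summary: str
    pr_diff: str  # the merged pull request that closed the issue

def similar_issues(new_summary: str, history: list[ResolvedIssue], top_k: int = 2):
    """Rank past resolved issues by naive keyword overlap with the new ticket."""
    new_words = set(new_summary.lower().split())
    scored = [
        (len(new_words & set(issue.summary.lower().split())), issue)
        for issue in history
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [issue for score, issue in scored[:top_k] if score > 0]

def build_context(new_summary: str, history: list[ResolvedIssue]) -> str:
    """Assemble prompt context from the most similar past fixes."""
    parts = [f"New ticket: {new_summary}"]
    for issue in similar_issues(new_summary, history):
        parts.append(f"Similar past fix {issue.key}:\n{issue.pr_diff}")
    return "\n\n".join(parts)
```

The point is the shape of the pipeline, retrieve past solved work, then prepend it to the agent’s context, rather than the scoring function itself.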

As for other sources of context, we try to cover the entire software development life cycle, from planning all the way to deployment. So we pull context from Jira, Confluence, and Google Docs as well, where the project definitions and product specs usually are. We pull context from the code base itself. We pull context from past pull requests. We also look at CI/CD pipelines.

James Governor (07:40)
So by the way, you mentioned pull requests a couple of times. Is this Bitbucket only, or are you supporting GitHub? What does that look like from a Git support perspective?

Kun Chen (07:41)
Yeah, so Rovo Dev as a product is agnostic to the source control. So we work with GitHub and Bitbucket both, not just Bitbucket.

James Governor (07:59)
Any GitLab integration, or?

Kun Chen (08:01)
It’s on the radar, but there’s quite a bit of work to do to get there.

James Governor (08:07)
Okay, okay. Sorry, keep going. You talked about the build systems as well.

Kun Chen (08:13)
Yeah, so the source control, the build system, the CI/CD pipelines, what are some of the pull requests that are failing in the CI pipeline. And we also look at deployments and incidents as well. So when an incident happens, what we usually do is that a human on-call person will jump in and start to analyze the root cause.

But analyzing the root cause is actually quite time consuming in some cases. So in that process, we also use Rovo Dev to try to pull context from other different places and help with the analysis of the root cause. And then after the incident is closed, for example, we’ll analyze everything and look at, are there some learnings that we should feed back into those other places? So one thing that we started doing was that

after a post-incident review, when we do the post-mortem and analyze the learnings, if we find there are some patterns in our code that should be avoided, what we do is feed that learning back into the memory of Rovo Dev, so that next time Rovo Dev reviews a pull request, it will point out that problem. It will show a big warning message: hey, similar code has caused an incident before. Are you sure you want to do this?

So those are the things that we found really useful, like when Rovo Dev pieces together all these different kinds of contexts. And these are the use cases the vibe coding tools will never address.
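That feedback loop, post-incident learnings stored as memory and replayed during code review, could be sketched like this. The pattern, message, and `IncidentLearning` shape are all hypothetical; real memory entries would be richer than a single regex.

```python
import re
from dataclasses import dataclass

@dataclass
class IncidentLearning:
    pattern: str   # regex describing the risky code shape
    warning: str   # message surfaced during review

# Hypothetical memory populated after post-incident reviews.
MEMORY = [
    IncidentLearning(
        pattern=r"requests\.get\((?![^)]*timeout)",  # HTTP call with no timeout argument
        warning="Similar code (HTTP call without a timeout) has caused an incident before.",
    ),
]

def review_warnings(diff_text: str, memory: list[IncidentLearning]) -> list[str]:
    """Return incident-derived warnings for any added diff line matching a known pattern."""
    added = [
        line[1:] for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    hits = []
    for learning in memory:
        if any(re.search(learning.pattern, line) for line in added):
            hits.append(learning.warning)
    return hits
```

A reviewer agent would run something like this over each pull request and attach the warnings as comments.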

James Governor (09:48)
Okay, so, well, we’ll get to that. But one thing that strikes me: so we were talking about Rovo Dev. Could you do me a favor, for other people that are listening to the show, because they may not be so familiar with Rovo and what the interfaces are? You’ve got search, chat, and then the agents. Talk a bit about how Rovo Dev fits into that structure and what the interface looks like as well.

Do I need to go to a separate interface to use Rovo Dev? Are you meeting the developer where they are? So I think those two questions: what is Rovo about more broadly, and then drilling a bit into your specific tool set, whether you meet the developer in their existing tool chains or whether you’re taking them to a new place.

Because, you know, Atlassian obviously has a huge sort of admin community, a huge community around Jira, but you’re not necessarily the place that all developers are immediately like, yeah, I wanna go and use this Atlassian tool. So if you could walk me through some of that, that would be amazing.

Kun Chen (11:01)
Yeah, yeah, totally. So on a high level, Atlassian Rovo is our AI brand. And Atlassian Rovo has three main pillars. One is search. We have a unified search; Rovo can search across all your knowledge sources. If you have your Slack, your Google Docs, your SharePoint, all these different data sources connected, Rovo can search across all of them. So there’s Rovo Search. And then there’s Rovo Chat. Rovo Chat is an AI agent you can talk to.

And you can talk to Rovo about everything across your knowledge base. You can ask questions, you can summarize documents, you can get some work done. And then there is Rovo Agents. Rovo Agents is a platform that allows customers to build their own custom agents. People can customize the prompts and the tools that the agent can use to get very specialized tasks done really, really well.

So that’s Rovo on a high level. Rovo Dev, however, is a specialized agent. It’s one agent designed for developers. The way you interact with Rovo Dev is a little bit separate from Rovo, because Rovo, you use that in Jira, in Confluence, in the browser. But Rovo Dev, because it’s designed for developers, we know that developers work in very unique ways. We have our IDE, we have our local terminal, we have our pull requests.

So the places where Rovo Dev has to show up are a little bit different. The way we designed Rovo Dev is to meet developers where they are. We know developers use the terminal a lot, so we have the Rovo Dev CLI, which is a command line app people can use in the terminal. We also have our VS Code extension, because we know a lot of developers do their work in VS Code. We also have Rovo Dev in GitHub and in Bitbucket to review pull requests, because we know that’s where developers spend a lot of time.

James Governor (12:49)
Uh-huh.

Kun Chen (12:56)
So yeah, that’s how Rovo Dev works on a high level.

James Governor (13:00)
Okay, great. Yeah, I think that makes sense, because like I say, I think it’s important to meet developers where they are. If you’re trying to get them to come to you, it’s not always so easy to pull that one off. Actually, let me see, you fairly recently joined Atlassian, is that right?

Kun Chen (13:19)
Yeah, I joined the company two years ago.

James Governor (13:21)
Okay, so two years. I mean, that’s not that recent. In AI years, that would probably be 20 years. Why was Atlassian an exciting place for you to go? What were some of your drivers? Why were you like, this is where I’m going to be doing something that’s AI related? What brought you to Atlassian?

Kun Chen (13:44)
Yeah, yeah, that’s a good question. So I think when I joined Atlassian, that’s about the time when AI became a really clear opportunity for everyone. We saw AI becoming more and more capable. We saw the upside that it can drive. So I was looking across the industry and thinking about which companies are in a really good position to make an impact with AI. And I saw Atlassian being in a really unique position.

Because if we think about AI models, they are trained on public domain data, right? So they have the knowledge from the public domain. They know a lot of the general knowledge. But when we try to bridge AI models trained on public domain data to real industrial tasks, there’s often a gap in private domain data,

in the enterprise knowledge base, in how people actually do their work. That data is usually not available, not visible in the public domain. So a company like Atlassian, whose products are where people write their context and knowledge base, is really in this unique position to bridge that gap. So I saw this big opportunity, and that’s what I have been focusing on.

So since I joined Atlassian two years ago, Rovo Dev was the product I started working on. And the angle we are going after is exactly to bridge that gap, to bring frontier AI models trained on public domain data to solve real world problems in a professional team setting.

James Governor (15:17)
Okay. And with that in mind, there’s a related question. Okay, you’ve got the models that are trained on everything that’s open. You know, are you in a position to do perhaps training with smaller models on things that are closed? So your customers’ code bases and the way they develop software, are you building tools that fit into that category?

Kun Chen (15:38)
Yes, yes. One clear example is our Rovo Dev code reviewer. So when we apply AI models to do code reviews, the first thing we noticed was that the public domain models don’t know what good code reviews are. They have seen a lot of code reviews in open source repositories, but some of those code reviews are good, and some of them are bad.

So how do we actually get to the most meaningful comments? How do we focus on the most valuable problems, right, instead of putting out a lot of noise? When we directly apply large language models to review a pull request, there is so much noise. A lot of the comments are very trivial and not something that we would find very helpful. So what we ended up doing was that when we looked at our own data sets, Atlassian engineers,

we noticed that there is a very clear signal for whether a code review comment is useful or not. We look at: did the author of the pull request actually address that comment? If a comment is really helpful, usually the author will make a code change to address it. So we use that as a signal to differentiate noisy comments from useful comments. And now we have this data set that comes from our private data,

and it’s not available anywhere in the public domain. How do we leverage that to improve the value of Rovo Dev? What we did was take that data set and use it to train a model. It’s not a large language model; it’s a traditional predictive ML model that classifies comments into noisy versus helpful, and we can assign a score to each comment. So after we use large language models to generate a bunch of comments, we run this predictive model

to classify which comments are more likely to be found useful. And we use that to reduce the noise level. And that’s improved our precision by a lot.
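The labeling trick is the interesting part: "did the PR author change code in response to this comment" becomes the training label. A minimal sketch of that pipeline, with made-up features and a pure-stdlib logistic regression standing in for whatever model Atlassian actually trains:

```python
import math

def features(comment: str) -> list[float]:
    """Tiny illustrative feature set for a review comment."""
    text = comment.lower()
    return [
        1.0,                              # bias term
        min(len(text) / 200.0, 1.0),      # longer comments tend to be substantive
        1.0 if "nit" in text else 0.0,    # style nits are often noise
        1.0 if "`" in comment else 0.0,   # references concrete code
    ]

def train(comments, addressed, epochs=200, lr=0.5):
    """Fit logistic regression on the 'author addressed this comment' label."""
    w = [0.0] * 4
    for _ in range(epochs):
        for text, y in zip(comments, addressed):
            x = features(text)
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            g = p - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

def score(w, comment) -> float:
    """Probability that a generated comment will be found useful."""
    x = features(comment)
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
```

Generated comments scoring below a threshold would simply be dropped before posting, which is the precision improvement Kun describes.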

James Governor (17:35)
Okay, that’s awesome. Super useful. So I guess for me, one of the interesting questions is, as lead architect, what were some of the key engineering decisions that you had to take, maybe early on and throughout the process of building Rovo Dev? In terms of, I guess, some tough problems you had to solve. But also I’m always interested in the engineering choices that you had to make within the constraints that you had. So yeah, a little bit about the engineering decisions you made.

Kun Chen (18:12)
Yeah, yeah, that’s an interesting topic as well. I think a lot of our decisions are around the broader topic of context engineering, so maybe we can dive into that a little bit. When we started working on this, it wasn’t called context engineering yet. People called it many different things. At first, it was prompt engineering. And then people realized, oh, it’s not just the prompt, it’s how we orchestrate the context. So we look at context engineering in three layers.

There’s how the agent orchestrates context. And then there’s how people write their prompts. And then there’s a third layer I’ll talk about in a bit. But let’s start with the first two. How does the agent orchestrate its context? There are so many interesting problems and decisions we had to make there. So at its core, the way AI agents work is that you feed a whole bunch of tokens in as input,

and the agent, the large language model, will spit out some output tokens. The quality of the output determines the quality of the outcome. And the quality of the output tokens is heavily influenced by the quality of the input tokens. If we set model quality aside for a bit, since the models are improving, we focus on how we improve the input tokens. How do we compose the tokens in a way that will be very

helpful for the agent to arrive at a useful outcome? So we look at this, and this is the technical aspect of how an AI agent works. There are so many techniques to optimize how the tokens are laid out and composed. For example, if we let an AI agent solve a really complex problem,

the agent will read a lot of files and will try a bunch of things. And that will blow up the input tokens. It will cause the input to be so large that it no longer fits in the context window. So what do we do then? There are many techniques to handle that. We started with pruning context. We have to drop something, right? So what a lot of AI agents do is drop the context from the middle part of the conversation.

Because when we look at a large language model’s attention mechanism, it heavily weights the beginning and the end of the context window and easily ignores the things in the middle anyway. So we started to drop some of the context in the middle. That was the first naive approach. And then we started to realize,

James Governor (20:37)
To be honest, by the way, that actually is just like humans. We hear the beginning and we hear the end, but sometimes in the middle it’s a little bit hazy. So yeah, that’s interesting to me. Certainly in storytelling, that’ll get you every time. So, okay, the pruning strategies, that’s interesting.

Kun Chen (20:58)
Yeah, yeah, very similar, a lot of analogies. So yeah, we started dropping the context in the middle. But then we realized, oh, instead of fully dropping everything, we can actually summarize it as well. So there’s the technique of summarizing some of the context. And then we also identified patterns where some of the context is just not helpful. For example, the AI agent sometimes will read the same file multiple times, and the older reads of the file are most of the time not very helpful.

So we can remove some of those as well. A lot of these techniques accumulated and started helping with how the agent orchestrates context. And then the other thing we noticed was that MCP became really popular, right? So a lot of people connected various kinds of MCP servers to their agents to help them do different kinds of things.
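Put together, those three moves (keep the head and tail, summarize the middle, keep only the latest read of each file) look roughly like this. The message shape and thresholds here are invented for illustration, and the "summary" is a stub where a real system would call a model to compress the elided steps:

```python
def prune_context(messages, max_messages=8, keep_head=2, keep_tail=4):
    """Prune an agent transcript before it overflows the context window.

    Each message is a dict like {"role": ..., "content": ..., "file": optional path}.
    """
    # 1. Keep only the most recent read of each file (older reads are stale).
    seen_files = set()
    deduped = []
    for msg in reversed(messages):
        path = msg.get("file")
        if path is not None:
            if path in seen_files:
                continue  # an older read of a file we already have
            seen_files.add(path)
        deduped.append(msg)
    deduped.reverse()

    # 2. If still too long, replace the middle with a summary stub,
    #    since attention favors the beginning and end of the window anyway.
    if len(deduped) > max_messages:
        middle = deduped[keep_head:len(deduped) - keep_tail]
        stub = {"role": "system",
                "content": f"[summary of {len(middle)} earlier steps elided]"}
        deduped = deduped[:keep_head] + [stub] + deduped[-keep_tail:]
    return deduped
```

In practice the pruning would run each turn, just before the transcript is sent back to the model.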

James Governor (21:47)
By the way, I’m actually intrigued by that. I mean, you’re trying to build a product, and I guess you would have been within a few months of launch, really. You’ve got a direction, and then MCP drops right in the middle of the project. So how did you respond to that from an engineering perspective?

Kun Chen (22:10)
Yeah, very interesting topic. When we started out, there was no MCP, and the way we thought about enhancing the agents was through tools, right? We built our own tools to help the agents navigate the file system and everything. But then we noticed that Anthropic started working on this MCP protocol, and we realized, if this becomes a standard, it will really help, because the community can build more and more MCP servers

James Governor (22:21)
Mm-hmm. Yep.

Kun Chen (22:39)
that we can adopt, and avoid repeatedly building the same thing. If we build something useful, we can also publish it as an MCP server. So at Atlassian, we have the Atlassian MCP server that can be used by other coding agents. We realized the value of the standard and we jumped in very quickly. We pretty much evolved our architecture to center around MCP as the single protocol to extend the agents.

So even with our own agent tools, we started building them as MCP servers instead of native tools. This allowed our agents to become a really extensible system, which later on enabled our customers to extend our agent as well. So yeah, we pretty much jumped in right away. It did take some work to embrace it, but I think it paid off. The fact that we jumped in early

allowed us to adopt the protocol very quickly. And we also work very closely with Anthropic to give them feedback on the protocol, to design our agents, design the future version of our agents with what they have on the roadmap in mind.

James Governor (23:49)
Yeah, I mean, we have not seen many technologies standardize; we’ve never seen anything standardize that quickly. It was really, really interesting from an industry perspective, as someone that’s watched the creation of standards over the years. The rate at which the industry consolidated around that was pretty impressive. And I guess from your perspective, it’s good, because it’s a good bet in that respect. It wasn’t a huge risk.

So you mentioned Anthropic, and one of the topics... I mean, we couldn’t really have a conversation about AI and developer tools without mentioning models. So you must have built up quite a lot of experience, in a changing context in terms of models, over the last couple of years that you’ve been building the tool.

Kun Chen (24:21)
Yeah, absolutely.

James Governor (24:47)
Where are we from a state-of-the-art perspective? Are you using multiple models? Is Sonnet good enough for everything? Have you started playing with Gemini? Where are you on the model journey?

Kun Chen (25:02)
Yeah, yeah. So that’s a rabbit hole in itself. When we started out, it was GPT-3.5 Turbo. It was a very, very different world back then compared to now. Back then, when we first worked with these large language models, there was no native support for even function calling. So we had to tell the agent, we have a few functions you can call, and we parsed the output from the agent to do the function calling.

And then we started to have GPT-4, which was a lot more intelligent and could do more complex things. But we still noticed a lot of gaps in its ability to perform developer tasks. One example was that in order to make a code change in a large file, the agent has to produce an output that can tell us where we want to make the edits and what the new content is. Right? Just that

little part of the job is very hard for models to do. So we tried various different kinds of ways to design the tool that we give to the agent to allow it to do this accurately. The first thing we tried was to ask the agent to tell us the line numbers: which lines do you want to edit? And the agent always got that wrong. In a large file, the agent will count which line it should be, and it’s always wrong. So that’s one thing we tried that didn’t work.

The other thing we tried was asking the agent to produce a diff. You know, git diff output has a certain format. And the agent gets that wrong as well. It will hallucinate the diff in a way that makes it invalid. And then we tried to get the agent to do a search-and-replace kind of operation. So we say, you give us a string you search for,

and you give us the replacement for that string when we find it. And we worked with that for quite a bit. That worked better than the other approaches, by the way. But what we noticed was that when Anthropic rolled out Sonnet 3.5 v2, I think, that was the version that really got that part right. I think when they trained that model,

they trained it with a bunch of tools in an RL environment, so the agent was really good at using those tools. One of the tools was making an edit to a file, and that was exactly the search-and-replace operation. They trained the model with the tool in mind, so the agent is now really good at it. That’s why I think, starting then, Anthropic had the best coding model for quite a while.
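The search-and-replace tool contract Kun describes is easy to state precisely. Here is a minimal sketch, not any particular product’s implementation, with the key design choice being that an edit must match exactly once, so a hallucinated or ambiguous search string fails loudly instead of silently corrupting the file:

```python
def apply_edit(source: str, search: str, replace: str) -> str:
    """Apply a model-proposed search/replace edit to a file's contents.

    Rejects edits whose search text is missing (likely hallucinated) or
    ambiguous (matches more than once), forcing the model to retry with
    more surrounding context in the search string.
    """
    count = source.count(search)
    if count == 0:
        raise ValueError("search text not found; the model may have hallucinated it")
    if count > 1:
        raise ValueError(f"search text is ambiguous ({count} matches); need more context")
    return source.replace(search, replace, 1)
```

The error messages matter as much as the happy path: they are fed back to the agent, which then widens its search string and tries again.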

So for a lot of the coding use cases, people gravitated towards Anthropic’s models. I think it’s because they train the models with a lot of the developer tools in mind. That was the case for quite a few months. But I think recently, starting from GPT-5 and Gemini 3, the gaps are closing. OpenAI and Google are catching up as well. They realize the models have to be good at coding.

They have to train with certain tools, and the models are getting closer and closer. I think we see the difference getting smaller, and different models now have certain niches. So for example, with Google Gemini, we noticed that it’s really good at front-end, and Gemini overall is just really good at multimodal analysis. It’s good at understanding images, generating images,

and using that as a form of input to inform its tasks. So in some of our workflows, we use a Gemini model to do this kind of multimodal parsing. To answer your earlier question, we do use multiple models, but on a high level, we let the users choose as well. So for users who are using the Rovo Dev CLI,

we have a models menu where people can choose what model they want to use, because we realize the need is so fragmented. Some people have found their own favorites for their own tasks. So we do allow people to make the choice themselves.

James Governor (29:18)
Okay, cool, yeah, that makes sense. And what about tokens? I mean, one of the big questions here is, you know, everything’s great until you have to start paying for things. Is it part of the engineering that you need to do in order to enable these use cases without burning up all your tokens? How do you support users that are like, yeah, we’d love to do more stuff with Rovo, but on the other hand, we’re going to burn up all our tokens? How do you manage that? Are there any optimizations, and how do you see that playing out as well? Because I think that’s one of the key things in the market right now.

Kun Chen (30:00)
Yeah, yes, totally. When we think about this, I think our principle is that we first focus on the quality of the product, and then we worry about the cost. Because we acknowledge that it’s much better to have a valuable product that is really good but a little pricey, than a useless product that’s cheap. So we want to make sure

whatever we build is useful. That’s the first thing we prioritize, and we make sure we deliver something that is valuable. On top of that, once we saw that the product is valuable for certain use cases, we started to think about how we optimize for efficiency. And efficiency is not only cost, but also latency, because people don’t want to wait forever, right? So we look at various kinds of techniques to help us reduce the cost and latency.

And this goes back to context engineering: how do we orchestrate the context in a way that allows us to maintain the same level of quality but use fewer tokens? One very interesting thing we did: we noticed that when we give the agent a lot of tools, the tools themselves will occupy a lot of tokens. Especially when users connect their own MCP servers; some people connected like 20 MCP servers,

and there are more than 100 tools given to the agent at the same time. So if we always put all those tool descriptions in the system prompt, in the tokens, then the agent will consume a lot of tokens on every single request. So how do we optimize that? One thing we did was build a compression layer to basically reduce the amount of information we have to give to the agents.

We reduced all the different tools to only a few. One tool dives into a certain tool’s schema, so we don’t have to show all the schemas upfront; we have a tool for that. And then the other tool invokes the specific tool. So we kind of normalized a lot of the MCP servers’ tools and reduced them to only a few.

And through that, we reduced the token usage a lot. For every single request, we no longer have to put all the different MCP tool descriptions in. That’s just one example of how we optimize this.
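That compression layer can be sketched as a cheap name-plus-description index plus two meta-tools: one fetches a single tool’s full schema on demand, the other dispatches the call. The registry entries below are invented stand-ins for tools aggregated from MCP servers; the point is that only the short index is paid for on every request.

```python
import json

# Hypothetical registry standing in for tools gathered from many MCP servers.
TOOL_REGISTRY = {
    "jira.search_issues": {
        "description": "Search Jira issues with JQL.",
        "schema": {"jql": "string", "limit": "integer"},
        "handler": lambda args: [f"issue matching {args['jql']}"],
    },
    "git.list_branches": {
        "description": "List branches in the current repository.",
        "schema": {},
        "handler": lambda args: ["main", "dev"],
    },
}

def list_tools() -> str:
    """Cheap index shown to the model on every request: names and one-liners only."""
    return "\n".join(
        f"{name}: {t['description']}" for name, t in sorted(TOOL_REGISTRY.items())
    )

def describe_tool(name: str) -> str:
    """Meta-tool 1: fetch a single tool's full schema only when it is needed."""
    return json.dumps(TOOL_REGISTRY[name]["schema"])

def invoke_tool(name: str, args: dict):
    """Meta-tool 2: dispatch a call to the underlying tool."""
    return TOOL_REGISTRY[name]["handler"](args)
```

With 100-plus tools, the savings come from the schemas: they are fetched at most once per tool actually used, instead of being repeated in every request.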

James Governor (32:15)
Okay, awesome. I mean, I’d love to hear more, so let’s get back then, I guess, to a couple of questions. We’ve gone through the engineering decisions, we’ve talked about cost, we’ve talked about models. Just to go back to the use cases: is there any difference in terms of what Atlassian found itself was going to be most interesting? Now you’re actually at the exciting part of customers beginning to use the tools. What are the use cases that they’re most excited by in terms of that initial user feedback?

Kun Chen (32:52)
Yeah, yeah. So even within Atlassian, what we realized was that the use cases are very fragmented. When we first started out building Rovo Dev, we actually built a very specialized version of it: Rovo Dev in Jira. People can come to Jira, look at a Jira ticket, and say, assign this ticket to an agent to solve it and raise a pull request for me. That was the specific use case we built first.

However, what we learned through internal usage was that people have really fragmented needs. Some of their tasks are not even in Jira. And some of them like to do things in the local environment. Some of them like to use agents to help them with fixing errors in CI/CD, various kinds of things that don’t necessarily start from a certain workflow. So we very quickly pivoted our approach to say,

Instead of building a very specific product experience, we built a primitive that can be used in many different ways. And that’s why we arrived at Rovo Dev CLI, which is a command line app. People can invoke that from everywhere. People can use that for pull request reviews. They can use that to fix the CI/CD errors. They can use that to do whatever task they have in mind.

So that was one of the biggest breakthroughs internally in getting more and more usage. Now, when we look at how customers are using this and how that differs from Atlassian, we realize there’s a wide range of customers with varying team sizes. Small teams and big teams have different problems. One of the interesting problems in big organizations is that

oftentimes there are use cases that require the company to enforce a certain standard across many different repositories. This is very different from smaller companies, who usually work on a single repository. In a single repository, you can put your rules and everything in memory files. People now have Cursor rules, a claude.md file, an agents.md file.

James Governor (34:38)
Yeah, 100%.

Kun Chen (34:57)
You can put everything in those Markdown files in the repo. But it’s a very different story when you have a hundred different repos across the company, worked on by 20 different teams. How do you enforce a certain standard across that? That requires a different system. So that is where some of the unique value propositions we’re building into Atlassian’s products come in, because we know our customers are in this kind of professional teamwork setting and we’re building solutions for it.
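A tiny sketch of the gap Kun is pointing at, with hypothetical file names and layering rules: in a single repo the agent can just read the memory files, but org-wide standards have to be injected from a layer that lives outside any one repo.

```python
# Hypothetical sketch: merge an org-level standards layer with a repo's own
# memory files. File names and precedence are illustrative assumptions.
from pathlib import Path

def load_agent_context(repo_root, org_standards=None):
    """Concatenate org-wide rules (if any) with the repo's own memory files."""
    parts = []
    if org_standards:
        # The org layer comes first so repo files can refine, not replace, it.
        parts.append("# Org-wide standards\n" + org_standards)
    for name in ("agents.md", "claude.md", ".cursorrules"):
        f = Path(repo_root) / name
        if f.exists():
            parts.append(f.read_text())
    return "\n\n".join(parts)

# With no repo memory files present, only the org layer survives:
context = load_agent_context("/no/such/repo", org_standards="All services must emit structured logs.")
```

The point of the sketch is the shape of the problem, not the mechanism: with 100 repos and 20 teams, the `org_standards` input has to come from some central system rather than being copied into every repo.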

James Governor (35:26)
Yeah, you’ve certainly got some pretty gnarly customer use cases out there. Among your customers, there’s plenty of legacy.

Kun Chen (35:33)
Yeah, so there are legacy code bases. There are many customers who haven’t even started adopting some of the approaches we take internally at Atlassian, for example, running CI/CD and running tests on every single pull request. So the practices are very different. When we talk to those customers, we try to understand the unique situation they’re in and make sure our solution will work for them.

And so far, I think it’s worked out well, because we understood the different kinds of use cases. Some customers have a legacy code base, and their need is to modernize some of those older code bases. Now, how do we do that reliably? One thing that helped was Rovo Dev in Jira combined with automation. Jira has this automation product, right? Where you can build rules and repetitive workflows.

So we actually put Rovo Dev into that automation system. You can say: across these 100 Jira tickets, run the same Rovo Dev prompt and get them done. That is very convenient for these kinds of large refactoring tasks. What people do is break the refactor down into many different Jira tickets and have agents work on them in parallel.
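The fan-out pattern Kun describes can be sketched roughly like this. `run_rovo_dev` is a stand-in for whatever the Jira automation rule actually invokes; it is not a real API.

```python
# Hypothetical sketch of fanning one refactoring prompt out over many
# Jira tickets. The agent invocation is mocked; only the shape is real.
from concurrent.futures import ThreadPoolExecutor

PROMPT = "Migrate this module from the legacy logging API to structured logging."

def run_rovo_dev(ticket_id, prompt):
    # Placeholder for the real agent run; pretend it raises a pull request.
    return {"ticket": ticket_id, "prompt": prompt, "pr": f"PR for {ticket_id}"}

# The refactor is broken down into one ticket per module, then the same
# prompt runs across all of them in parallel.
tickets = [f"REFACTOR-{i}" for i in range(1, 6)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda t: run_rovo_dev(t, PROMPT), tickets))
```

Breaking the work into tickets first is what makes the parallelism safe: each agent run touches one module, and each result comes back as a reviewable pull request tied to its ticket.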

James Governor (36:42)
Mm-hmm.

Kun Chen (36:54)
So Rovo Dev has helped a lot on that front. And that was a use case we only understood from talking to customers who have this kind of need.

James Governor (37:01)
Okay, that’s awesome. So, one last question, a big kind of open-ended question, if I may.

One of the points you made was how much progress we’ve made in the last sort of two years, from the models to all of the surrounding scaffolding, I mean the arrival of MCP, enabling us to tackle a wider range of problems. What’s the future for, I guess, this sort of context engineering broadly, and for your product, Rovo Dev, more specifically? How do you see things playing out, you know, into the next couple of years?

Kun Chen (37:39)
Yeah, yeah,

very interesting topic. So I think there are a few trends that I’m looking at. One is that if we think about how coding agents have evolved in the past two years, we started with Copilot code completion, right? It was very fast. It helped us complete a single line or a few lines. But then we started to have products like Cursor that have next-edit suggestion or an agent mode.

That is a little bit slower, but gets more work done. Then we started to have Claude Code, these kinds of autonomous agents, and OpenAI Codex that runs in the cloud autonomously. That’s even slower than Cursor. And slower, I’m not saying that in a bad way. I think the trend I’m seeing is that we delegate more and more responsibility and autonomy to AI agents and expect them to get more done. I think this trend will continue.

So the agents will become even slower than what we have right now, but will get more done for us and deliver higher-quality outcomes. When the agents come back, it’s a well-tested, well-proven change that I know is safe to merge. To get there, the agent has to be better at testing. Agents right now mostly run unit tests, but how about end-to-end tests? How about really showing me what has been tested, so I have confidence to merge the PR?

So I think agents will evolve on this dimension: they become slower, get more done, but produce higher-quality outcomes. Another trend I’m seeing is that the bottleneck of software development is going to shift from coding to non-coding tasks. Coding is now so fast; we can spit out a lot of code very, very quickly through these agents. So what we’re noticing right now is that engineers can build so many prototypes.

But then the question becomes, what should we build and what should we ship? What’s going to be valuable to customers? What are customers expecting from us? We can’t just ship like 20 different features that are not cohesive, that are bloating the products. We have to be smart about what to ship. So now the discussion started shifting into product design and planning.

So one thing that I think will happen is that engineering as a discipline, as a role, will start to adopt more and more product skill sets. We have to think about what customers want and how we maintain a good product design. So I think the talent stack will start to collapse a little bit. And we’re already seeing that PMs are building prototypes now.

We’re seeing that designers are merging pull requests as well. At the same time, we’ll see engineers start to have more product thinking and design thinking. I think that’s another trend. Along the same lines, for non-coding tasks: if code generation becomes so much faster, code review is going to become a bottleneck. So I expect a lot of innovation to happen in the code review space as well, to allow us to review agents’ code more efficiently,

James Governor (40:19)
100%.

Kun Chen (40:42)
and also review each other’s work in a different way, a more AI-native way. Yeah, I think those are the interesting things that will start to happen in the next couple of years.

James Governor (40:52)
Amazing. Well, look, this has been a great conversation. Great to hang out with you. A lot in there; that’s a good show. Thanks so much. So I’m James from RedMonk. This is Kun from Atlassian. And yeah, we’d love to know what you think. Any feedback? If you’re listening to the podcast, definitely share it with your friends. You should definitely subscribe. So thanks a lot. And yeah, that’s a wrap.

Kun Chen (41:18)
Awesome. Likewise.
