James Governor sits down with Anush Elangovan, VP of AI at AMD, to dig into the fast-evolving world of local LLMs, edge hardware, and the future of AI-powered developer experience.
They cover:
- Why local LLMs are exploding in adoption
- Privacy, data sovereignty, and on-device intelligence
- Running 120B-parameter models on AMD Strix Halo laptops
- The software stack Anush uses: ROCm, Llama.cpp, Ollama, ComfyUI, PyTorch & more
- How AMD is approaching AI-driven developer tools, coding agents, and predictive ops
- The role of custom silicon, inference-optimized chips, and next-gen laptops
- What ROCm actually is (and why developers should care)
If you’re curious about the future of on-device AI, developer workflows, or AMD’s AI strategy, this is a must-watch.
This video was sponsored by AMD.
Transcript
James Governor: Hey, it’s James from RedMonk, and we’re here today with Anush Elangovan, VP of AI at AMD. We’ve got a topic here, which I think is kind of interesting because a lot of us just think, oh, yeah, GPUs in the cloud, we’re never going to be running these locally. I mean, this is definitely a cloud-based deployment. It’s not something that I need to run. It’s something that someone else will run for me. Now, Anush, I think you bring maybe a slightly different view, a more nuanced view on the idea that it’s just people running models in the cloud. So, like, what are you seeing? Is it just nerds and tinkerers, or are there use cases where people are going to be like, actually, yes, I need to run this locally?
Anush Elangovan: Yeah, I think the local LLM market is a pretty big area for us to focus on. There are a lot of customers that use local LLMs for things like copilots, summarization, presentation creation, especially when you add in an axis of privacy and you want the ability to ensure that your data stays local to your machine. There is a big push to have all of that done on a local machine. There is a set of models that will always be frontier and up in the data centers; that is definitely going to be where the tip of the spear is. But then you can think of the intelligence trickling down to your local LLMs and your local machines pretty fast, and of getting the ability to run highly intelligent models on your laptops and your phones very quickly.
James: Laptops and phones. Well, let’s stick with laptops for now. I’d be kind of interested in what models you’ve come across recently that you think are interesting. Obviously, it’s breakneck. There are new models every week. Every time we think, oh yeah, okay, this is an interesting model, something else will come along. But what are you running? What have you run locally recently where you’ve been like, wow, this is amazing quality? And what was the use case for that?
Anush: Yeah, I think the most important one that I have so far is GPT OSS, the 120B. It actually fits very well into your Strix Halo laptops, right? Like a Strix Halo laptop has 128 gigs of memory, so you can actually fit the entire model and run it very fast on your Strix Halo desktop or laptop. Your Strix Halo being your Ryzen AI Max+ 395. Yeah, and the intelligence of the GPT OSS model is pretty significant compared to what you can get from ChatGPT, right? Like if you compare ChatGPT 5 to what GPT OSS 120B provides, it’s pretty comparable. And that’s kind of the same thing that we expect to see, which is you will be seeing a lot of frontier models distilled with a student-teacher kind of teaching paradigm, so you get smaller models that are as intelligent as the top frontier models but don’t necessarily take as much space.
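For readers who want to try something like this themselves, here is a minimal sketch of querying a locally hosted gpt-oss model through Ollama’s HTTP API from Python. It assumes Ollama is installed and serving on its default port, and that the 120B model has already been pulled locally; the exact model tag may differ on your setup.

```python
# Minimal sketch: ask a locally running gpt-oss model a question via Ollama.
# Assumes Ollama is serving on its default port (11434) and the model tag
# shown here is available locally.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "gpt-oss:120b") -> str:
    """Send a single non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarize the benefits of running LLMs locally in two sentences."))
```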
James: Okay, so one thing I’m always interested in is like, what’s your stack? So when you’re playing, you know, what do you use in terms of like, I want to try a new model. What are you running locally on your machine that enables you to be like, okay, now I can experiment with this new model?
Anush: Yeah, so of course, working on and making sure ROCm is in a good spot is the number one priority for me. So I have a Strix Halo laptop and a Framework laptop, running both Windows and Linux, and I have ROCm up and running. Once you have ROCm running on both Windows and Linux, then Llama.cpp, Ollama and ComfyUI are the go-tos for the top three items, including text to image and text to video. And then there are a few other ones that are a little more general purpose, where you build on PyTorch. You have PyTorch now, so you can do whatever you do with PyTorch and Hugging Face, and that allows you to do a little more general purpose stuff. So, yeah, that’s what’s on my desktop right now.
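As an illustration of the general-purpose PyTorch and Hugging Face path Anush describes, here is a minimal sketch that runs a small text-generation model on a ROCm-backed GPU. It assumes a ROCm build of PyTorch is installed (ROCm exposes the GPU through the familiar torch.cuda API), and the model checkpoint named below is just a small placeholder, not anything Anush specifically recommends.

```python
# Minimal sketch: run a small Hugging Face model on a ROCm-backed GPU.
# Assumes a ROCm build of PyTorch plus the transformers library are installed.
import torch
from transformers import pipeline

# On ROCm builds the GPU still shows up through the torch.cuda namespace.
device = 0 if torch.cuda.is_available() else -1

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder: any small causal LM works
    device=device,
)

out = generator("Explain what ROCm is in one sentence.", max_new_tokens=60)
print(out[0]["generated_text"])
```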
James: Okay, cool. So we’ve talked a bit about the software stack. Tell me a bit about the intersection of models and AI-driven software development. How are you responding to the need for a really top-notch developer experience? Because it feels like the premium right now is so high on absolutely having to deliver amazing developer experience. Where are you investing in order to support that?
Anush: Yeah. So one of the best use cases for AI itself is anything to do with developer experience and coding. You’ve obviously heard about vibe coding, and let’s just say we’re investing in vibe coding with guardrails, if you will. So it’s not fully agentic, where it’s going to go write everything and replace a software engineer tomorrow. But it’s getting to a point where it’s intelligent enough that you can describe concepts and ideas, and that gets brought to fruition with all the knowledge and reasoning power that’s built into LLMs. So we’re investing very heavily in that. It’s everything from kernel generation for high-performance kernels to the ability to parse log files and do predictive healing. Especially when you’re running very large clusters, you have the ability to look at terabytes of logs and say, hey, I see something awkward or wrong, or this really shouldn’t be happening. And that sort of analysis is something an AI agent is very good at doing asynchronously and autonomously.
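To make the log-triage idea concrete, here is a rough sketch of chunking a large log file and asking a locally hosted model to flag anything that looks unusual. The endpoint, model tag, chunk size, and prompt are illustrative assumptions for a local Ollama-style server, not AMD’s actual tooling.

```python
# Rough sketch: chunk a log file and ask a local model to flag anomalies.
# Assumes a local Ollama server; endpoint, model tag, and prompt are illustrative.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gpt-oss:120b"   # any capable local model tag will do
CHUNK_LINES = 200        # keep each request comfortably inside the context window

def flag_anomalies(log_path: str) -> None:
    with open(log_path) as f:
        lines = f.readlines()
    for start in range(0, len(lines), CHUNK_LINES):
        chunk = "".join(lines[start:start + CHUNK_LINES])
        prompt = (
            "You are reviewing cluster logs. List any lines that look like "
            "errors, hardware faults, or unusual patterns, or reply 'OK' if "
            f"nothing stands out:\n\n{chunk}"
        )
        payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            verdict = json.loads(resp.read())["response"]
        print(f"lines {start}-{start + CHUNK_LINES}: {verdict.strip()[:200]}")

if __name__ == "__main__":
    flag_anomalies("cluster.log")
```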
James: Okay. In terms of individual developers, you mentioned privacy, this notion that maybe we’ve got some data that we don’t want making its way into the cloud, as a significant use case. Is this a consideration for individual developers? Or are you talking about a digital sovereignty angle here that we should be considering? Is that one of the opportunities, one of the places where people are going to be running models locally, because from their perspective it’s absolutely a data sovereignty concern?
Anush: Yeah, I think it’s both of them, right? There’s a good chunk of folks that just want it for individual privacy. That’s just, hey, I don’t want my photographs, I don’t want my things to be out there, or my emails to be used in training. And the digital sovereignty piece is also important, because that’s when sovereign nations want to make sure that they have their data guarded in ways that relate to local laws, et cetera.
James: Okay. That makes sense. So look, I guess to sum up, what does the hardware look like? You mentioned you’ve got a Framework machine there, and obviously the hardware you’ve got is high end; it turns out that you, Anush, have plenty of access to the best Ryzen hardware. But just more broadly, are we seeing people running their own clusters? Or is this more like something I can do on a laptop, as you mentioned? What are you seeing, and how do you think that will evolve over time? And I’m still really interested in running things at the edge. So yeah, how does this evolve over time? What’s the hardware going to look like?
Anush: Yeah, I think there’s a combination. There are always going to be GPUs for the tip of the spear, like the programmability and performance matrix, right? But then there is an increasing push to specialize, where you take the ability to build, say, a custom model, but then you use inference-specific hardware, so that you are customized for that specific model, and it’s good for, what do you say, the inference costs, if you will, right? Like, they take the best and then they say, okay, fine, we’re going to make this efficient to serve. And that requires either customized silicon or inference-specific silicon. And then the next one is when they take those models and see how they can actually target hardware, right? Like laptops and even mobile phones. So I think the frontier will be pushed on all sides of the AI landscape, from the large clusters, which we are doing, like right now we’re getting to tens of thousands of GPUs, close to hundred-thousand-GPU clusters, and then we’re getting to these very large footprint laptops that have custom silicon. You may be aware that we call them AI engines, or XDNA, and they’re spatial dataflow machines that are very, very efficient. They deliver very high TOPS per watt, but are customized for specific workloads. It takes a little bit to get that, but then you get a very good power profile when you deploy on those AI engines.
James: Okay. We’re going to wrap up, but I got one last question, actually. Not everybody in the world knows what ROCm is. And I feel that in this context, I should probably have asked that question. So to summarize, what’s ROCm and why should anybody care?
Anush: Yes. ROCm is an open source AI software stack that is pioneered by AMD. All of AMD’s silicon uses it for enabling AI. It is a lightweight and inclusive ecosystem, and by that I mean you could join in and say, hey, I would like to add the latest technology to it, and you’re welcome to come and do it. It is fully open source. It’s yours to play with, tinker with, and make better for everybody.
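A quick way to see what that means in practice: on a ROCm build of PyTorch, the GPU is still addressed through the familiar torch.cuda namespace, so a smoke test like the sketch below runs unchanged on AMD hardware. The torch.version.hip check reports the HIP/ROCm version on ROCm builds and is None elsewhere.

```python
# Minimal sketch: verify that a ROCm build of PyTorch can see and use the GPU.
import torch

print("ROCm (HIP) version:", torch.version.hip)   # None on non-ROCm builds
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Quick smoke test: run a matrix multiply on the GPU.
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    c = a @ b
    print("Matmul OK, result shape:", tuple(c.shape))
```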
James: Okay. ROCm, you can run things locally. Thanks very much, Anush, and thanks all for joining us. Comment and let us know what you think. Feel free to like, subscribe and all that good stuff.
This video was sponsored by AMD.