A RedMonk Conversation: Kate Soule on Granite 4.0 at IBM TechXchange 2025

In this RedMonk Conversation from IBM’s TechXchange conference, RedMonk’s Stephen O’Grady chats with Kate Soule, IBM’s Director of Technical Product Management for Granite, about the newly announced Granite 4.0 family of large language models. Soule explains IBM’s strategic focus on small, efficient models, open-sourced under Apache 2.0, and highlights how these models achieve better performance than the previous generation while reducing costs and improving latency through a new architecture and training approach.

This RedMonk video is sponsored by IBM.

Transcript

Stephen O’Grady: Good afternoon, good morning, good evening. I am Steve O’Grady. I’m the co-founder of RedMonk. I’m here today to talk to Kate. Kate, would you like to introduce yourself?

Kate Soule: Hey, everybody. My name is Kate Soule. I’m the director of technical product management for Granite, our family of large language models that we train here at IBM.

Stephen O’Grady: Awesome. So, Kate, there was an announcement here at TechXchange: Granite 4.0. And I think the simplest place to start is, can you take us through the individual pieces of the announcement? What got announced today?

Kate Soule: Yeah, absolutely. So Granite 4.0 is really a family of models. All these models feature a couple of key characteristics. They are going to be small in size. They’re going to be efficient to run, thanks to a new architecture that we actually released. And they are all open source under Apache 2.0. There are some other fun things that we’re doing behind the scenes, too: they’re all ISO 42001 certified. But that’s the gist of it.

Stephen O’Grady: Okay, so that brings us to the next obvious question, right? In the industry, there’s tons of time and tons of attention around these huge models: tons of parameters, huge inputs. So why small? What are the advantages there from an enterprise standpoint?

Kate Soule: Yeah. You know, I think IBM in general has a portfolio strategy when it comes to generative AI and building with large language models. And we see a lot of advantage in being able to bring in small models that are fit for purpose and ultra-efficient, to reduce some of your costs and improve your latency, and then pair those with much larger models, like those from Mistral and Anthropic and others, to complete a workflow.

Stephen O’Grady: Okay, so one of the aspects of the announcement that caught my eye was the fact that it’s essentially faster but also cheaper.

Kate Soule: Yes.

Stephen O’Grady: So how does that work? Like how do you get to that point?

Kate Soule: So thanks to a number of different things, a new architecture and new training data, our smallest model is more powerful, faster, and more efficient than our biggest model from the previous generation. So really, a lot of what we did to improve the architecture and improve the efficiency is now coming into play.

Stephen O’Grady: Okay, makes sense. All right, so it’s 2025, which means we have to talk about agents and agentic AI; that seems to be the topic of the day. So as you think about Granite, and 4.0 in particular, where does it fit in that world? How do you slot it in to help enterprises build these agentic workflows?

Kate Soule: Yeah, that’s a really great question. So again, when we think about how we’re building with LLMs, it gets really expensive, and it’s really kind of inefficient, to always hit the biggest model in your arsenal for every single step in an agentic workflow. Agents are perfectly primed to divide what you’re working on into multiple subtasks and divide and conquer. And so we really see Granite fitting nicely into that divide-and-conquer strategy, where smaller, simpler tasks can be executed more quickly, more efficiently, and at lower cost, leaving the big, expensive models for the really complicated parts of your agentic workflow.
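To make that routing idea concrete, here is a minimal sketch, in Python, of the divide-and-conquer pattern Soule describes: simple subtasks go to a small, cheap model, and only the genuinely hard steps pay for a large one. The model names and the chat() helper are hypothetical placeholders for illustration, not a specific IBM or watsonx API.

```python
# Minimal divide-and-conquer routing sketch. All names here are
# hypothetical placeholders, not a real IBM API.

SMALL_MODEL = "small-efficient-model"   # e.g., a small Granite-class model
LARGE_MODEL = "large-frontier-model"    # a big, expensive model

def chat(model: str, prompt: str) -> str:
    """Hypothetical helper: send one prompt to one model endpoint."""
    raise NotImplementedError("wire this to your inference provider")

def run_workflow(subtasks: list[dict]) -> list[str]:
    """Route each subtask: the cheap model by default, the big model
    only for steps flagged as complex."""
    results = []
    for task in subtasks:
        model = LARGE_MODEL if task["complex"] else SMALL_MODEL
        results.append(chat(model, task["prompt"]))
    return results

# Extraction and summarization are routed to the small model; only the
# final open-ended reasoning step pays for the large one.
plan = [
    {"prompt": "Extract the invoice total from: ...", "complex": False},
    {"prompt": "Summarize this support ticket: ...", "complex": False},
    {"prompt": "Draft a remediation plan weighing cost tradeoffs: ...", "complex": True},
]
# results = run_workflow(plan)
```

Here the "complex" flag is hand-set for simplicity; in practice an agent framework, a planner model, or a lightweight classifier decides which steps need the larger model.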

Stephen O’Grady: Okay. So you mentioned at the top, obviously, that Granite is open sourced, right? Yeah. So for folks watching this, there’s been a ton of time and attention looking at the intersection of AI and open source: where those two things come together, how they come together, does it apply, how can it be applied. So from IBM’s perspective, as you think about Granite, where does open source fit? Why is it important, and how do you apply it in a way that makes sense for AI moving forward?

Kate Soule: Yeah. You know, I think there are many different types of AI. There are big closed labs pursuing artificial general intelligence that are doing really amazing and exciting things. And there’s also a huge community, which IBM is investing in, in the open source world, that’s bringing a diversity of thought and the best and brightest minds that just doesn’t exist if you’re building technology behind closed doors. We’ve seen the power of open source development with software, and we’re seeing the power of it again with artificial intelligence. And, you know, we’re really trying to play across the entire stack, making the AI stack fully open.

Stephen O’Grady: And when you’re talking to clients and users, and more particularly, here at TechXchange, to developers, is that something they’re telling you is important to them?

Kate Soule: Yeah, I think there’s a lot of importance in being able to take a model and just run it locally, kick the tires, no cost. You can run it on your laptop, you’re not paying per token, and you can build in a way that’s far more tangible than if you’re just hitting an endpoint.
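As a concrete illustration of that "kick the tires locally" workflow, here is a minimal sketch using the Hugging Face transformers library. The model id below is an assumption for illustration; check the ibm-granite organization on Hugging Face for the exact Granite 4.0 checkpoint names.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The model id is assumed for illustration; substitute the real
# Granite 4.0 checkpoint name from the ibm-granite org on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-4.0-micro",  # assumed id
)

out = generator(
    "In one sentence, what does the Apache 2.0 license permit?",
    max_new_tokens=64,
)
print(out[0]["generated_text"])
```

Because the weights are downloaded and run locally, there is no per-token charge; the only costs are your own hardware and time, which is what makes small models practical for this kind of experimentation.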

Stephen O’Grady: Okay. So to sort of close out here, I think we want to look forward, right? You have the announcement here. You’ll have to sort of prime me; I’m trying to remember the phrase. Generative…

Kate Soule: Computing.

Stephen O’Grady: Generative computing, okay. Beautiful. So when we think about Granite and where generative computing comes in, can you explain what that is and why it’s important moving forward?

Kate Soule: Yeah. So we’re here at TechXchange, and we’re actually in a larger area that’s both Granite and generative computing working together. Generative computing is this idea that models are no longer just the weights. When we look at what some of these closed labs, for example, are releasing, it’s models plus software, plus a lot of things going on behind the scenes to improve performance. And we call this broader idea of bringing software development front and center into generative AI workflows "generative computing." Specifically, when it comes to Granite, we see a ton of value in using things like inference scaling, running a small model multiple times to improve the performance and the overall capability of these small models. And so there’s a lot of opportunity in this idea of using software engineering to do multiple things with a small model efficiently behind the scenes, instead of calling one really big model once to perform a task. And that’s what we’re trying to accomplish here.
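One widely used inference-scaling pattern that matches what Soule describes, running a small model multiple times to improve capability, is self-consistency, or majority voting over several sampled answers. The sketch below is a generic illustration of that pattern under stated assumptions, not IBM’s specific generative-computing implementation; generate_once() is a hypothetical helper for one sampled completion.

```python
# Self-consistency / majority-voting sketch: sample a small model n
# times and keep the most common answer. A generic pattern, not a
# specific IBM API.
from collections import Counter

def generate_once(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical helper: one sampled completion from a small model."""
    raise NotImplementedError("wire this to your small-model endpoint")

def majority_vote(prompt: str, n: int = 5) -> str:
    """Sample the small model n times and return the most frequent
    answer. n cheap calls can beat one call to a much larger model
    on both cost and accuracy for some tasks."""
    answers = [generate_once(prompt).strip() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The software-engineering point is that the orchestration logic, not just the weights, carries part of the quality: the same small model gets more capable when the surrounding code samples, scores, and aggregates its outputs.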

Stephen O’Grady: That makes sense to me. And with that, we’ll close out. Thank you so much for your time and attention. I am Steve O’Grady and Kate, thank you so much.

Kate Soule: Thanks so much, Steve.

Stephen O’Grady: Beautiful.
