Rachel Stephens interviews Niki Manoledaki (Software Engineer at Grafana Labs) on her work on the CNCF’s Environmental Sustainability TAG. They discuss the Kepler project, the SCI standard, and other efforts to help engineering teams understand the carbon intensity of their cloud workloads.
This was a RedMonk video, not sponsored by any entity.
Rachel Stephens: Hi everyone, this is RedMonk Conversations. I’m Rachel Stephens, I am an analyst with RedMonk and with me I have Niki Manoledaki. Niki is a software engineer at Grafana Labs, but where I actually met and saw her first was on stage at the keynote at the most recent KubeCon. And so I was excited about Niki’s work in the CNCF ecosystem. But I also wanted to just hear more about what she’s doing in general. Niki, thank you so much for joining us.
Niki Manoledaki: Thank you so much for having me.
Rachel: Wonderful. Can you tell us, so I’ve heard about, I believe it was the sustainability and environmental TAG. I might have gotten those backwards, but you’re doing work in kind of the green space of the CNCF. And then I’d also love to hear about what you’re doing at Grafana.
Niki: Yeah, so at Grafana, I’m a software engineer in the platform team. In the CNCF, we quite recently created the Environmental Sustainability Technical Advisory Group, so TAG ENV, and we’re part of a growing grassroots movement in open source where we focus on carbon and energy monitoring. Energy monitoring in itself is not particularly new, of course, but attributing energy metrics to, for example, resources in the cloud and converting those to carbon metrics, those are new additions in the open source ecosystem.
Rachel: Very cool. And when you’re saying energy metrics, this is primarily power consumption by the compute resources. Is that what we mean?
Niki: Exactly. So that’s the runtime power consumption. We’re also looking at overhead power consumption in data centers, but that data is more difficult to get hold of.
Rachel: Yeah, I have so many questions about the tools you use. Because when you’re saying doing this in the cloud and trying to figure out what cloud resources are actually consuming, how do you actually do that? Like, I guess in theory, you know where your workloads are running. And so you can kind of back things up. But like, just tell me more about the tools that are at people’s disposal to try to figure this out and how people are approaching this problem.
Niki: Yeah, so in the cloud, gathering energy metrics is particularly difficult. One way to do it is the one Kepler uses. Kepler is a new sandbox project in the CNCF as of July of this year. What Kepler does is look at the Running Average Power Limit, I believe it’s called. The acronym is RAPL, which is an Intel technology that surfaces energy metrics. If you have access to RAPL, Kepler will attribute those power metrics. In the public cloud, most users don’t have access to RAPL, because the public cloud providers don’t expose that metric through the hypervisor in virtual machines. So for that case, Kepler has a pre-trained machine learning model, and they just released the algorithm, that will kind of replace those metrics, those power metrics per process.
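As a rough illustration of the RAPL counters mentioned here, the sketch below turns two cumulative energy readings into an average power figure. It assumes the Linux powercap sysfs layout (`/sys/class/powercap/intel-rapl:0`) on a bare-metal Intel host; the helper names are hypothetical, and this is not how Kepler itself is implemented.

```python
import time
from pathlib import Path

# Assumed location of the Linux powercap interface for CPU package 0.
# It can differ between machines and is usually absent on public cloud VMs.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def average_power_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power between two cumulative energy readings in microjoules."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped around between the two readings
        # (approximate correction using the counter's maximum range).
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / seconds


def read_counter(name):
    """Read one integer file from RAPL_DIR, or None if it is unavailable."""
    try:
        return int((RAPL_DIR / name).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    start = read_counter("energy_uj")
    max_range = read_counter("max_energy_range_uj")
    if start is None or max_range is None:
        print("RAPL counters are not readable on this machine")
    else:
        time.sleep(1.0)
        end = read_counter("energy_uj")
        if end is not None:
            watts = average_power_watts(start, end, 1.0, max_range)
            print(f"package 0 average power: {watts:.2f} W")
```

On a virtual machine where the hypervisor hides these counters, the script only reports that they are not readable, which is the gap an estimation model is meant to fill.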
Rachel: Gotcha. So kind of extrapolating what it knows about metrics in general to your workloads based off of, I’m sure, a whole bunch of variables, because that’s how machine learning works. But that sounds very interesting. And this is something that you would run internally, you feed it your own power usage, or your own compute usage, and then it comes back to you with some kind of power usage estimate. Is that right?
Niki: Exactly, and emphasis on estimate because there’s a big difference between carbon metrics for accounting, like accounting level accurate carbon metrics and carbon metrics that are useful for engineers. And so there’s a distinction between top down carbon metrics, which would be useful for reporting the carbon footprints of your entire cloud utilization versus the bottom up carbon metrics that would be useful for engineers, for software engineers to measure and optimize the workload and the region where they run their software based on the carbon intensity of that region, et cetera.
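The bottom-up, engineering-level use Niki describes, choosing where to run based on a region’s carbon intensity, can be sketched as below. The region names and intensity values are made up for illustration, and both function names are hypothetical; real figures would come from a grid-intensity data source.

```python
def lowest_carbon_region(intensity_by_region):
    """Return the region with the lowest grid carbon intensity (gCO2e/kWh)."""
    if not intensity_by_region:
        raise ValueError("no regions given")
    return min(intensity_by_region, key=intensity_by_region.get)


def estimated_emissions_g(energy_kwh, intensity_g_per_kwh):
    """Estimated operational emissions in grams of CO2e for a workload."""
    return energy_kwh * intensity_g_per_kwh


# Illustrative values only, not measurements of any real region.
regions = {"region-a": 450.0, "region-b": 120.0, "region-c": 300.0}
best = lowest_carbon_region(regions)
print(best, estimated_emissions_g(2.0, regions[best]))
```

This is an estimate useful for comparing options, not an accounting-grade number.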
Rachel: Okay. And does Kepler actually help that engineering level decision? Is that what that project is primarily targeted towards?
Niki: Exactly. Yeah.
Rachel: Gotcha. Okay. And so if you’re an engineer and you care about trying to optimize your workloads for least carbon intensity, you would want to use Kepler to try to basically — is this moving, distributing workloads, moving workloads, like what would an engineer actually do in these cases to try to optimize things?
Niki: So an engineer, it depends. There are so many different personas that would be involved in these decisions. A platform engineer who would like the whole view of a namespace, or a type of application, like a type of pod, or a view on a per-cluster basis, would use Kepler to monitor these resources. The person who is developing software might want to use Kepler, or have energy metrics for their workload, to improve their software per release cycle, for example, or to do benchmark testing per feature, et cetera.
Rachel: Very cool. And so when you were on stage, you talked about an organization called the SCI. And could you dive into that one a little bit more? Because that was something I was curious about. First of all, can you remind us all what SCI stands for?
Niki: Yeah, so the SCI is the Software Carbon Intensity Index. It is developed by the Green Software Foundation, which is part of the Linux Foundation. So it’s like a sister organization of the CNCF and we work together often. And the SCI is soon to be an ISO standard. They’re very close. I think it was approved and we’re pending the final stages. Creating an ISO standard is very complex. And so the SCI is helping to take all these different factors and standardize them in a way so that we can plug them in. You take energy, which is multiplied by the carbon emissions factor. You add the embodied carbon, so that’s the hardware itself, which you can estimate based on the machine type. And then you calculate this per functional unit, so you have to define a software boundary. For a web page, that would be the SCI, or the score, per view or per page view, for example.
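The calculation Niki walks through here (energy times the carbon emissions factor, plus embodied carbon, per functional unit) can be written as a small function. This is a minimal sketch: the parameter names and the sample numbers are assumptions for illustration, not values from the conversation.

```python
def sci_score(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: ((E * I) + M) per R.

    energy_kwh          E, energy consumed by the software
    intensity_g_per_kwh I, carbon emissions factor of the electricity
    embodied_g          M, embodied carbon attributed to the hardware
    functional_units    R, e.g. the number of page views in the period
    """
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units


# Made-up example: 10 kWh at 400 gCO2e/kWh, 1000 g embodied, 10,000 page views.
print(sci_score(10.0, 400.0, 1000.0, 10_000))  # 0.5 gCO2e per page view
```

Defining the functional unit and the software boundary is the part that requires judgment; the arithmetic itself is simple.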
Rachel: But it would be all of those services that are components of that web page too. So like, that’s a pretty ambitious goal.
Niki: Yes. So one, and this to circle back to what the TAG is doing, we created the green reviews working group so that we have a reference implementation of how to calculate the SCI using cloud native tooling in the cluster. And we’re using infrastructure donated by Equinix. And that way we can show folks how to do this. And there’s also more documentation by the Green Software Foundation. But we’re trying to implement this in the open.
Rachel: Gotcha. So we have benchmarks created or like formulas and benchmarks created by the SCI that we’re trying to make into an ISO standard that is now being translated into working process by you all using tools like Kepler. Is that kind of how it all ties together? Did I get that all right?
Niki: Exactly. So we’re creating just a regular cluster. We’re deploying Prometheus because Kepler exports to Prometheus. And then we’re hoping to visualize this and to do different benchmark tests to calculate the carbon footprint of CNCF projects. And, you know, with the goal in mind of making this calculation happen on a per-release cadence or as part of the graduation process. And so that’s what we’re doing.
Rachel: Gotcha. You used a phrase in there that I liked, but I already forgot. It was about like the carbon that was in the hardware. What was that phrase again? Something with an E.
Niki: Embedded. Yeah.
Rachel: Embedded. Okay, so I wanted to talk a little bit more about that concept because it’s also something that I see come up a lot. I feel like all of the clouds are making their own chips and hardware and things. And then we get into the claim of, we are very sustainable because you can run your compute more efficiently on our machines. And it feels like that’s only a small component of what efficiency actually looks like. I would love for you to help me understand the various components of efficiency and what the biggest drivers are, I guess is my question.
Niki: This is a great question because with carbon monitoring, the issue is we have scope one, scope two and scope three emissions. Scope one are direct emissions, scope two are indirect emissions, and scope three are everything else in the supply chain and the manufacturing process. And in the cloud, it’s estimated that 70 to 90% of the carbon emissions are scope three. So they come from the supply chain behind the software, from manufacturing. And this is data that we as cloud users are lacking to a certain extent. So yes, runtime energy consumption is a small part of the equation.
Rachel: Okay. Interesting. Oh, that’s a very good fact. Okay. Thank you very much. The other thing that I was really interested in is that you talked about the engineering version versus the accounting version of thinking about carbon. I have a background in finance, and so my head always goes there. A lot of what you said on stage at one of the sessions was about the overlap and the mutual shared goals: things like rightsizing our workloads, FinOps, and carbon impact all working together in the same direction. And I think that that’s a really interesting concept, because I think sometimes you can make a cost justification a lot easier than you can make a save-the-world justification, as sad as that is. So I would love for you to talk about how you see the interplay between these forces.
Niki: Absolutely. I would say cost savings is an incentive for a lot of cloud users. I saw an article published by the CTO of Grafana that cited Gartner, which estimated that worldwide end-user spending on cloud is going to be $600 billion this year. So we are seeing that with growth, there is more emphasis on cost, right? It is an entry point for folks to go into energy and carbon monitoring discussions, mainly because reducing cost may lead to a reduction in carbon emitted and energy used. This is not always the case, and sometimes there’s an inverse relation. But we can say that, for example, autoscaling and rightsizing nodes to use fewer resources or to achieve a higher utilization could help in reducing the carbon emitted from cloud utilization.
Rachel: Very cool. Do any of the FinOps tools that you have seen incorporate any of the carbon metrics that you’re working on, or is that still something that would happen in the future? Like right now, you still have to track them both separately?
Niki: I know that on AWS, for example, the Customer Carbon Footprint Tool is part of the billing API. So the cloud service providers are aggregating the carbon dashboards with the billing dashboards.
Rachel: Gotcha. So starting to have some kind of unified view. But as you said, just because you are controlling cost does not necessarily mean you’re controlling carbon. And so the last thing I wanted to mention is that you had a lovely phrase that you used on stage: you were talking about stubborn optimism. And I love that, because it’s like we need stubborn optimism to fight climate change. And you also talked about how that is just a core characteristic of being a good engineer: you have to be stubbornly optimistic about what you’re working on. So just to close us out, could you give us your case for stubborn optimism, for the role that engineers play in this entire movement, and your vision for where this could all go?
Niki: So the reason for stubborn optimism is we need hope. A lot of really smart folks in the industry and beyond may be, you know, resisting action, resisting taking more responsibility in the climate crisis, because of how scary and daunting and new and far away it seems, and we might feel like we don’t have the tools at our disposal to make an impact. With stubborn optimism, we recognize the gravity of the situation, of the climate crisis, and yet we try to remain resilient in the face of it. So stubborn optimism is about unrelenting hope, even hope to an absurd degree. So that’s going beyond the regular techno-optimist view that we see and adding stubbornness to the optimism, because we know that we might not be able to save everything, but we are stubborn enough to try.
Rachel: I love it. I think that’s such a great way to think about this entire problem space, and I really appreciate your time today to walk us through what you’re working on, and if people wanted to get involved in this, what is the best place for them to start?
Niki: So everyone is welcome to join the TAG channel on the CNCF Slack. We have two working groups and the main TAG as well, and each of these has meetings on Wednesdays. And there are various open issues for activities. So we want to do a survey for KubeCon Europe. We also have a landscape, so folks can read through it and find new tools, different patterns, and different areas that they can contribute to.
Rachel: Perfect. And if people want to follow up with you at all, how should they get in touch?
Niki: They can find me on the CNCF Slack. That would be the best way.
Rachel: Alright, so we’re just sending everyone to CNCF Slack and everyone come get involved. Wonderful. Niki, thank you so much for your time today. I really appreciate it.
Niki: Thank you so much, Rachel. It was a pleasure talking with you.