
Networking Is the Hydra of Kubernetes


At the end of KubeCon 2025, I was struck by what a huge concern networking remains within the Kubernetes ecosystem. It’s the elephant in the server room. It’s the thing vendors and the Cloud Native Computing Foundation (CNCF) have all been desperately trying to abstract away with service meshes, CNI plugins, and enough iptables rules to make a grown SRE cry. Yet the pipes connecting everything together have remained as complex and important as ever.

Between the CNCF’s newly announced Certified Kubernetes Network Engineer (CKNE) certification and the conversations I’m hearing around the Ingress NGINX retirement, it’s clear that Kubernetes networking is having a moment. This Hacker News commenter put it succinctly:

Networking can be complex with Kubernetes, but it’s only as complex as your service architecture.

The thing everyone’s been quietly acknowledging is now out in the open: we never really solved networking. Details on the CKNE are still sparse, but the curriculum will almost certainly span the full spectrum—from foundational networking concepts to multi-cluster mesh patterns and eBPF (extended Berkeley Packet Filter).

When I first learned about the CKNE, I experienced “what year is it?” whiplash. As someone who has spoken with many, many IT professionals who cut their teeth on Cisco’s CCNA in the 90s, I felt a distinct sense of déjà vu watching the CNCF create a networking-focused certification in 2025. There’s a cynical part of me that can’t help saying, “Remember all that stuff we told you Kubernetes would abstract away? Yeah, you actually need to know it anyway. Plus YAML.”

But the importance of this certification—and of networking expertise generally—isn’t a matter of nostalgia.

Folks like Marino Wijay, Staff Solutions Architect at Kong, have been calling for a Kubernetes-focused networking certification for years, and for good reason. Consider what actually happens when you deploy a “simple” application to Kubernetes: Your pod gets an IP address from a CNI plugin. Services create virtual IPs that may or may not exist depending on your proxy mode. Ingress controllers perform dark magic to route external traffic into your cluster. Network policies decide who can talk to whom (assuming you’ve configured them correctly and your CNI actually enforces them). Add a service mesh and you’ve got another entire networking layer with its own rules, certificates, and failure modes, not to mention the Gateway API spec for service-to-service traffic. It’s like peeling an onion made entirely of network policies, and every layer makes you cry for a different reason.
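To make those layers a little more concrete, here is a minimal sketch that walks the same stack for a single namespace using the official Kubernetes Python client: the pod IPs your CNI handed out, the virtual ClusterIPs behind each Service, and whichever NetworkPolicies are (or, tellingly, are not) in place. The “shop” namespace is a hypothetical stand-in; the calls themselves are ordinary read-only CoreV1Api and NetworkingV1Api requests.

```python
# A minimal sketch, assuming the official `kubernetes` Python client is installed
# and a kubeconfig points at a cluster. The "shop" namespace is hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod

namespace = "shop"
core = client.CoreV1Api()
net = client.NetworkingV1Api()

# Layer 1: pod IPs, allocated by whatever CNI plugin the cluster runs.
for pod in core.list_namespaced_pod(namespace).items:
    print(f"pod {pod.metadata.name}: {pod.status.pod_ip}")

# Layer 2: Services and their virtual ClusterIPs, realized by kube-proxy
# (iptables or IPVS mode) or by a dataplane that replaces it.
for svc in core.list_namespaced_service(namespace).items:
    print(f"service {svc.metadata.name}: {svc.spec.type} -> {svc.spec.cluster_ip}")

# Layer 3: NetworkPolicies -- who may talk to whom, enforced only if the CNI supports them.
policies = net.list_namespaced_network_policy(namespace).items
if not policies:
    print("no NetworkPolicies: every pod can reach every other pod")
for pol in policies:
    print(f"policy {pol.metadata.name} selects {pol.spec.pod_selector.match_labels or {}}")
```

None of this even touches the ingress controller or the mesh; it is just the part of the onion you can read back out of the API server.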

It’s noteworthy that communities within the cloud-native ecosystem are tackling networking at very different layers of abstraction. With Istio Ambient Mesh, the focus is on eliminating sidecar proxies for every workload—moving much of the traffic handling into infrastructure rather than the application pods. Meanwhile, eBPF operates at the kernel level, intercepting and manipulating network packets and service traffic without even touching user-space applications (I recommend my colleague Rachel Stephens’s excellent post on the subject). Together these approaches underscore the fact that networking in Kubernetes isn’t just one problem—it’s a stack of problems. And whichever layer you choose to engage, the real work lies in aligning your service architecture with the abstraction you adopt.

Was the cloud ever going to save us from networking’s woes? Probably not, but Kubernetes at least promised to abstract away those pesky details. Instead, we’ve just moved the complexity up the stack and wrapped it in YAML. The October 20th AWS DNS incident felt like a moment when the cloud-native community was finally willing to grapple with this truth: Kubernetes networking is not just complicated, it’s fractally complicated. The closer you look, the more complexity you discover. Each abstraction reveals another layer of abstractions beneath it.

Practitioners are discovering hidden networking dependencies everywhere they look. Beyond the CNI layer, there are Services, ingress controllers, and the Gateway API spec to wrangle. CNCF projects like Cilium and Calico tackle networking most directly, but even these come with gotchas—IP address space conflicts, VXLAN tunneling considerations, egress load balancing that might require introducing service meshes like Istio (with its own control plane to manage) or Linkerd. Many vendors have stepped in to offer solutions to so-called “Kubernetes Networking Horror Stories,” to quote Philip Schmid, Senior Customer Success Architect at Isovalent (Cisco). Unfortunately, many of the DevOps, IT, and SRE professionals I spoke with seem to feel that, however well-intentioned and well-marketed, each new solution adds yet another layer of complexity.
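To pick on just one of those gotchas: overlapping address space is trivial to check up front and miserable to debug after the fact. Below is a minimal sketch using nothing but Python’s standard ipaddress module; the CIDR values are hypothetical placeholders standing in for a cluster’s pod CIDR, service CIDR, and the VPC and peered networks it has to coexist with.

```python
# A minimal overlap check using only the standard library.
# The CIDRs below are hypothetical placeholders; substitute your cluster's
# pod CIDR, service CIDR, and the VPC / peered ranges it must coexist with.
from ipaddress import ip_network
from itertools import combinations

ranges = {
    "pod CIDR": ip_network("10.244.0.0/16"),
    "service CIDR": ip_network("10.96.0.0/12"),
    "VPC CIDR": ip_network("10.0.0.0/16"),
    "peered office network": ip_network("10.96.8.0/24"),
}

# Compare every pair of ranges and flag any that overlap.
for (name_a, net_a), (name_b, net_b) in combinations(ranges.items(), 2):
    if net_a.overlaps(net_b):
        print(f"conflict: {name_a} {net_a} overlaps {name_b} {net_b}")
```

If that prints a conflict, no amount of service mesh is going to route you around it.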

Then there’s the cost issue that keeps engineering leaders up at night. Cloud networking costs are deeply volatile and nearly impossible to predict. How much network transfer will happen between your services? Good luck forecasting that. It’s especially frustrating for those who understand networking fundamentals, because bandwidth isn’t actually the scarce resource so many hyperscalers position it as. Those egress charges? That’s where cloud providers print money.

We haven’t solved networking. We’ve just moved it around, added more acronyms, and fostered standards proliferation (xkcd 927, anyone?). But the CNCF’s recognition of this problem through the CKNE certification might actually help. By formally acknowledging that we’ve created something so complex it requires specialized experts just to understand why your microservice can’t reach your database, we’re taking the first step toward sanity. No more pretending that networking is a solved problem. No more acting like Kubernetes magically handles everything. By forcing ourselves to confront the reality that networking is hard, that it requires actual expertise, and that you can’t just abstract your way out of every problem, we might finally start building better systems—or at least be honest about the ones we have.

Disclaimer: Splunk (Cisco) is a RedMonk client.

Header Image: Caeretan Hydria. 520–510 B.C. The J. Paul Getty Museum.