When we look around the industry at companies executing effectively in product management, Cloudflare stands out. It is remorseless in delivering functionality to improve capabilities in areas where it’s already strong, and to address new markets. It continues to carve out a differentiated story for network performance and security while now also building edge compute services for general purpose application development, which is bringing it into ever more direct competition with the cloud hyperscalers.
I caught up with Cloudflare CEO Matthew Prince in London recently to talk about industry trends in areas such as AI, edge computing, and Developer Experience (DX). There were some interesting nuggets – for example, he said GPU shortages are driving multicloud requirements at AI companies, because AI becomes a distributed compute and networking problem as topologies become more complex.
I will be writing a couple of posts based on what we discussed, but like pretty much every conversation in tech right now, we started with generative AI, and how it’s changing everyone’s plans. Prince said he had previously been somewhat AI skeptical, because Cloudflare had been using machine learning models to predict threats since the company was founded in 2010.
I would say we were an AI company and people would roll their eyes, and so I learned to roll my eyes at any company that said the same thing.
But it became clear in internal discussions at Cloudflare this year that the new large language models (LLMs) heralded by ChatGPT had indeed upended the status quo.
According to Prince, there are five main areas where Cloudflare is focusing on AI.
Firstly, we’ve always said Cloudflare is an AI company, and a little of that was true ten years ago. A lot more of that is true today. So we’re just using AI to better protect against security threats. For example, an automated system at Cloudflare last year found a security threat that no human had identified before. That’s now not an isolated incident but is happening every day. The false positive rate is still high – you need a human in the loop. But this is a game changer for Cloudflare.
The second thing that we’re seeing is that AI company after AI company is concerned about security, because costs in their business are very, very high. It depends on what generative AI system you’re using, but estimates are that costs can be as much as 25 cents per query. So if a spammer submits a whole bunch of queries to generate a million unique email addresses, all of a sudden that’s a $250,000 cost to the startup.
Prince said companies including OpenAI had started using Cloudflare challenge systems for bot management to protect themselves from these costs.
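To make the economics concrete, the pattern is simply putting a challenge in front of the expensive endpoint. Below is a minimal sketch of what that could look like as a Worker using Turnstile’s server-side verification; the TURNSTILE_SECRET and INFERENCE_URL bindings are placeholders of mine, not a description of how OpenAI has actually wired things up.

```typescript
// Sketch: gate a costly inference endpoint behind a Cloudflare Turnstile check.
// TURNSTILE_SECRET and INFERENCE_URL are placeholder bindings, not real config.
export interface Env {
  TURNSTILE_SECRET: string;
  INFERENCE_URL: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = (await request.json()) as { prompt: string; turnstileToken: string };

    // Verify the challenge token before spending money on a model call.
    const verify = await fetch("https://challenges.cloudflare.com/turnstile/v0/siteverify", {
      method: "POST",
      body: new URLSearchParams({
        secret: env.TURNSTILE_SECRET,
        response: body.turnstileToken,
      }),
    });
    const outcome = (await verify.json()) as { success: boolean };
    if (!outcome.success) {
      return new Response("Challenge failed", { status: 403 });
    }

    // Only verified traffic reaches the expensive model backend.
    return fetch(env.INFERENCE_URL, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ prompt: body.prompt }),
    });
  },
};
```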
The third area where we’re using AI is in developer experience. For us as a relatively new entrant with a developer platform that has an opinionated but different way of looking at the world, it is daunting for developers to start out with just a blank white screen. And so we’re using AI to give experienced developers help in overcoming the developer cold start problem, but then also in expanding the universe of who can build code on Cloudflare.
Cloudflare therefore recently launched Cursor, a “GitHub Copilot-like” AI assistant for developers, starting with AI-assisted documentation. At the moment Cursor is an experimental launch, but we can expect significant improvements in the near term. As I noted above, Cloudflare is very focused when it comes to iterative improvement.
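For context, the blank screen Prince mentions really is small: a complete Worker is a single exported fetch handler, which is why an assistant that can scaffold the first few lines and explain the platform’s conventions goes a long way. A minimal example looks something like this:

```typescript
// A complete Cloudflare Worker: routing, TLS and global deployment are handled
// by the platform once this is pushed with `wrangler deploy`.
export default {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    return new Response(`Hello from the edge, you asked for ${pathname}`);
  },
};
```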
The fourth area – and this was a total surprise to me – is that we don’t think that Cloudflare is the right place to do model training. Model training needs lots and lots of machines in relatively close proximity to one another. It needs the absolute latest, greatest GPUs. And we’re not the right place to do that. We have lots and lots of machines; we have lots of GPUs, but they are spread far apart. Building models is going to be much more of the domain of the traditional hyperscalers.
However, what I didn’t anticipate is that there’s an absolute scarcity of GPUs available anywhere in the world. As an AI company you’re trying to figure out where you can get decent capacity, and ideally where you can get that GPU capacity as inexpensively as possible. So AI companies have these giant training sets and models, and what a bunch of them were doing previously was replicating the training set not only across all the different clouds, but across all the different cloud regions, so that if a GPU became available at any given moment, or for a low enough price, they could use it. That’s hugely wasteful. And so the thing I was surprised about was that the fastest growing users of our object store R2 today are actually AI companies that are saying instead of storing multiple copies of the training set in every different cloud and every different region, let’s store one copy in R2.
And then because Cloudflare doesn’t charge egress fees, it makes it easy then to import those models into whatever cloud you can get GPU capacity on. So while we don’t play a direct role in training, we play a very clear indirect role in training for a lot of these generative AI companies. As long as there’s going to be scarcity around GPUs and so long as the other hyperscale public clouds continue to charge for egress, that’s going to be an interesting opportunity for us.
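The mechanics here are mundane, which is rather the point. R2 speaks the S3 API, so a training job in whichever cloud happens to have spare GPUs can pull the single canonical copy of the dataset when it starts, and Cloudflare doesn’t bill for the bytes leaving R2. A minimal sketch, assuming an illustrative bucket called training-data and standard R2 API credentials:

```typescript
// Sketch: pull one training shard from a single canonical copy in Cloudflare R2,
// from whichever cloud the GPU job happened to land in. Names are illustrative.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { writeFile } from "node:fs/promises";

const r2 = new S3Client({
  region: "auto", // R2 ignores AWS regions; "auto" is the documented value
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

const shard = await r2.send(
  new GetObjectCommand({ Bucket: "training-data", Key: "shards/shard-0001.tar" })
);

// Write the shard to local disk for the training job; R2 charges no egress fee.
await writeFile("shard-0001.tar", await shard.Body!.transformToByteArray());
```

The same code runs unchanged whether the job landed in AWS, Azure or Google Cloud; the only thing that varies is where the GPUs happened to be available that day.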
Arbitrage, moving workloads around to chase the cheapest cloud capacity at any given time, was one of the original bad ideas of multicloud exponents. Cloud just doesn’t work that way, because of data gravity and the egress charges that Prince rails against, but also because developers take advantage of higher level services and abstractions on cloud platforms. It’s rare that you just want to take advantage of raw compute, and multicloud adds unnecessary complexity. And yet here we are in 2023. For a startup with free credits from all of the major clouds it makes total sense to do GPU chasing for training models.
And then the fifth area, which is something that we did anticipate and have been talking about for a few years: AI inference isn’t going to be done in a traditional hyperscale data center, for lots of reasons, some of which are about compliance, but some of which are performance related. I think it’s going to be a competition between how much inference is done on your end device and how much is done in a network like Cloudflare.
The Cloudflare network is fifty milliseconds from almost everyone on Earth, so we’re seeing increasingly, especially for human-computer interaction, that AI companies are building their inference engines on top of us, and over the long term it’s going to be interesting to see whether we’re in competition with, or I would predict much more likely in collaboration with, the endpoint manufacturers. There’s compute on your phone, on your laptop. The GPU or CPU capacity that you have is pretty costly. The bandwidth capacity you have is pretty costly. The storage capacity you have is pretty costly. And so there’s going to be some inference that Apple or Google do on the device, but there’s going to be a lot of it which runs on the network very close to that device. And so I think inference is an area that is relatively nascent right now for us but is going to become much more significant over the long term.
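Cloudflare didn’t go into product specifics here, but the shape of the argument is easy to sketch: a Worker running in the data center nearest the user can decide where a given inference request should land, rather than hauling every prompt back to a single hyperscale region. The regional backend URLs below are placeholders of mine, purely for illustration:

```typescript
// Sketch: a Worker in the data center nearest the user decides where an
// inference request should land. The backend URLs are placeholders.
export default {
  async fetch(request: Request): Promise<Response> {
    // request.cf is populated by the Workers runtime with the caller's location.
    const cf = (request as Request & { cf?: { continent?: string } }).cf;
    const continent = cf?.continent ?? "NA";

    // Hypothetical regional model-serving endpoints.
    const backends: Record<string, string> = {
      EU: "https://inference-eu.example.com/v1/generate",
      AS: "https://inference-apac.example.com/v1/generate",
      NA: "https://inference-us.example.com/v1/generate",
    };

    return fetch(backends[continent] ?? backends.NA, {
      method: "POST",
      headers: request.headers,
      body: request.body,
    });
  },
};
```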
So this is about network architecture for AI and machine learning: the boundaries and data flows that define what we’re going to do in the cloud for AI, what we’re going to do on the edge, and what we’re going to do on the device. That’s an interesting architectural opportunity, which every cloud provider will need to think through. Microsoft for example is already doing engineering here – in May it launched Olive, a toolchain for the Open Neural Network Exchange (ONNX) runtime designed to help developers optimize machine learning models and inference to make the most of available hardware in heterogeneous topologies.
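Execution providers in ONNX Runtime are the concrete version of that hardware matching: the same exported model can target whatever accelerator a particular server, edge node or laptop actually has. A sketch using onnxruntime-node, where the model file, the availability of the CUDA build and the tensor names are all assumptions of mine:

```typescript
// Sketch: one exported ONNX model, run against whatever accelerator is present.
// The model file and tensor names are placeholders for illustration.
import * as ort from "onnxruntime-node";

async function classify(pixels: Float32Array): Promise<ort.Tensor> {
  // Providers are tried in order: CUDA if this node has a GPU, otherwise CPU.
  // Toolchains like Olive emit models already tuned for a given target.
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["cuda", "cpu"],
  });

  const feeds = { input: new ort.Tensor("float32", pixels, [1, 3, 224, 224]) };
  const results = await session.run(feeds);
  return results.output;
}
```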
Not surprisingly Prince sees AI as a networking problem – that’s Cloudflare’s home turf. He said some inference is going to make sense on devices for compliance, privacy and performance reasons. For example, in a self-driving car you want automatic braking to be immediate, whereas aggregating road conditions, traffic reports or weather reports to help choose the best route is latency tolerant enough to be usefully served by an edge cloud doing the aggregation from a number of cloud-based services.
Apparently there is also a sixth area – in terms of possible future developments, Prince said Cloudflare could potentially be used to manage information, to make sure that corporate secrets were not fed into, and subsequently leaked by, a model like ChatGPT.
What are those things that you’re okay sending to an AI system, and what are those things that you never want to send to an AI system at all, either for security reasons or because you don’t want to damage the model with incorrect information? LLMs essentially can’t unlearn information.
How to manage data and information flows in AI systems is going to be an important question, and Prince sees a lot of opportunity there. Shadow AI, the new Shadow IT, is a related issue. Even if end user organisations mandate that employees don’t use third party services like OpenAI, for fear of leaking trade secrets, those employees are almost certainly going to use them anyway. We’ve seen this pattern with pretty much every industry advance, from minicomputers to PCs to open source to SaaS apps to Cloud. Users will choose whatever is easiest and makes them effective. So just because your company tells you not to use ChatGPT, doesn’t mean you won’t. Trust, provenance, safety, explainability, intellectual property management, and data and information sovereignty will all be differentiators for AI and ML.
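A crude but concrete version of that control would be an outbound gateway that inspects prompts before they leave the building. Everything in the sketch below, from the regular expressions to the upstream URL, is illustrative rather than a description of any Cloudflare product:

```typescript
// Sketch: an outbound gateway that blocks prompts containing obvious secrets
// before they reach a third-party LLM. Patterns and upstream URL are illustrative.
const UPSTREAM = "https://api.openai.com/v1/chat/completions";

const SECRET_PATTERNS: RegExp[] = [
  /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/, // private key material
  /\bAKIA[0-9A-Z]{16}\b/, // the shape of an AWS access key id
  /\b(?:internal|confidential)[-_ ]only\b/i, // crude data-classification marker
];

export default {
  async fetch(request: Request): Promise<Response> {
    const body = await request.text();

    // Refuse to forward anything that looks like it should never leave the building.
    if (SECRET_PATTERNS.some((p) => p.test(body))) {
      return new Response("Prompt blocked by data loss prevention policy", { status: 403 });
    }

    return fetch(UPSTREAM, { method: "POST", headers: request.headers, body });
  },
};
```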
One way or another, Prince sees plenty of potential to become the classic provider of picks and shovels, an essential infrastructure provider to AI companies.
disclosure: Cloudflare and Microsoft are both clients.
Mary Branscombe says:
July 17, 2023 at 3:02 pm
just noting that ONNX dates back to 2017 and while Microsoft has always been involved, it’s an open source cross-industry project with Facebook and (later) AWS and others https://thenewstack.io/facebook-microsoft-bring-interoperable-models-machine-learning-toolkits/
ONNX has always done accelerator-specific model optimization, on both server and client; Olive is a nice framework for simplifying that, plugging in extra optimizations for model compression and quantization from eg Intel and AMD and getting a nice package out at the end. ONNX started as a way to take advantage of the acceleration of wherever your model was going to end up so you weren’t limited to a single platform; these days you might be thinking more about choosing where to do the inferencing for cost or efficiency…
Shashwat Gupta says:
August 19, 2023 at 4:55 pm
I think Cloudflare is making some very interesting moves in the AI space. They are clearly seeing the potential of generative AI and are positioning themselves to be a leader in this area.
As said, Generative AI models can be used to generate spam, fake news, and other harmful content. Cloudflare’s security solutions can help to protect against these threats.