<?xml version="1.0" encoding="UTF-8"?>        <feed xmlns="https://www.w3.org/2005/Atom">
            <title type="text">Latest imported feed items on RedMonk</title>
                        <entry>
                <title><![CDATA[The Unbundling and Bundling of the PaaS Market]]></title>
                <link href="https://redmonk.com/sogrady/2026/06/10/paas-unbundling/" />
                <published>2026-06-10T13:51:08Z</published>
                <content type="html"><![CDATA[<p>“<em>There are only two ways to make money in business: one is to bundle; the<br />
other is unbundle</em>.” &#8211; Jim Barksdale</p>
<p>Two decades ago this August, AWS &#8211; itself four years old at that point &#8211; launched the Elastic Compute Cloud, or EC2. Between that and the earlier March release of their Simple Storage Service (S3), Infrastructure as a Service (IaaS) &#8211; more commonly referred to as primitives these days &#8211; was born. Within a year it was an emerging force. Within five it was an eight hundred pound gorilla in the market, reshaping the industry around it.</p>
<p>While it’s often forgotten now, there was a competing vision for the cloud almost from the start. Heroku was founded in July of 2007. Salesforce introduced Force.com in September of ’07, and Google followed with App Engine in April of ’08. These platforms and others that followed became a new category called Platform as a Service (PaaS). PaaS was positioned as a simpler alternative to IaaS. Instead of having to pick the building materials to host an application, one simply deployed it and left the rest to the platform.</p>
<p>Of these two potential approaches that emerged two decades ago, IaaS won.</p>
<p>There were many reasons for this. For one, IaaS looked much like what enterprises in particular were used to: compute instances were servers, storage was storage and so on. PaaS, by contrast, was somewhat alien. It didn’t look like what came before it. It was, instead, a black box that couldn’t be taken apart and inspected. Developers didn’t necessarily care about that opacity, but their employers did.</p>
<p>PaaS also necessarily involved tradeoffs. Building with the primitives of IaaS, enterprises could construct virtually anything from websites for strip mall petstores to high speed trading platforms. PaaS, by contrast, was only appropriate for a subset of workloads due to constraints of cost, technology or both. Being a general purpose platform in a world of highly specialized applications is a challenge.</p>
<p>The technology industry landscape, therefore, was dominated by primitives for nearly two decades. And from an adoption and spending perspective, it still is.</p>
<p>But the market has also changed in recent years. A few key factors:</p>
<ul>
<li>
<p><strong>First</strong>: the number of available primitives multiplied. As such, it traced a parabola of utility. Initially, as the number of available primitives increased, the platform’s usefulness increased in direct proportion. More primitives meant more possibilities. At some point in the last decade, however, it hit the down arc of that parabola and started declining in its appeal, with the ever expanding array of services transitioning from toolbox to burden. Six years in, the <a href="https://redmonk.com/sogrady/2020/10/06/developer-experience-gap/">Developer Experience Gap</a> remains an issue.</p>
</li>
<li>
<p><strong>Second</strong>: the PaaS core approach was reconsidered. If PaaS originally failed to achieve real scale in its adoption in part because of its general purpose focus, what if ambitions were narrowed and the platform were to target a particular workload or set of adjacent workloads? A specialized platform for a particular task is much more achievable than a Jack of All Trades. This realization has triggered a wave of market innovation in less generalized, more specialized providers.</p>
</li>
<li>
<p><strong>Third</strong>: the rise of coding assistants is dramatically accelerating the number of applications created, and as a consequence, lines of code being deployed. It is at the same time gradually putting more distance between developers and the construction and deployment of the application itself. Not just from code creation &#8211; which will at times involve programming languages the developer has never learned &#8211; but from the underlying infrastructure. In more and more cases, technology selection is being delegated to coding assistants, and left to their own devices those coding assistants have a preference for abstract platforms, platforms that might once have been termed PaaS.</p>
</li>
</ul>
<p>Given these and other factors, the market for abstractions that sit above the original primitives of IaaS has both fragmented and grown. To be clear, IaaS &#8211; or primitives, today &#8211; remains the dominant approach and will for the foreseeable future.</p>
<p>But the more complicated infrastructure becomes, the more appetite there is for simpler alternatives. Especially if the virtual, 24/7 pair programmers ubiquitous today are putting their proverbial fingers on the scale in favor of abstractions because they&#8217;re easier to programmatically manipulate and require less explanatory context (and thus fewer tokens).</p>
<p>All of which explains the great unbundling. In response to developers and enterprises seeking abstractions but frustrated with limitations of general purpose platforms for their workload of choice, a diverse array of different tools emerged to attack different problems via specialized abstractions sitting above primitives. A few rough categories have emerged.</p>
<ul>
<li><strong>AI app builders (&#8220;vibe coding”)</strong>: (e.g. Bolt.new, GitHub Spark, Lovable, Replit Agent, v0). App writes the code, and increasingly determines the hosting and backend too.</li>
<li><strong>General-purpose PaaS (“Heroku-like”)</strong>: (e.g. Cloud Run, Fly.io, Render, Railway). Code written independently, deployed to abstract servers/containers/scaling.</li>
<li><strong>Front End</strong>: (e.g. Cloudflare Pages/Workers, GitHub Pages, Netlify, Vercel). Code written independently, abstracted hosting/CDN/edge.</li>
<li><strong>Backend-as-a-service</strong>: (e.g. Convex, Firebase, Supabase). More than just DBaaS primitive: abstracted DB + auth, storage and other pieces.</li>
<li><strong>Internal Developer Platforms</strong>: (e.g. Backstage, Crossplane, Humanitec). Highly specialized, internal developer-focused platform.</li>
</ul>
<p>There are caveats to the above categories, of course. Some products belong in multiple categories. In other cases the lines in between individual categories can be less than distinct, as with the app centric examples. The reverse can be true as well, with some categories having little in common: see IDPs vs BaaS as one example.</p>
<p>But in an effort to understand how, where and how quickly the market for abstractions above underlying primitives is growing, we’ve been tracking over 70 projects &#8211; a number which will undoubtedly grow by the week, if not day. Their breakdown is here:</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/01_landscape_by_category2.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/06/01_landscape_by_category2-1024x666.png" alt="" width="1024" height="666" class="aligncenter size-large wp-image-6190" /></a></p>
<p>Unsurprisingly, the second largest category, and category with the longest history, is general purpose app PaaS platforms, topped only by &#8211; what else? &#8211; AI. The list makes rough subjective judgements about which are core to the market, somewhat adjacent, possibly dead (watching) and excluded (dead or disqualified for one reason or another). From a macro perspective, however, what the list demonstrates &#8211; along, arguably, with Heroku’s deprecation &#8211; is that the days of a one size fits all abstraction are over for the time being.</p>
<p>The unbundling has inevitably come for PaaS, just as it once did for databases in the first decade following the millennium. What’s equally inevitable &#8211; again, just as it was <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">with databases</a> &#8211; is that the unbundling will ultimately give way to bundling. Bundling which arguably has already started. A few examples:</p>
<ul>
<li><strong>Convex Chef</strong>: from BaaS to app builder</li>
<li><strong>GitHub Spark</strong>: from front end to app builder</li>
<li><strong>Lovable Cloud</strong>: from app builder to PaaS</li>
<li><strong>Replit Agent</strong>: from PaaS to app builder</li>
<li><strong>Vercel v0</strong>: from front end to app builder</li>
</ul>
<p>This is a market at work. Bundled product limitations are identified, unbundled products tactically attack them. The survivors then turn their eyes towards adjacent unbundled markets in search of growth. Normally this is a process that takes time; it took the database market well over a decade to fragment with the rise of NoSQL only to return to the multi-workload databases in demand today. It’s not clear that timeframe will hold here, though.</p>
<p>Consider the list of collisions above. It took seven plus years for Vercel (née Zeit) to introduce v0. It took Replit a little over 8 years to get to Agent, and GitHub over 15 to get to Spark. But those were companies founded, at the latest, in 2016. The newer market entrants tell a different story. Convex reached out of market in a little less than three years; Lovable took ten months.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/05_category_collision_timing.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/06/05_category_collision_timing-1024x605.png" alt="" width="1024" height="605" class="aligncenter size-large wp-image-6191" /></a></p>
<p>Perhaps most importantly, in the aggregate, all of the above transformations took place within a 25 month window.</p>
<p>Between this re-bundling of previously distinct workloads, then, and the pragmatic reality that the market will not support nearly two dozen large platforms, let alone the new options arriving by the day, it seems reasonable to expect consolidation in the market for abstractions, and soon. The growth opportunities for the winners, however, should be substantial.</p>
<p><strong>Disclosure</strong>: Cloudflare, GitHub, Google (Cloud Run/Firebase), Heroku (Salesforce) and Render are RedMonk clients. Bolt, Convex, Crossplane, Fly.io, Humanitec, Lovable, Netlify, Railway, Replit, Spotify (Backstage), Supabase and Vercel are not currently customers.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[What Bun Can Tell Us About AI, Open Source and Anthropic]]></title>
                <link href="https://redmonk.com/sogrady/2026/06/04/bun-two-lessons/" />
                <published>2026-06-04T13:18:05Z</published>
                <content type="html"><![CDATA[<p>In early December last year, Anthropic acquired Oven, the makers of Bun, a small, fast, open source JavaScript runtime. It’s also a package manager, bundler and test runner but it’s had the most success as a fast runtime built on Safari’s JavaScriptCore rather than on Chrome’s V8, which Deno and Node.js use. First released in 2022 as a speed-focused drop-in replacement for Node and written in Zig (later to be rewritten in Rust), it found a natural audience in AI with companies like Cursor, Lovable, Windsurf and, of course, Anthropic. It also made inroads into speed-focused production systems at companies like Figma, The New York Times and Slack. Infrastructure players like CircleCI and GitHub, meanwhile, both added support late last year.</p>
<p>In addition to being an important specialty runtime for enterprises, then, it was load bearing infrastructure for AI companies large and small.</p>
<p>Load bearing or no, its commercial prospects at the time of the acquisition &#8211; like so much of the open source the industry relies on, unfortunately &#8211; were less than clear. This Q&amp;A from Jarred Sumner’s acquisition <a href="https://bun.com/blog/bun-joins-anthropic">announcement</a> was blunt:</p>
<blockquote>
<p>
  Q: Is the same team still working on Bun full-time?<br />
  A: Yes. And now we get access to the resources of the world’s premier AI Lab instead of a small VC-backed startup making $0 in revenue.
</p>
</blockquote>
<p>On the whole, the acquisition was fairly straightforward for both parties. Bun receives an immediate capital return and a viable, long term path for support, while Anthropic gains direct control of a project strategic to their offerings. By going the inorganic acquisition route, it spent money to save time in a market with plenty of the former but precious little of the latter. Curious about how the project has fared post-acquisition, we’ve evaluated some of its metrics.</p>
<p>The high level takeaway is that the acquisition does not appear to have slowed the project. The below chart drawn from npm is merely a subset of Bun installs, and doesn&#8217;t reflect those installed directly, via Homebrew, binary or otherwise. Even the subset we&#8217;re able to access here tells a clear story, however.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_eco_chart1_bun_downloads.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/06/wm_eco_chart1_bun_downloads-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6186" /></a></p>
<p>It is necessary to caveat the above and similar charts by observing that it’s difficult to precisely tease apart Bun’s success from that of projects that leverage it like Claude Code. Still, growing 16X from 445K/month to 7.3M in less than 30 months is impressive for a runtime in a field full of them. And if the runtime growth sounds impressive, the TypeScript type definitions for it are even more impressive &#8211; bun-types (the first party native definition) grew at 53X while its TypeScript wrapper jumped 234X.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_eco_chart2_all_downloads.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/06/wm_eco_chart2_all_downloads-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6187" /></a></p>
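<p>As a rough check on the download arithmetic above, the following Python sketch recomputes the growth multiple from the two npm figures quoted in the text. The 30 month window is an assumption taken from the “less than 30 months” upper bound, so the implied monthly rate is a floor rather than a measured value, and the function names are illustrative only.</p>

```python
def growth_multiple(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


def implied_monthly_rate(start, end, months):
    """Constant month-over-month growth rate that would turn `start` into `end`."""
    return (end / start) ** (1 / months) - 1


# Figures quoted in the text: 445K downloads/month growing to 7.3M.
START = 445_000
END = 7_300_000
# Assumption: the text only says "less than 30 months", so 30 is an upper bound.
MONTHS = 30

multiple = growth_multiple(START, END)
rate = implied_monthly_rate(START, END, MONTHS)
print(f"{multiple:.1f}X overall, at least {rate:.1%} per month")
```

<p>Because the real window is shorter than the assumed 30 months, the true compounded monthly rate would be somewhat higher than the figure this prints.</p>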
<p>Bun is growing, in other words. It may even be growing faster post-acquisition. But the question is: how sustainable is that growth? To answer it, it’s necessary to look under the hood at how the project is being built, by whom and how that’s changed over time. There are many different conclusions to be drawn from the resulting datasets, but there are two particularly worth highlighting.</p>
<h1>AI and Bun</h1>
<p>As has been discussed elsewhere, the most obvious takeaway in looking at Bun’s commit data is the glaring transition from primarily human to primarily AI contributions. This is certainly no secret; a month ago, Sumner <a href="https://x.com/jarredsumner/status/2054525268296118363">said</a> on Twitter:</p>
<p>“<em>We haven’t been typing code ourselves for many months now. Even pre-acquisition this was pretty much accurate</em>.”</p>
<p>The commits chart confirms this.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart2_bot_share_pct.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/06/wm_chart2_bot_share_pct-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6179" /></a></p>
<p>As early as last August, over half of the project commits at a given time were authored by a bot. Post-acquisition, it’s rarely less than that, and has peaked north of 80%.</p>
<p>Here are the total commits per month, AI vs human.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart1_human_vs_bot.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart1_human_vs_bot-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6185" /></a></p>
<p>The trend line here is unambiguous: in approximately 12 months, Bun has transitioned from a project maintained by humans to one primarily authored by machines. To break that out in a little more detail, here are the commits by contributor type: AI, with human contributors split into internal and external.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart6_three_way_composition.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart6_three_way_composition-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6181" /></a></p>
<p>We’ll get to the secondary trend there momentarily, but again, the conclusion is unavoidable. Just as Bun is core AI infrastructure, AI is now the core contributor to Bun.</p>
<p>This raises a host of questions that for the most part can’t be answered yet. How maintainable will the project be over the long term? What, if any, tech debt and learned helplessness are being accrued by the Bun team by relying so heavily on AI? Will AI continue to increase its percentage of code committed at the expense of humans, or will a natural equilibrium evolve over time?</p>
<p>It’s been barely a year of AI contributions, and half that as employees of one of the most visible and important AI companies on the planet. We won’t and can’t know the answers to these questions for some time, because the sample is insufficient.</p>
<p>But it seems clear that when looking at how core infrastructure products might be impacted by rising AI contributions, Bun will be an important datapoint to monitor.</p>
<h1>Open Source and Bun</h1>
<p>Arguably more interesting than “project includes more and more code written by machines” is what the acquisition means for Bun as an open source project. Bun was and is MIT licensed, and the acquisition announcement made four related promises around the project:</p>
<blockquote>
<ul>
<li>Bun stays open-source &amp; MIT-licensed</li>
<li>Bun continues to be extremely actively maintained</li>
<li>The same team still works on Bun</li>
<li>Bun is still built in public on GitHub</li>
</ul>
</blockquote>
<p>Three of those promises have undeniably been kept. Bun remains open source and MIT licensed. It is actively maintained, and built on GitHub. The team, on the other hand, appears to have gone its separate ways.</p>
<p>First, let’s look at a macro picture of the number of human contributors to the project, total, in the wake of the rising AI contributions.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart3_unique_humans.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart3_unique_humans-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6184" /></a></p>
<p>That number is roughly half of what it was. The number of external contributors, for its part, has dropped off significantly.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart4_internal_vs_external.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart4_internal_vs_external-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6183" /></a></p>
<p>Just as the number of human developers big picture has declined, so too has the number of external developers.</p>
<p>To put that in context, however, while their numbers appeared to be more robust, the actual code contributions from external contributors have always been relatively modest &#8211; even acknowledging the problematic nature of measuring actual contributions by commits.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart5_internal_vs_external_commits.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart5_internal_vs_external_commits-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6182" /></a></p>
<p>After dipping immediately post-acquisition, internal commits have climbed back into the same rough number they occupied prior. External commits, however, have not. Their contributions have significantly declined. This is unsurprising. Contributing to a project maintained by a small, independent startup is a different matter than one maintained by a large, well capitalized AI juggernaut.</p>
<p>Getting back to the original promise of keeping the team together, we can see the above metrics manifested in a detailed list of the original committers.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/06/wm_chart9_core_committers.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/06/wm_chart9_core_committers-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6180" /></a></p>
<p>Of ~7 identifiable pre-acquisition Oven employees, at least 4 have clearly departed or at least stopped contributing. Another active committer went from 745 pre-acquisition commits to 1 post-acquisition. There are many potential reasons for this mini diaspora, and they may have little if anything to do with the project, AI, the acquisition or any of the above. The motivation, interest &#8211; or permission &#8211; to work on a large open source project can change for a variety of reasons.</p>
<p>But it is certainly not the same team that originally built Bun. Humans left, AI has moved in &#8211; whether that replacement cycle was deliberate and intentional, or not.</p>
<h1>The Net</h1>
<p>The fact that the number of human contributors to Bun is down while the number of machine contributions is up would be less interesting if it weren’t a relatively high profile open source infrastructure project. It’s unclear how Anthropic will navigate its stewardship of the project moving forward, or whether in fact they care about their role as project steward. Bun is an opportunity to ask questions of Anthropic: how does it value open source? What are its intentions for the project?</p>
<p>Consider the case of another open project out of Anthropic, MCP. First launched in November of 2024, consensus within three months was that it was a clear industry standard &#8211; which is an absurd, <a href="https://bsky.app/profile/sogrady.org/post/3lposd6ur4s2k">unprecedented</a> timeframe. It was difficult, even shocking, to be told by competitive vendors that they were effectively granting this status to MCP the January after a November release.</p>
<p>In spite of this early and unprecedented success, it took another ten months for it to be donated to a neutral foundation. For the unfamiliar, this is a necessary step for standardized technologies that will be jointly developed by otherwise fierce competitors. Few if any commercial organizations will contribute to a project solely owned by a third party because it’s tantamount to subsidizing their development with your labor.</p>
<p>To be fair, this timeframe is not totally unreasonable. Kubernetes likewise took thirteen months from initial release to donation. But Kubernetes was also one amongst multiple competitive container orchestration projects, and very far from an obvious industry standard at the time. The delay and ultimate donation, then, was appropriate and strategic. MCP was a much more obvious candidate for standardization, however, earlier even than Docker, the most rapidly adopted technology we’d seen up until MCP. But it still took over a year for an obvious open source standard to be permitted to ascend.</p>
<p>Which raises the question: where is Anthropic, and its counterparts like OpenAI, on the corporate open source maturity curve? Startups understand open source code as consumers because they are built on it. They generally understand contributions, governance and the like because they have to. But as a rule, startups focused on moving as quickly as possible are far less familiar with how open source works on a corporate or enterprise level.</p>
<p>Not that Anthropic stands out in this regard. Microsoft spent decades verbally assaulting open source. Google’s early years were marked by publishing open papers about software without releasing the code behind them. And AWS’ reputation amongst open source communities was arguably worse than Microsoft’s until relatively recently when it learned to more peacefully coexist and contribute back.</p>
<p>These vendors and those that preceded them have had to learn about open source on a macro, rather than micro-scale. About license choices, the role of foundations and how to run open source projects that encourage rather than discourage external contributions &#8211; as well as the benefits of same.</p>
<p>An argument that could be made is that Anthropic won’t have to learn these lessons because it doesn’t need to standardize Bun. Certainly any flow of would-be external contributions to the project from competitors is arguably now coming from AI. Why bother amortizing development costs across competitors and giving up control of a project to a foundation when the project’s owner has a bunch of software genies in a bottle that it can release at any time?</p>
<p>That assumes, of course, that the primary value resulting from the standardization of a project is code contributions. Which is a fundamental misunderstanding of the purpose of standardization. External code contributions are not the primary incentive to donate a project to a foundation: preventing needless and unproductive market fragmentation is. As is reassuring potential users. Enterprises, for one, do not embrace software from a foundation because it’s written more quickly. They do so because they prefer their key infrastructure not be controlled by a single vendor.</p>
<p>In any event, those curious about whether and how well Anthropic understands open source would be well served by watching their stewardship of Bun and how it evolves over time. The project, to its credit, is growing apace even as it&#8217;s hosted at a non-neutral single party. But assuming it’s a priority, and that Anthropic has ambitions for Bun to be more than a project that just undergirds Claude Code and its offerings, that growth is likely to be challenged by questions about stewardship and long term project futures. If that growth, meanwhile, is not a priority, and Anthropic has no intentions for the project to be more than just a piece of internal infrastructure, it will have negatively impacted its open source reputation as a poor steward of a popular project. Do Anthropic and their highly capable software creation machine, then, take on the world alone? Or do they trade some control for wider, swifter adoption?</p>
<p>How that question is answered, and whether Anthropic carries forward Bun’s wider mission or abandons its external users and turns inward, will tell us much about how quickly Anthropic is moving along the open source learning curve, and how much it has learned from the companies that have gone before it.</p>
<p><strong>Disclosure</strong>: AWS, CircleCI, Docker, GitHub, Google, Microsoft and Salesforce (Slack) are RedMonk customers. Anthropic, Cursor, Figma, Lovable, OpenAI, The New York Times and Windsurf are not currently customers.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[The G7 on Open Source vs Open Weights]]></title>
                <link href="https://redmonk.com/sogrady/2026/06/02/g7-open-weights/" />
                <published>2026-06-02T17:32:32Z</published>
                <content type="html"><![CDATA[<p><a title="Daniel Jolivet, CC BY 2.0 &lt;https://creativecommons.org/licenses/by/2.0&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Evian-les-Bains_(Haute-Savoie)_(10004827914).jpg"><img decoding="async" width="330" alt="Evian-les-Bains (Haute-Savoie) (10004827914)" src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Evian-les-Bains_%28Haute-Savoie%29_%2810004827914%29.jpg/330px-Evian-les-Bains_%28Haute-Savoie%29_%2810004827914%29.jpg"></a></p>
<p>The term “open source” was <a href="https://opensource.com/article/18/2/coining-term-open-source-software">coined in 1998</a>, at least in part, because the term that preceded it was unclear and required explanation. Free software was descriptive and understood within technical communities familiar with it, but misleading to newcomers who understood free in commercial rather than philosophical terms. It was clear, in other words, that a new descriptor was required, and open source was the result.</p>
<p>Two years after the introduction of a proposed definition for open source AI, adoption has been minimal and there is an increasing awareness that unlike with pure source code, the assets that make up an AI model cannot be reduced to a single, binary open and closed definition.</p>
<p>It’s not that the licenses and promises of open source as they pertain to source code are simple. They can be complex, nuanced and difficult to explain. But they are, at least, that binary. Code is either open source or it’s not. By contrast, it’s now clear that open source AI &#8211; of which code is only a small part &#8211; is going to have to be defined along a spectrum from closed to open.</p>
<p>The <a href="https://en.wikipedia.org/wiki/G7">G7 nations</a> &#8211; with input from parties like <a href="https://opensource.org/blog/open-source-initiative-helps-g7-deliver-vision-on-ai-openness">the OSI</a> &#8211; apparently concur. Their <a href="https://www.entreprises.gouv.fr/files/files/Actualites/2026/g7/vision-AI-openness-opportunities-and-shared-language.pdf">paper</a> published this week, “G7 Vision on AI openness opportunities and shared language,” has several important takeaways. Among them:</p>
<ul>
<li>
<p>First, it clearly and unambiguously states that both open source and AI openness have immense societal benefits. It calls the latter, in fact, “<em>an essential contributor to our economies</em>.”</p>
</li>
<li>
<p>Second, it acknowledges the risks and potential future harm that can result from the lack of clear, consistent definitions. “<em>This lack of clarity in the field of AI tends to cast doubt on the degree of openness of such technologies, thereby undermining their benefits</em>.”</p>
</li>
<li>
<p>Third, it <em>explicitly</em> rejects a strict open or closed definition &#8211; “<em>the openness of an AI is not binary</em>.”</p>
</li>
<li>
<p>Fourth, it <em>implicitly</em> rejects existing definitions &#8211; “<em>the meaning of Open-Weight or Open Source AI remains contested</em>.”</p>
</li>
<li>
<p>Lastly, it proposes a four tier system for categorizing AI projects that sit along a spectrum of open.</p>
</li>
</ul>
<p>The proposed classifications are similar in some respects to existing attempts like the Linux Foundation’s <a href="https://docs.google.com/document/d/1RUNrs4flAsYsikXTPu1jWBH1BAumCyeG/edit?pli=1#heading=h.gjdgxs">Model Openness Framework</a>. Both are built for a landscape in which projects will differ in what specifically is made available, the terms it’s made available under and what restrictions, if any, are placed on use. But where the MOF is quite granular, grading projects around 17 components across an entire development lifecycle, the G7 vision is simpler. It defines four tiers based on five components (weights, deployment code, training code, training data and use restrictions).</p>
<p>In rough terms, those tiers can be described as follows ranging from most open to least:</p>
<ul>
<li><strong>Open Source AI with Open Data</strong>: everything is open and under an OSI license &#8211; code, data, weights, every asset. </li>
<li><strong>Open Source AI</strong>: what’s available is open, but it may or may not include training data, though it must include full training code. </li>
<li><strong>Open Weights AI</strong>: weights and code are available and under an OSI license, but nothing else. </li>
<li><strong>Weights Available AI</strong>: weights and code are available and open for inspection, but are released under a license which cannot be called open source due to use restrictions or other prohibited limitations. </li>
</ul>
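<p>As a rough illustration of how those five components map onto the four tiers, the logic can be sketched in Python. This is a minimal sketch based on the summary above, not the G7&#8217;s own text: the <code>Release</code> class, its field names, the <code>g7_tier</code> function and the &#8220;Closed&#8221; fallback are all hypothetical, and &#8220;code&#8221; in the two lower tiers is assumed to mean deployment code.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical record of what a model release makes available."""
    weights: bool          # model weights are published
    deployment_code: bool  # code needed to run the model is published
    training_code: bool    # full training code is published
    training_data: bool    # training data is published
    osi_license: bool      # released under an OSI license, i.e. no use restrictions


def g7_tier(release: Release) -> str:
    """Map a release onto the four tiers as summarized above (assumed reading)."""
    if not (release.weights and release.deployment_code):
        return "Closed"  # not one of the four tiers; label assumed for this sketch
    if not release.osi_license:
        return "Weights Available AI"
    if not release.training_code:
        return "Open Weights AI"
    if not release.training_data:
        return "Open Source AI"
    return "Open Source AI with Open Data"


# Example: OSI-licensed weights and deployment code, but no training code or data.
example = Release(weights=True, deployment_code=True, training_code=False,
                  training_data=False, osi_license=True)
print(g7_tier(example))  # prints "Open Weights AI"
```

<p>Under this reading, partial training scripts do not move a release out of Open Weights AI; only full training code does.</p>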
<p>It remains to be seen whether or not the industry can adapt to a definition of open that depends on a sliding scale rather than a fixed yes/no. But it also doesn’t have a choice. Two years of development and discussion and two years of living with a <a href="https://opensource.org/ai/open-source-ai-definition">proposed definition</a> have gotten us no closer to an industry consensus. Subtly, however, what the G7 nations have done with this document, intentionally or unintentionally, is to acknowledge that fact, make it irrelevant and implicitly propose their alternative.</p>
<p>The challenge for any single definition of open source AI is that it is not possible to please both definition purists and definition pragmatists. The former point out that any definition that allows for any omission of training data is effectively granting the term open source to a project that cannot ever be independently replicated. Which is legitimate. The latter, on the other hand, point to issues with datasets ranging from the byzantine nature of data licensing to the sheer impracticality of the size of these datasets. Which are also legitimate. You can please one of these groups about an open source definition, but not both.</p>
<p>What the G7 is proposing serves as a recognition that that debate is a lost cause. Instead, for all intents and purposes, the G7 is proposing to deprecate the term open source AI in favor of open weights.</p>
<p>It is true, on the one hand, that there are not one but two different tiers in the G7’s framework that explicitly cite the term open source. So the deprecation is not literal. There is a definition of open source AI in existence.</p>
<p>But if no major competitive models can satisfy that definition, does the definition matter?</p>
<p>Two weeks ago, we <a href="https://redmonk.com/sogrady/2026/05/15/open-ai-models/">surveyed</a> a large, representative sample of relevant models. Since then we’ve added a few new models to track. A quick survey of the licensing terms for this sample is suggestive. 28 of the surveyed models are closed, and thus irrelevant in a discussion of openness. Of the 40 remaining in our sample, half are Weights Available AI (non-OSI license) and half are Open Weights AI (OSI license). Which in turn means that none are Open Source AI and none are Open Source AI with Open Data.</p>
<p>To be clear, there are borderline cases: IBM publishes detailed training documentation and methodology, but not the code. Meta has released fine-tuning recipes and scripts (llama-recipes) alongside Llama, but again not the code. DeepSeek, meanwhile, arguably went furthest, providing reinforcement learning training code and distillation scripts &#8211; but not the full pipeline. None of these, therefore, are considered open source AI by the G7&#8217;s definition.</p>
<p>Outside of our sample, meanwhile, there are models that provide data, code and weights: AI2&#8217;s OLMo and EleutherAI&#8217;s Pythia most notably. But they aren&#8217;t particularly competitive with the selected open and closed models tracked here and thus are not considered.</p>
<p>Put simply, the G7’s proposal simultaneously codifies the open source AI definition and makes it irrelevant. Open weights becomes the de facto term of art, then, by default &#8211; at least until such time in the future that truly open source models become more competitive. Instead of blurring the definition of open source AI to meaninglessness, a new, more descriptive term in open weights seeks to mitigate the shortcomings of its predecessor &#8211; much as open source itself once did for free software.</p>
<p>It is not clear whether even an august body like the G7 can compel adoption of their proposed framework, and there are questions about who should be defining industry terminology: governments, or industry bodies? But if this effort is successful, and the term open source is effectively relegated to just describing source code, that could be the <a href="https://redmonk.com/sogrady/2024/10/22/from-open-source-to-ai/">best possible outcome</a>. Vendors who wish to benefit from the halo of open can do so with a term separate and distinct from open source, and thus one insulated from contamination from use restrictions, lack of training data and other issues that would violate the original spirit and intent of the open source definition.</p>
<p>In past debates between open source purists and pragmatists, the latter would frequently argue that a definition of open source that was too strict would never be used.</p>
<p>The mistake all along may have been assuming that that was a bad thing.</p>
<p><strong>Disclosure</strong>: neither the G7 nor the OSI are RedMonk clients.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Why Hardened Images are Suddenly Everywhere]]></title>
                <link href="https://redmonk.com/kholterhoff/2026/06/01/why-hardened-images-are-suddenly-everywhere/" />
                <published>2026-06-01T15:55:15Z</published>
                <content type="html"><![CDATA[<p><img decoding="async" class="alignnone size-full wp-image-476" src="https://redmonk.com/kholterhoff/files/2026/06/diamond.gif" alt="" width="100%" height="240" /></p>
<p>Over the past 12 months, vendors across cloud-native infrastructure have converged on the same conclusion: the market wants image hardening. Docker went GA with Docker Hardened Images <a href="https://www.docker.com/press-release/announces-hardened-images-catalog-to-strengthen-enterprise-software-supply-chain-security/">last May</a>, then<a href="https://www.docker.com/press-release/docker-makes-hardened-images-free-open-and-transparent-for-everyone/"> made the whole catalog free and open source in December 2025</a>. In July, Broadcom&#8217;s Tanzu division<a href="https://news.broadcom.com/app-dev/broadcom-introduces-bitnami-secure-images-for-production-ready-containerized-applications"> launched Bitnami Secure Images</a>. Chainguard built its<a href="https://www.chainguard.dev/unchained/everything-we-announced-at-chainguard-assemble-2026"> Assemble 2026 conference</a> in March around the same theme. Google Cloud Next 2026 put Wiz, now part of Google, front and center with WizOS leading the security story with its <a href="https://www.wiz.io/blog/introducing-wizos-hardened-near-zero-cve-base-images">hardened container base images</a>. On May 12, Red Hat used its Summit keynote slot to GA<a href="https://www.redhat.com/en/about/press-releases/red-hat-hardened-images-accelerates-cloud-native-development-and-zero-cve-strategies"> Red Hat Hardened Images</a>. A week later at Open Source Summit North America,<a href="https://edera.dev/stories/edera-and-minimus-partner-to-deliver-end-to-end-container-security-for-critical-infrastructure"> Edera and Minimus announced a partnership</a> bundling hardened images with hypervisor-level workload isolation.</p>
<p>So, what’s going on here? If you ask my colleague James Governor he would likely point you to his LinkedIn post from last year where he wrote “<a href="https://www.linkedin.com/posts/jamesgovernor_talking-to-chainguard-dear-lord-theyve-share-7298397560816238593-HCYU/?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAABr9QBdHS_yhvAWPb_5xy6L_IJyjICTr4">talking to Chainguard. dear lord they&#8217;ve apparently established a license to print money</a>”—a sentiment he then expanded into a <a href="https://redmonk.com/jgovernor/chainguard-builds-a-market-everyone-else-wants-in/">blog post</a>. But in addition to the possibly obvious financial motive, it is worth exploring this exploding market for hardened images in light of rapidly improving LLMs. Container base images create a centralized supply chain risk surface, yet the inherited vulnerabilities they introduce are often pushed downstream to developers who have little practical ability to remediate them—thus, the need for subscription-based solutions.</p>
<p>So let’s talk about image hardening in 2026. Why hardened images blew up, why images don’t come pre-hardened, and why developers are paying attention.</p>
<p>&nbsp;</p>
<h2>What does “hardened” even mean?</h2>
<p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">When practitioners say &#8220;hardened image&#8221; in 2026, they usually mean a set of properties that map closely to what the major security-standards bodies have been recommending for years. The foundation is a minimal base, often distroless, so there&#8217;s no shell, no package manager, and little else for an attacker to live off. This is the oldest and least controversial piece: NIST&#8217;s container security guidance, SP 800-190, recommends base layers be <a class="underline underline underline-offset-2 decoration-1 decoration-current/40 hover:decoration-current focus:decoration-current" href="https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-190.pdf">minimalistic distributions with the OS kernel omitted</a>, noting that a stripped-down image yields a much smaller attack surface with fewer opportunities to attack and compromise it. On top of that base sit a handful of runtime hardening defaults: a non-root user and a read-only root filesystem where the workload allows. Both trace back to the same NIST guidance, which calls for least-privilege execution and describes the read-only filesystem design as a way to keep an attacker from persisting data inside the image.</p>
<p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">The remaining pieces are about provenance and upkeep. A hardened image is expected to ship with an SBOM enumerating its contents—what CISA, building on the original NTIA work, defines as a <a class="underline underline underline-offset-2 decoration-1 decoration-current/40 hover:decoration-current focus:decoration-current" href="https://www.ntia.gov/report/2021/minimum-elements-software-bill-materials-sbom">formal record of the components and supply-chain relationships used in building software</a>—and with some form of cryptographic provenance, typically signed attestations following the OpenSSF&#8217;s SLSA framework, which specifies <a class="underline underline underline-offset-2 decoration-1 decoration-current/40 hover:decoration-current focus:decoration-current" href="https://slsa.dev/spec/v1.0/levels">cryptographically signed provenance generated by the build system to detect tampering</a> (in practice, often via <a class="underline underline underline-offset-2 decoration-1 decoration-current/40 hover:decoration-current focus:decoration-current" href="https://github.com/sigstore/cosign">Sigstore Cosign</a>). Finally, there&#8217;s a maintenance promise in the form of a service-level commitment that whoever ships the image will rebuild and republish it when an upstream CVE lands. This last element is the newest and the least standardized; it&#8217;s where commercial &#8220;secured image&#8221; offerings go beyond the baseline, with vendors like Wiz describing images that are <a class="underline underline underline-offset-2 decoration-1 decoration-current/40 hover:decoration-current focus:decoration-current" href="https://www.wiz.io/academy/container-security/how-to-get-started-with-hardened-images">continuously maintained at near-zero CVEs under an SLA</a>.</p>
<p>&nbsp;</p>
<h2>Why now?</h2>
<p>Three primary pressures converged to make 2026 the year of image hardening.</p>
<p>The first is that the National Vulnerability Database (NVD) stopped being a reliable single source of truth. CVE submissions grew 263% between 2020 and 2025, with Q1 2026 running another third higher year-over-year. NIST enriched almost 42,000 CVEs in 2025 (a 45% jump over its previous record) and still could not keep up. On April 15 it formally <a href="https://www.nist.gov/news-events/news/2026/04/nist-updates-nvd-operations-address-record-cve-growth">moved to a prioritized model</a>. Only CVEs on CISA’s KEV catalog, in federal software, or in Executive Order 14028 critical software get full Common Vulnerability Scoring System (CVSS) scoring and CPE mapping. Everything else gets a “Not Scheduled” label. VulnCheck’s analysis counts “<a href="https://thehackernews.com/2026/04/nist-limits-cve-enrichment-after-263.html">approximately 10,000 vulnerabilities from 2025 without a CVSS score</a>.” If your scanner used to tell you what to prioritize based on the NVD, that pipeline has a hole in it now.</p>
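<p>The prioritized model described above amounts to a simple rule, sketched here in Python. This is an illustration only, not NIST&#8217;s implementation: the <code>Cve</code> class, its field names and the &#8220;Full enrichment&#8221; label are assumptions made for the example, and the CVE identifiers are placeholders.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cve:
    """Hypothetical record holding the three criteria named above."""
    cve_id: str
    in_kev_catalog: bool        # listed in CISA's KEV catalog
    in_federal_software: bool   # affects software used by the federal government
    eo14028_critical: bool      # affects Executive Order 14028 critical software


def enrichment_status(cve: Cve) -> str:
    """Return the enrichment label a CVE would receive under the prioritized model."""
    prioritized = (cve.in_kev_catalog
                   or cve.in_federal_software
                   or cve.eo14028_critical)
    # "Full enrichment" is a label invented for this sketch;
    # "Not Scheduled" is the label cited in the text.
    return "Full enrichment" if prioritized else "Not Scheduled"


# Placeholder identifiers, not real CVEs.
backlog = [
    Cve("CVE-0000-0001", in_kev_catalog=True, in_federal_software=False, eo14028_critical=False),
    Cve("CVE-0000-0002", in_kev_catalog=False, in_federal_software=False, eo14028_critical=False),
]
unscored = [c.cve_id for c in backlog if enrichment_status(c) == "Not Scheduled"]
print(unscored)  # prints ['CVE-0000-0002']
```

<p>A scanner that ranks findings by CVSS score alone has nothing to rank for the second entry, which is the hole in the pipeline.</p>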
<p>The second pressure is that the open source registries are openly under siege. The Shai-Hulud worm started chewing through npm packages in late 2025, came back as a more aggressive 2.0 in November, and in April 2026 a variant called Mini Shai-Hulud started compromising packages with valid OIDC-signed provenance, slipping past 2FA entirely. By May 11 it had hit TanStack (12 million weekly downloads on React Router alone), Mistral AI, UiPath, and OpenSearch. OpenAI <a href="https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack/">disclosed credential theft from two employee workstations</a>. TeamPCP, the group behind much of it, <a href="https://cybernews.com/security/shai-hulud-supply-chain-attack-competition/">open-sourced the worm and announced a $1,000 contest</a> for the largest supply-chain attack built on top of it. The attack surface is no longer just your package.json. It is every package any base image pulls in by default.</p>
<p>The third pressure is that AI is doing vulnerability discovery at machine speed. Anthropic released Claude Mythos to <a href="https://www.anthropic.com/research/glasswing-initial-update">vetted security researchers</a> as part of its Project Glasswing. OpenAI did <a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/">the same</a> with GPT-5.4-Cyber. The Edera-Minimus announcement was explicit about why the timing mattered, referencing Mythos by name. Whatever you think of threat-intel framing, the empirical claim is straightforward: novel vulns are being found and weaponized faster than human-scale CVE triage can absorb them.</p>
<p>So where do hardened images come into this? While image hardening does not solve any of these problems individually, it shrinks the surface that all three of them attack.</p>
<p>&nbsp;</p>
<h2>How we got here</h2>
<p>Container images started out as the worst version of themselves, on purpose. Early Docker tutorials taught developers to build containers on top of a full Linux distribution like Ubuntu or Debian. You got a familiar filesystem, a shell, a package manager, and the ability to debug a running container with the same tools you use on your laptop. That convenience came with everything else the distribution shipped, useful or not, and base images stayed bloated for years.</p>
<p>Google&#8217;s<a href="https://github.com/GoogleContainerTools/distroless"> distroless project</a> was the first serious push in the other direction, starting around 2017. The idea was stark: ship your app and its runtime dependencies and absolutely nothing else. No apt. No sh. Bazel built it. The smallest distroless image is about 2 MiB, roughly half the size of Alpine and less than 2% the size of Debian. Adoption has been significant. The <a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-release/1729-rebase-images-to-distroless/README.md">Kubernetes</a> ecosystem treated distroless as best practice, and it has been “<a href="https://github.com/GoogleContainerTools/distroless#why-should-i-use-distroless-images">employed by Google and other tech giants</a>.”</p>
<p>Chainguard, founded in 2021 by several ex-Google engineers (including the people who built Sigstore), came at it from a different angle. They built <a href="https://www.chainguard.dev/unchained/introducing-wolfi-the-first-linux-un-distro-designed-for-securing-the-software-supply-chain">Wolfi</a>, a Linux “(un)distro” designed specifically to be the building block for hardened images rather than a general-purpose OS. Apk-based, no kernel, packages compiled with hardening flags, SBOMs at build time, Sigstore signatures on everything. <a href="https://www.prnewswire.com/news-releases/chainguard-surpasses-500-million-container-build-manifests-302679701.html">By February 2026</a> they had passed 500 million unique container build manifests and a catalog of more than 2,000 projects, on roughly<a href="https://www.prnewswire.com/news-releases/chainguard-raises-356-million-in-series-d-funding-to-be-the-safe-source-for-all-open-source-302435220.html"> $40 million</a> in 2025 ARR.</p>
<p>There have been several other approaches to the problem that hardened images address, each tackling a different layer of the stack. Notably, AWS&#8217;s<a href="https://aws.amazon.com/bottlerocket/"> Bottlerocket</a> is a hardened <i>host</i> operating system rather than a hardened <i>application</i> image. It minimizes the attack surface of the underlying node by shipping only the components needed to run containers, but it leaves the security of the application images themselves up to the user. A closer analogue is Replicated&#8217;s<a href="https://www.replicated.com/blog/introducing-securebuild"> SecureBuild</a>, which targets the application layer directly by providing zero-CVE container images for open source software. Rather than repackaging upstream binaries, SecureBuild rebuilds everything from source using an ephemeral build system and distributes the results through a hardened registry. Notably, Replicated also shares a majority of image subscription revenue with the open source maintainers whose projects it secures. Taken together, these efforts illustrate that &#8220;hardening&#8221; can happen at multiple points, from the host OS down to the individual application image, and that the most effective strategy often depends on which part of the supply chain poses the greatest risk.</p>
<p>&nbsp;</p>
<h2>What developers actually care about</h2>
<p>A base image is a shared dependency, which means it&#8217;s also a shared liability. Whatever vulnerabilities ship inside an image are inherited by every application built on top, and they land on developers who are rarely in a position to fix them. You can&#8217;t patch a flaw in a system library you didn&#8217;t choose and often don&#8217;t control. The gap between who inherits the risk and who has the will and ability to actually remediate it is the whole reason a market for subscription-based hardened images exists.</p>
<p>The mechanism behind that liability connects directly to the flood of CVEs that AI-assisted vulnerability discovery has unleashed. Every package inside an image is a line in its SBOM and, potentially, a vulnerability waiting to be surfaced. A container built on a full Ubuntu base carries ~100 packages before your code is even added, and some of them—the shell, the package manager, assorted system libraries—never execute at runtime. Yet each is still a row your scanner checks against the vulnerability databases, and each can light up red when a new CVE lands upstream. <a href="https://www.redhat.com/en/blog/why-distroless-containers-arent-security-solution-you-think-they-are">Distroless skepticism aside</a>, a distroless image cannot inherit a software vulnerability that it doesn&#8217;t contain. Stripping the image down was never really about disk size; it was about carrying less code that can turn into a CVE in the first place.</p>
<p>&nbsp;</p>
<h2>Why isn&#8217;t “hardened” already the default?</h2>
<p>As I have been following the image hardening market in vendor briefings and <a href="https://redmonk.com/videos/qt-red-hat-summit-2026/">conference keynotes</a>, the question of why hardening isn&#8217;t the standard was one I struggled with. Here&#8217;s what I learned: container base images grew up as a developer convenience tool, not a security artifact. Installing extra packages from the command line is one of the <a href="https://docs.docker.com/build/concepts/dockerfile/">first things</a> any Docker tutorial teaches (Docker&#8217;s own Dockerfile guide includes <code>apt-get install</code>), and many of the most popular official images ship a full toolchain by default. The <code>-slim</code> and <code>-alpine</code> variants are offered precisely because the defaults carry more than most workloads need, but changing those defaults would have broken enough downstream workflows that it was never going to be a routine upstream decision.</p>
<p>There is also an incentive split. The upstream distribution maintainers and the developers using their images are different people with different priorities. Debian wants to ship a usable general-purpose OS. The developer pulling the Debian Bookworm OS into a container wants a runtime for one specific application. Hardening for the second use case requires throwing out things the first use case depends on. Vendors stepped in because upstream maintainers generally lack strong enough incentives to harden on the downstream user&#8217;s behalf. Docker, Chainguard, Minimus, Google/Wiz, and Red Hat all appearing in roughly the same window is the industry&#8217;s attempt to make hardened the new default without waiting for upstream to change.</p>
<p>&nbsp;</p>
<h2>Conclusion</h2>
<p>To sum up, the reason behind 2026’s influx of hardened images is the increasing cost of extraneous code: transitive dependencies, unused packages, etc. As automated tooling drives up the rate of vulnerabilities, both <a href="https://redmonk.com/kholterhoff/2026/05/05/ai-slop-vulnerability-treadmill/">publicly disclosed and black market exploits</a>, the triage pipeline is straining under the volume. What was once a tolerable annoyance for developers has graduated into a serious liability. With increasingly malicious supply chain worms targeting the build pipeline directly, and with AI accelerating both sides of the discovery-exploitation cycle, the price of shipping a whole distribution&#8217;s worth of software inside every container went up. Hardened images are the market finally pricing that in.</p>
<p><b>Disclaimer:</b> Red Hat, Bitnami by VMware/Broadcom, Google, AWS, and Chainguard are all RedMonk clients.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Fastly Xcelerate LDN is kind of a vibe – why you should come]]></title>
                <link href="https://redmonk.com/jgovernor/fastly-xcelerate-ldn-is-kind-of-a-vibe-why-you-should-come/" />
                <published>2026-05-29T17:30:49Z</published>
                <content type="html"><![CDATA[<p>I have known Simon Wistow a long time, and always enjoy hanging out with him. He&#8217;s a web application and performance OG, one of the co-founders of Fastly. The company runs a solid conference called Xcelerate in London every year &#8211; which feels kind of like an indie conference. I always meet engineers building cool things, and the content is technical, rather than marketing-led.</p>
<p>This is a video I recorded with Simon about this year&#8217;s show. Xcelerate London 2026 is on June 10th &#8211; get your tickets <a href="https://learn.fastly.com/xcelerateldn">here</a>.</p>
<p>&nbsp;</p>
<p>Disclosure: Fastly sponsored the video.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[What the OSS Summit Says About OSS in 2026]]></title>
                <link href="https://redmonk.com/sogrady/2026/05/19/oss-summit-2026/" />
                <published>2026-05-19T18:54:03Z</published>
                <content type="html"><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2026/05/IMG_5660_Original-scaled.jpeg"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/IMG_5660_Original-1024x768.jpeg" alt="" width="1024" height="768" class="aligncenter size-large wp-image-6175" /></a></p>
<p>In the wake of O’Reilly’s decision to exit their events business, including OSCON, a void was created. Among its other functions, OSCON served as the de facto annual gathering of forces within open source. While it’s distinct in some critical ways and can’t necessarily replicate the traction of its spiritual ancestor (in part because of OSCON’s densely packed venue), the Linux Foundation’s (LF) OSS Summit is arguably the best approximation of OSCON that exists in 2026. It transcends product categories, corporate boundaries and seniority levels to attract a mixed audience of young, old and everything in between.</p>
<p>It also, as mentioned, serves as a nexus for various powers within open source to meet &#8211; often accidentally &#8211; and exchange notes. It is, in the words of several open source people this week, a “favorite event.”</p>
<p>It’s also, by virtue of its attendees and focus, a valuable vantage point for observing macro trends and issues across open source at scale. Here are five takeaways from this year’s event.</p>
<h1>AI and Data</h1>
<p>When the OSI and other parties attempted to determine how and whether the term open source should be applied to AI models, data inevitably was the sticking point. The relationship of open source licenses to the source code components of the models was well understood. With data, not so much. Data licensing, unfortunately, is fractally more complex than for mere code.</p>
<p>It was not surprising, therefore, to see data singled out as one of the last holdouts to an open AI landscape. The LF has targeted this as an area of research and investment, with its CDLA family of licenses as one example.</p>
<p>There is, however, no consensus around data licenses, or even which entity should be the arbiter of same. The LF is appropriately focused on this as an area of necessary attention and investment, but how data licensing does or does not progress will certainly not be up to them alone.</p>
<h1>Open Models</h1>
<p>Research from the LF has apparently reached a similar conclusion to RedMonk’s <a href="https://redmonk.com/sogrady/2026/05/15/open-ai-models/">own analysis</a>: specifically that open models not only continue to compete with their closed, frontier counterparts, but that the gap between the two is closing over time.</p>
<p>This is interesting in the abstract, because having open alternatives to closed products has generally been beneficial to users. But it is of particular interest because of the stakes involved. Building and advancing frontier models, to date, has been fantastically expensive, and pushed startups in the space to pursue private capital investments in amounts previously unheard of. The return on these investments is predicated on several expectations, among them that the private models will become so indispensable that not paying the cost &#8211; even as costs rise &#8211; is unthinkable.</p>
<p>Open models that are becoming more capable at faster and faster rates introduce questions around these valuations and the expectations of return. It will be interesting to monitor the tension between open and closed models in the year ahead, because it’s possible there’s a threshold of capability that users, individual and enterprise alike, regard as “good enough,” and that that threshold may be met by open models soon.</p>
<h1>Security</h1>
<p>Casting a pall over the success of open source more broadly were questions of security. As Jim Zemlin’s keynote quoted, the bill for deferred security investments for the industry as a whole is coming due. And we are not collectively prepared to pay it.</p>
<p>AI is both sides of the blade here. Via Project Glasswing, enabled by early access to Anthropic’s most capable model, security researchers are attempting to stay one step ahead and identify and patch vulnerabilities faster than they can be exploited.</p>
<p>But that is not scaling across the industry. AI is being used and used well by attackers, who are able to dial back the cost of creating exploits to near zero and &#8211; coupled with decades of social engineering expertise &#8211; to attack broadly, at scale and with velocity.</p>
<p>This has led to fundamentally misguided efforts like that of the NHS to <a href="https://www.theregister.com/software/2026/05/05/nhs-to-close-source-github-repos-over-ai-security-concerns/5224392">close source</a> hundreds of open repositories in an effort to protect them. Notwithstanding the fact that this type of action both doesn’t work and has no defensible academic foundation underneath it, it is inevitable that we’ll see more of it.</p>
<p>Open source is likely, in other words, to have to prove its security bona fides all over again.</p>
<h1>Maintainer Burnout</h1>
<p>One popular topic of conversation at this event was maintainer burnout. From user entitlement to security worries to infrastructure not built for the volume of inbound AI contributions, life for project maintainers has never been more challenging. Asked if AI was helping to mitigate that, one maintainer bluntly answered, “No.”</p>
<p>Maybe it will in time, or perhaps other process and infrastructure adjustments will ultimately result in improvements, but for now maintainers are faced with an increasing number of challenges with no commensurate adjustments in the resources at their disposal. The number of would-be contributors has skyrocketed in many cases; the number of maintainers has not.</p>
<p>This isn’t an LF problem, or at least it’s not strictly theirs to solve, but it is and remains very much an industry problem. One that does not get nearly the attention it deserves.</p>
<h1>Who is the Next Generation of Open Source Defenders?</h1>
<p>For decades, open source has found for itself new generations of advocates and defenders. Drawn to it by different paths, whether that was personal benefit, commercial opportunities or garden-variety idealism, generations of technologists metaphorically handed off the responsibility for actively protecting open source to those coming up behind them who shared their sentiment.</p>
<p>It’s not clear, however, how much longer that shared responsibility can be sustained.</p>
<p>Open source has become, to a degree, a victim of its own success. Its ascension and then dominance made it something to <a href="https://redmonk.com/sogrady/2018/05/11/taking-open-source-for-granted/">take for granted</a>, not something that needed to be cared for, nurtured and actively defended. Many developers today cannot remember a world in which open source didn’t exist, or even one in which it wasn’t the dominant approach to building software. As a result, things like the ardent and assiduous defense of the literal definition of the term open source itself seem quaint at best and pedantic at worst. “Ok, boomer,” is one common response.</p>
<p>As those who have been around long enough to understand that, like democracy, open source needs to be guarded with vigilance age out and retire, the question is who will step up to take their place? The OSS Summit didn’t provide many answers in that regard, but if would-be defenders are out there, it’s presumably where they will first appear.</p>
<p>And if they don’t, maybe the event will have to recruit them more actively.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Google Cloud Next 2026: The Agent Era and the Full AI Stack]]></title>
                <link href="https://redmonk.com/jgovernor/google-cloud-next-2026-the-agent-era-and-the-full-ai-stack/" />
                <published>2026-05-19T13:54:55Z</published>
                <content type="html"><![CDATA[<p><a href="http://redmonk.com/jgovernor/files/2026/05/agentic-marathon.jpg"><img loading="lazy" decoding="async" class="aligncenter wp-image-5396 " src="https://redmonk.com/jgovernor/files/2026/05/agentic-marathon-1024x768.jpg" alt="view of the Developer Keynote at Google Cloud Next 2026. Spotlights are flicking across the room and the text on the slide reads simply: &quot;We built an agentic marathon simulator&quot;." width="749" height="562" /></a></p>
<p>I am old enough to remember when people questioned whether Google Cloud even had a future within Alphabet. Well now we have a pretty clear answer &#8211; yes. I originally began this post as a quick roundup of news from Google Next 2026, but given Google’s results dropped the following week, let’s look at them first.</p>
<p>In the first quarter, Google Cloud revenue jumped 63%, year over year, to $20 billion, in the parlance of the day &#8211; absolutely crushing it. To the moon, etc. Growth rates like that on an already hyperscale business are quite something. Google Cloud now accounts for 18% of Alphabet’s total revenue. Which is to say, it’s a keeper.</p>
<p>Sundar Pichai, Google CEO, said:</p>
<blockquote>
<p>“Google Cloud is differentiated because we are the only provider to offer first-party solutions across the entire enterprise. Our growth in revenue, operating margin and backlog highlights this differentiation.”</p>
</blockquote>
<p>AWS and Azure also had excellent quarters, but Google&#8217;s growth, even though from a lower base, was still a clear marker of substantial progress. I often say the best packager in any tech wave wins and wins big, and Google is looking increasingly well positioned. Packaging and integration go hand in hand &#8211; you want an offering that makes things easy for the customer, reducing cognitive and organisational overheads. But packaging and economies of scale are also closely entwined &#8211; Google is well placed here because it owns technology up and down the stack.</p>
<p>Google also really pushed its position as a “full stack” AI infrastructure provider &#8211; it has technology leadership that spans from custom silicon to productivity apps, and unlike Microsoft and AWS it has its own capable frontier model in the shape of Gemini.</p>
<p>It is agents and harnesses, however, that are really going to drive model growth.</p>
<h1><strong>The agent platform</strong></h1>
<p>At Cloud Next Google announced tools for agent management and orchestration, and demonstrated continuing strength and depth in data infrastructure. It also apparently reintroduced Knowledge Management as an enterprise concern for the AI era.</p>
<p>The agent story is on point because, frankly, agents are finally capable enough to do some real work (on their own). Developers noticed a step change in coding agents around the turn of the year, as big a leap forward for LLM capabilities as the launch of ChatGPT in November 2022.</p>
<p>You just have to use the tools to see the clear difference in capability, or listen to AI-savvy developers across the industry. We’re in a fundamentally different environment now &#8211; the code being generated now is of a much higher quality.</p>
<p>Devs also began to better understand the value of brute force, and iterative loops (see the “Ralph Wiggum” pattern, invented and popularised by Geoff Huntley) in getting agents to do what they wanted. Harnesses are also becoming more effective, for example in providing consistent developer experiences as models evolve. If agents are getting this good at building and deploying software, other business processes are also going to be affected. The infrastructure we rely on just wasn’t built with agents and models in mind though &#8211; scale, security and management challenges are all massively increasing.</p>
<p>Google introduced Gemini Enterprise Agent Platform, building on and replacing its Vertex AI platform, with new features in its Agent Development Kit, scale and memory improvements for its Agent Runtime, but most importantly from a manageability perspective &#8211; Agent Identity, Agent Registry, and Agent Gateway. The central idea of Agent Identity is that you can assign every AI agent a unique cryptographic identity. Google also announced Agent Simulation, Agent Evaluation, and Agent Observability. Evals are as important to LLM apps as unit tests are to current software development. Observability is of course more important than ever, given we need to understand outputs and behaviours in production.</p>
<p>Google’s Dev Rel team did a solid job explaining how all of these tools play together in a demo during the developer keynote on day two &#8211; with a scenario of planning a marathon in Las Vegas. Each step in the development process was outlined and explained during the keynote.</p>
<p>One example was a demo featuring Agent Studio &#8211; a collaborative workspace for building new agents. Key to the demo flow as a whole was making it clear that different tools will be used by different personae, including “low code” developers using natural language and people living in markdown for specs &#8211; specs that cover business rules, not just software guardrails. Prebuilt agents include code modernisation, financial analysis, and deep research.</p>
<p>We’re going to see a lot more marketing around agent capability from platform providers as agents become more capable and widely deployed &#8211; indeed it’s pretty much the 2026 summer keynote.</p>
<h1><strong>The agent permission, security and identity play</strong></h1>
<p>With agents you have a bunch of autonomous systems using non-deterministic models to try and achieve outcomes. These agents are capable of taking action &#8211; sometimes very bad actions, such as deleting production databases. They absolutely need guardrails, and a markdown file just isn’t going to cut it. That’s where tools like Agent Identity come in. Permissions will need to be tightly managed. Role based access control (RBAC) will be essential for this agent buildout. But things are made even more challenging because most agents are ephemeral. RBAC systems were built for employees, and perhaps contractors, not agents you spin up and dispose of in minutes. New scale challenges everywhere you look.</p>
<p>Simon Willison coined the phrase <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">the lethal trifecta</a>:</p>
<ul>
<li>Access to your private data—one of the most common purposes of tools in the first place!</li>
<li>Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM</li>
<li>The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)</li>
</ul>
<blockquote>
<p>“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.”</p>
</blockquote>
<p>Threats are growing all the time, partly because the tools are so powerful. Take OpenClaw, for example, designed as an AI personal assistant, arguably the fastest growing open source project in software history. OpenClaw reached 250,000 GitHub stars in about 60 days, surpassing React, which took over a decade to reach the same milestone. OpenClaw drove a spike in Mac mini sales as developers clamoured to use the software with a modicum of safety. The project went from launch to explosive growth to acquisition by OpenAI in about three months. Peter Steinberger, the lead developer, is now at the forefront of the movement to create agents that can do a range of tasks on behalf of the individual. For now it’s a developer play, but in the longer term it will be for enterprise users as well. It is a capable autonomous agent, built for messaging and integration to third party systems. But deploying it on your corporate network without a huge amount of supporting security infrastructure would be extremely foolish. It’s definitely not ready for the enterprise at this point. It’s simply too powerful as a lethal trifecta accelerant.</p>
<h1><strong>Agent sprawl drives the need for orchestration platforms</strong></h1>
<p>Orchestration platforms are increasingly a necessity because software developers are using teams of autonomous agents, sometimes working in parallel, but requiring asynchronous communication for handoffs and state management (memory) in the service of tasks and workflows.</p>
<p>This pattern reminds me of the evolution of containers and microservices. The container revolution began with devs running Docker locally on their laptop. As complexity compounded we needed an orchestration layer to manage all of these containers in production, which is how Kubernetes came into being. We’re now at a similar point with agents.</p>
<h1><strong>Agents as the primary driver of scale</strong></h1>
<p>Agents also drive much higher scale requirements. Look around the industry and you can see existing players and startups crushing it, or being crushed by, agent generated workloads. GitHub’s reliability issues are being exacerbated by agent-based software growth. There’s just so much software being built. Just look at Jarred Sumner of OpenAI, who just rewrote the entire Bun JavaScript runtime in Rust, replacing the Zig implementation. One million lines of Rust code, all AI-generated. Regenerated is probably a better word than rewritten. Open Source projects are like Doctor Who now. Across the industry there are thousands of Sumners, also building huge code bases and new applications at an entirely new speed.</p>
<h1><strong>Token Economics and the Full AI Stack</strong></h1>
<p>Being an end to end stack player matters because it means you’re in control of cost of goods sold. You’re not reliant on third parties to serve customers, which is a clear economic advantage.</p>
<p>Control of cost is so important because the future of AI is going to be defined by token economics &#8211; that is, who can offer the greatest range of capabilities at the lowest cost. Anthropic and OpenAI are both going to raise prices in order to meet their sky high valuations. In recent months it’s become clear that the cheap token era is ending, even as some developers crow about “tokenmaxxing” as a badge of honour or productivity.</p>
<p>OpenAI and Anthropic need third parties to provide chips &#8211; NVIDIA. They also need cloud providers to provide compute, network and storage &#8211; business for the hyperscalers, as well as CoreWeave and to some extent Oracle. Meanwhile AWS, Azure and GitHub are reliant on OpenAI and Anthropic (and their eye-watering balance sheets) to provide frontier models and associated services to their customers.</p>
<p>Owning a capable first party model is a crucial competitive advantage. At Next 2026 Google Cloud CEO Thomas Kurian announced that Google Gemini would be powering the next generation of Apple&#8217;s Siri service &#8211; see my <a href="https://redmonk.com/jgovernor/google-gemini-to-run-ai-services-for-all-the-phones/">post</a> about it here. Gemini is now powering both Siri and AI services on Android phones worldwide — an almost incredible cloud win, given it covers most new phones being sold in the two key mobile ecosystems. Neither AWS nor Microsoft Azure could have won this deal given their reliance on third-party models.</p>
<p>What about the lower level stuff? Google has its TPU microprocessor architecture, which means that it’s not reliant on NVIDIA. AWS and Microsoft are investing in their own architectures optimised for model training. AWS will likely get there with Trainium, given the engineering prowess of its chip business &#8211; Graviton for general compute has been, and will continue to be, a huge differentiator for AWS as it seeks to lower the cost of compute for customers.</p>
<p>Google labelled its hardware advantage the AI Hypercomputer but we’ll largely stick to the agent story for the purposes of this post. It’s enough to say that AI is very specific in terms of compute patterns, and that’s why being in control of technology up and down the stack matters. And we haven&#8217;t even talked about the fact that Google Cloud also has a huge installed base of customers using its productivity tools. There is so much upside potential for integrations there.</p>
<p>But let’s cut to the chase. Enterprises are not going to take seriously the proposition that they should spend thousands of dollars per week per developer or agent. That’s not going to fly in Illinois, let alone Nairobi, Paris or Jakarta. I am old enough to remember enterprises balking at paying an extra $30 per calendar month per user for GitHub Copilot. So thousands of dollars per developer just seems unlikely at this point. Model choice is going to become more important in this environment.</p>
<h1>Data, Context and the new Knowledge Management</h1>
<p>Data was the first business that Google Cloud landed in the enterprise, with companies making deep and strategic bets on BigQuery, seeing Google as their “data cloud”. This beachhead is now increasingly realising gains in the AI space.</p>
<p>Google is world class at managing, and managed databases &#8211; Spanner is a globally distributed database offering near absolute transaction guarantees. It’s unique. It’s also increasingly appropriate for AI workloads, with multimodel capabilities including Spanner Graph, vector search and built-in full-text search. Google also offers Postgres in a couple of flavours, and Valkey for caching. You can have CloudSQL for MySQL if you’re looking for it, and for those so inclined of course you can get managed Oracle databases.</p>
<p>So what did Google announce at Next in the data space, to build on these strengths?</p>
<p>The key announcements were about what Google is calling the Agentic Data Cloud &#8211; key components of which are Knowledge Catalog, the cross-cloud Datalake, and the Agent Data Kit (meeting devs where they are, ensuring that Claude Code, Codex etc can do a solid job of accessing Google data services).</p>
<p>The Knowledge Catalog is essentially an AI-maintained index of all your enterprise data, structured so agents can actually use it. This is a modern, AI-inflected take on the classic Knowledge Management projects of the early web era. We&#8217;re talking about Ontology with a capital O again.</p>
<p>Yasmeen Ahmad, managing director of Agentic Data Cloud at Google Cloud, is one of the best communicators in the industry. She introduced the data infrastructure section of the keynote, bringing the announcements to life with&#8230; Frozen Yoghurt. Yep &#8211; not a holiday booking, or revamping a camping company e-commerce site, but Froyo.</p>
<p>When you’re a food retailer you want to make sure you can meet people’s dietary needs &#8211; gluten and dairy free are both growth markets. Allergy information however might all be in a bunch of PDFs &#8211; it’s classic unstructured data, or what Ahmad calls dark data.</p>
<p>So what if the system identifies potential allergens? That’s new knowledge you can take advantage of for product management. So AI has enriched the knowledge catalog. But then you want to correlate that with a bunch of customer information about allergies. That might be in your system of record. But what if that database was hosted in AWS?</p>
<p>With Google’s Cross Cloud Datalake you can access data directly in third party clouds, using the Apache Iceberg standard. Finally you want to query the data before making a new product decision, so you use your favourite agent, and it’s ready to provide the correct answers based on your enterprise data via the Agent Data Kit. Great demo.</p>
<p>So the idea is that first you build a Knowledge Catalog. You don’t even build it. AI does.</p>
<p>I talked to Andi Gutmans, who runs Google’s data infrastructure and storage businesses. He said that humans are not going to be able to scale to the amount of data and events we’re talking about. Ontology is going to have to be machine learning based. Given the failure of Ontology efforts in corporations over the last few decades, partly because it’s just a hassle to annotate enterprise data with technologies like RDF, it would be hard to argue with him. Ontology, he argued, is actually an agent problem, so he has put together an “AI forward team to work on enrichment”.</p>
<p>The product design goal is essentially smart storage &#8211; a file lands, it’s instantly tagged, enriched and made agent ready. Every file or interaction should help the model learn about your business semantics. That’s where valuable context comes in. Google cited Virgin Media O2 as an early customer of the Knowledge Catalog, which it’s using across 20k different assets.</p>
<p>Of course you need a really good search engine on top of that ontology. Apparently Google has some proprietary search technology that is quite good.</p>
<p>We keep coming back to this question of human scale vs the scale that agents are going to drive. What does it look like to run an infrastructure business when a swarm of agents at a customer are going to spin up a hundred thousand databases in an hour and tear them all down again?</p>
<p>This is key to how Gutmans is architecting Google’s data products &#8211; agents take action, they don’t just make enquiries. Gutmans said Google Cloud is therefore now working towards an expectation of customers managing zettabytes rather than exabytes. Agents driving unprecedented data scale, just as they drive the need for compute, and transactions.</p>
<h1>Rebuilding in flight</h1>
<p>One thing I can’t stop thinking about that Andi told me:</p>
<blockquote>
<p>&#8220;Now &#8211; I need to rebuild a lot of stuff we built a year ago.&#8221;</p>
</blockquote>
<p>That&#8217;s the point. Everything &#8211; literally everything, is now about rebuilding in flight. You need to update your priors every day with AI, agent, and LLM progress, and that often means restarting projects because they simply don&#8217;t meet the new requirements. For an enterprise software business this is super hard. There are new product management disciplines required when the targets are moving so fast. This is going to find out a lot of companies, which are going to struggle with this rate of change and how to manage it. The days of ship it and then run it for the next decade are behind us. And if you don&#8217;t have an incredibly solid foundational platform, making these changes is all the more risky.</p>
<p>Related to this is the need to meet customers where they are. If enterprises thought shadow IT was hard, just wait til they start getting their heads around the new rates of tool adoption.</p>
<p>In a conversation at Next, Peder Ulander, VP of marketing, said:</p>
<blockquote>
<p>“Years ago you could say here is an enterprise tool, and here is what you use at home.<br />
It used to be &#8211; find a tool you like to work with, and maybe bring it into work two years later. Now it’s not the next year, it’s the next day.”</p>
</blockquote>
<p>That’s the reality, and it’s a reality that agents are only accelerating. The tools are more capable than ever, and the opportunities to improve business processes using autonomous agents are increasingly clear. People aren’t going to wait for traditional enterprise sales cycles to start using the tools. Vendors are going to need to respond with strong product led growth plays from the bottom up, supported with the usual large deal size top down procurement. The market is moving incredibly quickly, and Google Cloud is very well positioned for the agent era.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Open and Closed: The Pursuit of Frontier Models]]></title>
                <link href="https://redmonk.com/sogrady/2026/05/15/open-ai-models/" />
                <published>2026-05-15T17:06:36Z</published>
                <content type="html"><![CDATA[<p>In the beginning, software was open. Not because that was thought to be the correct strategic approach, but rather because <a href="https://redmonk.com/sogrady/2011/05/24/the-age-of-data/">software was an afterthought</a>. Hardware was what mattered. Less than two decades later, the hardware was cheaper and consequently mattered less. In search of greater returns on capital, the focus swung back to software. To maximize those returns, software was turned from open to closed.</p>
<p>Ever since, software has been in constant tug of war between open and closed. With operating systems, virtualization software, mobile and other categories, closed software led the way and open gave chase. For big data, containers, programming languages and web servers, the roles were reversed. Open source typically led, while closed and proprietary models have had to keep up.</p>
<p>Models, for all that they are built and utterly dependent on a foundation of open source, are very much in the former camp. What <a href="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf">began open</a> became closed. “Frontier” models &#8211; which is to say models that push the “frontier” &#8211; are universally proprietary or closed, OpenAI’s name notwithstanding.</p>
<p>Ever since ChatGPT was unleashed on the world on the 30th of November, 2022, however, there have been &#8211; inevitably &#8211; efforts to counter the dominance of proprietary models with “open” alternatives (we’ll come back to what open means in this context). The technology industry has a long history of both dominant players, and federated resistance to those dominant players.</p>
<p>At a recent industry event, an AI executive likened the open models chasing their closed frontier counterparts to a “pack of wolves.” Casual observers of the industry could be forgiven for not knowing open alternatives existed, because almost all of the media’s attention is consumed by coverage of Anthropic and OpenAI&#8217;s latest achievements &#8211; though arguably that is in part because of the latter’s notable tendency to strategically time its releases around Google AI announcements to minimize their impact.</p>
<p>Whether open will compete with closed, then, is not the interesting question. It always has, it always will. The question to ask is instead: how well? Put another way, will the “pack of wolves” ever catch their prey, and if so, how quickly?</p>
<p>Evaluation of AI models is challenging for many reasons. Anecdotal experimentation is useful &#8211; anyone who used models before and after November of last year would be struck by the difference in capability &#8211; but it obviously doesn’t scale. The only real standardized quantitative measurement available, however, is industry benchmarks.</p>
<p>Given that benchmarks were gamed almost to the point of irrelevance decades ago during the TPC-C wars, they would not otherwise be the first choice for evaluation, but at this point they are the least worst method of measuring performance model vs model.</p>
<p>That being said, there are many other specific concerns for benchmarks generally and those selected here. Among them:</p>
<ol>
<li><strong>Contamination</strong>: models can be trained on data that includes benchmark test questions, either accidentally or deliberately. </li>
<li><strong>Self-Reporting</strong>: scores are typically self-reported by the labs that created the models.</li>
<li><strong>No Standardized Approach</strong>: benchmark scores can vary widely depending on scaffold, prompt, number of attempts, etc, and benchmarks typically don’t standardize the approach.</li>
<li><strong>Specificity</strong>: as will be seen momentarily, benchmarks typically have a specific area of focus. None can adequately cover or represent the breadth of actual real world use cases, and notably the benchmarks here are text in, text out &#8211; not multi-modal.</li>
<li><strong>Difficulty</strong>: to measure progress over time, the benchmarks selected for this analysis had to have actual history. This means that more difficult or challenging benchmarks that have emerged more recently and may be more strenuous tests of ability are not represented here because they would not reveal any real trendlines worth noting. </li>
</ol>
<p>In addition to those caveats, it’s important to note that there are dozens of potential benchmarks &#8211; some general, some specialized &#8211; that could be used. The selection process here prioritized consistently available scores across a wide variety of models and a reasonable history to evaluate. This, in other words, is a snapshot of benchmarks and other selections might produce different results.</p>
<p>One last necessary clarification before proceeding is the definition of “open.” This analysis includes both closed and open models. Closed is closed, but open includes two distinct subsets of models: open weight, and fully open. Fully open refers simply to models that are licensed according to a known and OSI-approved open source license: Apache, MIT, etc. “Open weight,” on the other hand, refers to the emerging <a href="https://www.linkedin.com/feed/update/urn:li:activity:7402729193228324864/?originTrackingId=bagLRp%2FvRfOk8qseYxY0%2FA%3D%3D">industry consensus</a> term for models that are <em>mostly</em> open, but include some restrictions on use that prevent them from being called open source &#8211; the most common example of which in this dataset is Llama.</p>
<p>With that context out of the way, let’s start with a simple glossary of the benchmarks selected.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/00_benchmark_glossary_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/00_benchmark_glossary_wm-1024x717.png" alt="" width="1024" height="717" class="aligncenter size-large wp-image-6163" /></a></p>
<p>Notably, the benchmarks here are arranged in an order of most to least “saturated.” Saturated refers to benchmarks that have effectively been solved by all or most models, and thus are no longer useful at measuring relative capabilities. In spite of their lack of utility today, saturated benchmarks are included in this analysis because they demonstrate the historical progress open models have made in catching up with their proprietary counterparts.</p>
<p>We’ll begin by examining one of these fully saturated benchmarks, GSM8K.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/gsm8k_timeline_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/gsm8k_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6164" /></a></p>
<p>From GPT-3.5’s performance in December of 2022, within 16 months GSM8K’s grade school math problems were effectively solved. And importantly, by both open and closed models. By late 2024, fully open Deepseek effectively matched Claude Sonnet’s ~96% score. It’s also notable that the 7B Llama released in July of 2023 was basically guessing at 15%, but the 8B Granite model released in May of 2026 was at 93% &#8211; meaning that even small models’ performance was improving rapidly.</p>
<p>Next, we’ll look at a slightly less saturated benchmark, HumanEval.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/03_humaneval_timeline_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/03_humaneval_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6169" /></a></p>
<p>The “Pass@1” in the above means the model only gets one shot at the question, and it’s excluding other related benchmarks like LiveCodeBench. Again, we see the same pattern playing out, with both open and closed models alike largely solving HumanEval, though the scores are slightly lower than GSM8K.</p>
<p>Also worth noting is that the 30B Granite 4.1 matches the 405B Llama 3.1 from two years ago, proving that open but restricted license models are not outperforming purely open alternatives &#8211; regardless of size.</p>
<p>Slightly less saturated than HumanEval is MMLU.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/02_mmlu_timeline_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/02_mmlu_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6170" /></a></p>
<p>Smaller models aren’t faring as well with the broader set of 14,000 questions: 7B Mixtral represents the peak at a bit over 70% and that hasn’t been exceeded since. The larger tier, 70B+, has for its part stalled around a GPT-4o level of capability. The larger open models like Deepseek have performed well, though nothing close to the closed Opus 4.6.</p>
<p>It’s also worth noting that while open weight models led the way performance-wise, fully open models followed quickly after and now claim the highest scores.</p>
<p>Next up, MATH.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/02b_math_timeline_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/02b_math_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6162" /></a></p>
<p>This is where things begin to separate. The frontier models score around 90%, while the best open models tap out at 83%. Some of this admittedly might be an artifact of the fact that newer models are more commonly reporting against the AIME and MATH-500 benchmarks rather than classic MATH. Two other things to note: model size doesn’t seem to play a major role in performance, and the older fully open Qwen model still outperforms newer open weight Llama alternatives.</p>
<p>Now we’ll see even more separation in GPQA Diamond.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/03b_gpqa_timeline_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/03b_gpqa_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6165" /></a></p>
<p>There are a number of interesting takeaways here.</p>
<p>For one, Deepseek R1 was ahead of all models, open or closed, when it debuted. But closed models made a big jump in the form of Gemini a few months later, and it took almost a year for open models to close the gap. Earlier this year, Deepseek, GLM, Kimi and Qwen approached the performance of Anthropic and OpenAI, but did not quite match it.</p>
<p>Lastly, let’s look at SWE-bench Verified &#8211; 500 human-validated real GitHub issues.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/07_swe_bench_timeline_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/07_swe_bench_timeline_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6171" /></a></p>
<p>From May of last year through early this year, all progress came from closed models. In February, however, things began to speed up. Models both open and closed &#8211; Gemini, GLM, Kimi, MiniMax, Opus, Sonnet, etc &#8211; all landed within 73-81%. Opus 4.7, for all of its other launch issues, jumped to ~88%, while Deepseek V4 Pro leads the open contingent at ~81%.</p>
<p>The pattern here is clear and consistent: closed leaps forward, open is hot on its heels. And the cycle appears to be getting faster.</p>
<p>To explore that, let’s look at the time it took open models to match closed models’ capabilities on the saturated benchmarks we examined earlier.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/01a_commoditization_lag_traditional_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/01a_commoditization_lag_traditional_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6167" /></a></p>
<p>It took 18 months for Qwen to match GPT-4’s capabilities within the MMLU benchmark, and 13 months for Llama to do the same for HumanEval and MATH.</p>
<p>The longest it took an open model to match GPT-4o’s capabilities on any of those benchmarks, however, was seven months, and Llama matched its performance on HumanEval in two.</p>
<p>But what about the harder, non-saturated benchmarks?</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/01b_commoditization_lag_agentic_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/01b_commoditization_lag_agentic_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6166" /></a></p>
<p>It’s more of the same. DeepSeek caught up to Opus 4.6’s capabilities on GPQA in three months, and MiniMax did the same on SWE-bench in one. None have matched Opus 4.7 as yet, but it’s been less than a month.</p>
<h1>Takeaways</h1>
<p>There are any number of different conclusions to be drawn from this dataset &#8211; with the caveats noted above &#8211; but here are five that stand out.</p>
<ol>
<li>Closed models are setting the pace of innovation, and constantly breaking new ground from a capabilities standpoint.</li>
<li>Open models are chasing them, and the cycle times seem to be getting shorter. There are no clear capability moats, and what is frontier today is table stakes tomorrow.</li>
<li>Closed beats open today, but there is effectively no advantage to restricted open weight vs fully open models.</li>
<li>Small models are extremely competitive in specialized disciplines, but lag behind on general performance.  </li>
<li>The United States has the largest contingent of surveyed models (42), and the largest proportion of closed models (64%). China, by contrast, features 17 models, and every single one is either open weight or fully open. </li>
</ol>
<p>Having performed this base level analysis, it will be necessary to track how these models continue to evolve and how the benchmarks evolve with them.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Google Gemini to run AI services for all the phones]]></title>
                <link href="https://redmonk.com/jgovernor/google-gemini-to-run-ai-services-for-all-the-phones/" />
                <published>2026-05-14T14:16:22Z</published>
                <content type="html"><![CDATA[<p><img decoding="async" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_4_Dark.max-2500x2500.jpg" alt="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_4_Dark.max-2500x2500.jpg" /></p>
<p>During his Google Cloud Next 26 keynote, CEO Thomas Kurian said:</p>
<blockquote>
<p>&#8220;We&#8217;re collaborating with Apple as their preferred cloud provider to develop the next generation of Apple Foundation Models based on Gemini technology. These models will now power future Apple Intelligence features including a more personalized Siri coming later this year.&#8221;</p>
</blockquote>
<p>This is a big deal. This news didn&#8217;t come from Apple, which is notoriously tight-lipped about such things. There wasn&#8217;t a press release about it. But assuming it&#8217;s true, and there is absolutely no reason to doubt Kurian, then it&#8217;s worth considering what it means.</p>
<p>Pretty much every phone on the planet is going to be powered by Google Gemini AI going forward &#8211; certainly Android services will be, and this now brings iPhones into the mix. This is an almost absurdly great position for Google to be in, as AI moves out to the edge.</p>
<p>Every phone on the planet. That&#8217;s potentially billions of devices.</p>
<p>As we know, model preferences are far from sticky. They change day to day. Who knows what Apple will be using in 2027 or 2028? On the other hand, Apple building its own frontier model based on Gemini &#8211; that does feel kind of sticky. It certainly took a while for Apple to make Maps competitive, for example, after its initial use of Google Maps. Google Search is still the default on Apple devices (to be fair, Google pays for this privilege, but I believe the point stands). This is a landmark win for Google Cloud.</p>
<p>There are all sorts of questions &#8211; not least architectural. Will Siri be running AI with local models, maybe something like Google Gemma, or is it about Siri&#8217;s back end running on Google Cloud with Gemma? Is this actually a client-side win, or a cloud win? Given Apple&#8217;s strict approaches around privacy, one assumes there will be a local component. Plenty to unpick though.</p>
<p>Model choices change quickly, but rolling these changes out at Apple ecosystem scale would be no mean feat. A ton of things would break.</p>
<p>Nobody chooses a model for life. But for this era of AI service buildout, Gemini powering Siri is very notable. It&#8217;s a huge cloud win whichever way you look at it, and frankly neither AWS nor Microsoft Azure could have landed it if the deal was based on owning a competitive frontier model. Unlike Google, they lead with third-party models.</p>
<p>I will be watching Google I/O closely next week to put this into further context.</p>
<p>&nbsp;</p>
<p>disclosure: Google Cloud is a client, and paid for my T&amp;E to the event.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[License Distribution on Hugging Face]]></title>
                <link href="https://redmonk.com/sogrady/2026/05/12/hugging-face-licensing/" />
                <published>2026-05-12T14:10:00Z</published>
                <content type="html"><![CDATA[<p>While it’s been almost eighteen months since the OSI released its open source AI definition, the debate around where, whether and how open source licenses might be applied to AI models continues. The view here <a href="https://redmonk.com/sogrady/2024/10/22/from-open-source-to-ai/">remains</a> <a href="https://www.linkedin.com/feed/update/urn:li:activity:7402729193228324864/?originTrackingId=bagLRp%2FvRfOk8qseYxY0%2FA%3D%3D">unchanged</a>, which is that open source should not be applied to AI, but the industry more broadly has not yet reached a consensus.</p>
<p>Unless and until that occurs, then, it is useful to understand how open source licenses are being applied to models and in what proportions. To do this, inspired by a conversation on this subject yesterday, the ~2.9M existing Hugging Face models were scanned for license information. There are some interesting takeaways from the data, but first it is worth noting that there are inherent issues with it.</p>
<ul>
<li><p>First, this analysis cannot account for licensing issues like an improperly licensed project. There are models, for example, that apply an Apache license but were trained using Llama. That is not permissible, as you may not convey rights you yourself do not have &#8211; particularly if you’re performing actions a given license specifically and explicitly prohibits. This analysis would consider the project Apache-licensed, when in reality that license cannot be applied.</p>
</li>
<li>
<p>Second, this analysis can’t meaningfully discuss every one of the new licenses here, some of which were written by professionals and some of which were self-evidently not. Given the scale, it cannot guarantee that there are no falsely categorized licenses. Some Kimi and Mistral projects, for example, use a “Modified MIT License,” but the modifications make the project neither open source nor MIT. The good news is that projects using this license are typically categorized as “Other,” and are therefore properly not counted as open source here. The bad news is that we can’t guarantee that there are no exceptions.</p>
</li>
<li>
<p>Lastly, as will be discussed in more detail shortly, there is a rather large hole in the data.</p>
</li>
</ul>
<p>With those warnings out of the way, we’ll start with the Top 20 licenses on Hugging Face.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/hf_top20_licenses_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/hf_top20_licenses_wm-1024x911.png" alt="" width="1024" height="911" class="aligncenter size-large wp-image-6156" /></a></p>
<p>This is a classic long tail distribution, but it is notable just how popular the Apache license is with ~2.5X more licensed projects than its nearest competitor, the MIT license. The next most popular licensing family, perhaps not surprisingly given its Hugging Face pedigree, is Open &amp; Responsible AI Licenses (OpenRAIL). It is the largest single non-OSI, non-model-specific license category in this dataset.</p>
<p>Next, let’s look at the distribution of projects by category.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/hf_license_categories_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/hf_license_categories_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6160" /></a></p>
<p>This is the hole in the dataset mentioned above. Just as GitHub historically reported that the overwhelming majority of repositories are unlicensed, almost seventy percent of Hugging Face models carry no license at all. This means that the data represented here about licensing distribution covers only a third of the models, though at almost a million the sample size is still large enough to be meaningful.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/hf_licensed_vs_unlicensed_wm-scaled.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/05/hf_licensed_vs_unlicensed_wm-1024x576.png" alt="" width="1024" height="576" class="aligncenter size-large wp-image-6158" /></a></p>
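<p>For readers who want to run a similar tally, here is a minimal sketch in Python. It is not the method used for this analysis; it assumes each model’s metadata has already been fetched as a list of tags, and that license information appears as a tag of the form <code>license:apache-2.0</code>. The helper names and the sample data are hypothetical.</p>
<pre><code class="language-python">from collections import Counter

def license_of(tags):
    """Return the license id from a tag list, or None if no license tag exists."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None

def tally(models):
    """Count models per license, bucketing untagged models as 'unlicensed'."""
    return Counter(license_of(tags) or "unlicensed" for tags in models)

# Hypothetical sample: one tag list per model.
sample = [
    ["transformers", "license:apache-2.0"],
    ["license:mit", "pytorch"],
    ["gguf"],
    ["license:apache-2.0"],
    ["license:openrail"],
    ["safetensors"],
]

counts = tally(sample)
licensed = sum(n for name, n in counts.items() if name != "unlicensed")

print(counts.most_common())
print(licensed / len(sample))  # share of models carrying any license tag
</code></pre>
<p>Models without a <code>license:</code> tag fall into the unlicensed bucket, which is how a gap like the one above would surface in the counts.</p>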
<p>To better understand the distribution, let’s strip out the unlicensed projects and look at only those with one.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/hf_licensed_categories_wm.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/hf_licensed_categories_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6159" /></a></p>
<p>Notably, as was seen in our <a href="https://redmonk.com/sogrady/2026/03/25/open-source-licensing-2026/">recent look</a> at the state of open source software licensing, Hugging Face is displaying a systemic preference for permissive licensing models. If anything this data is under-representing the preference for permissive-style licenses, because while OpenRAIL licenses cannot be considered Permissive in the OSI-approved sense, in spirit they embody many of the same values.</p>
<p>Lastly, given the OSI context above, it is worth looking at the specific breakdown of licensed projects carrying an OSI-approved license vs an unapproved alternative.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/05/hf_osi_licensed_wm-scaled.png"><img decoding="async" loading="lazy" src="https://redmonk.com/sogrady/files/2026/05/hf_osi_licensed_wm-1024x576.png" alt="" width="1024" height="576" class="aligncenter size-large wp-image-6157" /></a></p>
<p>Happily, among licensed projects, better than two thirds carried an OSI-approved license. We may not yet understand the exact role that open source will play within AI, but until there’s clarification a license style with a clear and unambiguous definition is proving to be the most popular choice &#8211; in spite of, or perhaps because of, the contrasting styles of licenses that category contains.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[AI Slop &#038; the Vulnerability Treadmill]]></title>
                <link href="https://redmonk.com/kholterhoff/2026/05/05/ai-slop-vulnerability-treadmill/" />
                <published>2026-05-05T13:56:42Z</published>
                <content type="html"><![CDATA[<p><img decoding="async" class="alignnone size-full wp-image-471" src="https://redmonk.com/kholterhoff/files/2026/05/CorgiTreadmill.gif" alt="" width="100%" height="317" /></p>
<p>It has not been a relaxing few months for software security teams.</p>
<p>In December, React disclosed its <a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components">first critical CVE</a>: an unauthenticated remote code execution flaw in Server Components. In March, not only was Aqua Security&#8217;s Trivy, a widely used security scanning tool, <a href="https://socket.dev/blog/trivy-docker-images-compromised">compromised twice in three weeks</a> through a GitHub Actions misconfiguration, but hackers also compromised a maintainer account for the Axios npm package in order to publish <a href="https://www.axios.com/2026/03/31/north-korean-hackers-implicated-in-major-supply-chain-attack">backdoored versions</a> containing a cross-platform remote access trojan that silently exfiltrated credentials. In April, Vercel disclosed a <a href="https://vercel.com/kb/bulletin/vercel-april-2026-security-incident">security incident</a> originating from Context AI, a compromised third-party AI tool used by an employee, which gave attackers access to customer environment variables.</p>
<p>Many in the vulnerability space lay the blame for these cascading incidents squarely on AI—and they&#8217;re not wrong, though the story is more complicated than &#8220;AI bad.&#8221; AI-generated code is letting vulnerabilities slip into production at an alarming rate. Researchers at Georgia Tech&#8217;s<a href="https://news.research.gatech.edu/2026/04/13/bad-vibes-ai-generated-code-vulnerable-researchers-warn"> Vibe Security Radar</a> tracked CVEs directly attributable to AI coding tools and found that March 2026 alone produced more than all of 2025 combined. AI tools are also allowing bad actors to infiltrate systems in new and creative ways. The Axios npm compromise wasn&#8217;t a brute-force attack—it was &#8220;<a href="https://www.theregister.com/2026/04/11/trivy_axios_supply_chain_attacks/">AI-enabled social engineering</a>.&#8221; AI allows attackers to mount more elaborate and convincing campaigns against open source maintainers, while simultaneously flooding the ecosystem with code that is outpacing security teams&#8217; ability to keep up.</p>
<p>In my<a href="https://redmonk.com/kholterhoff/2026/02/03/ai-slopageddon-and-the-oss-maintainers/"> previous post on the AI Slopageddon</a> I covered the contribution quality crisis. AI-generated pull requests are overwhelming maintainers. The social contract between contributors and projects is breaking down. And AI platforms are making it worse.</p>
<p>This piece is a companion. It&#8217;s about the state of supply chain security in the age of AI. We look at slop vulnerability reports, a CVE database that may already be too slow to matter, and a software supply chain with security defaults designed for a world that no longer exists.</p>
<p>&nbsp;</p>
<h2>Meh-trics</h2>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-468" src="https://redmonk.com/kholterhoff/files/2026/05/RachelStephensLaptop-scaled.jpeg" alt="" width="2560" height="1920" /><br />
<sub>Rachel Stephens&#8217;s laptop boasts a &#8220;Meh-trics&#8221; sticker.</sub></p>
<p>At RedMonk, we talk about metrics a lot, and often with suspicion. We&#8217;ve watched an entire ecosystem spring up around the monetization of perceived value in software—<a href="https://redmonk.com/jgovernor/so-where-all-the-github-link-farms-at-astro-turfing-in-software-development/">GitHub stars</a>, contribution graphs, download counts, bounty payouts—proxy after proxy, each one promising to measure something real about the health and quality of a project, and each one an invitation to be gamed. Gaming isn&#8217;t new (<a href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart&#8217;s Law</a>, amirite?). What&#8217;s new is the low cost of doing it well. AI has made it trivially easy to game software industry reward systems without delivering the outcomes they were designed for.</p>
<p>AI has collapsed the effort required to manufacture every signal of credibility the software ecosystem depends on, and the consequences are rippling through every incentive structure the community has built. Bug bounty programs are drowning in AI-generated reports that cost pennies in tokens to produce, but hours of expert time to debunk. Salaried internship programs like the <a href="https://lfx.linuxfoundation.org/tools/mentorship/">Linux Foundation&#8217;s Mentorship Program</a>, <a href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>, and <a href="https://www.outreachy.org/">Outreachy</a> are grappling with a murkier version of the same problem: maintainers I&#8217;ve spoken with are struggling to determine whether participants are actually doing the work or using AI to get their foot in the door without intending to meaningfully contribute afterwards. Even the resume-enhancing cachet of being an open-source contributor, formerly a reliable signal on a junior developer&#8217;s resume, is losing its value as the cost of faking it approaches zero. AI is hollowing out, from the inside, the systems that once rewarded genuine effort.</p>
<p>What happens when the signals on which we&#8217;ve built our ecosystem stop meaning what we thought they meant, and what incentive structures might we build instead? To answer these questions, I looked at the vulnerability space, because that&#8217;s where the perverse incentives cut deepest and the money flows most visibly.</p>
<p><img decoding="async" class="alignnone size-full wp-image-466" src="https://redmonk.com/kholterhoff/files/2026/05/MetricsShadeArrestedDevelopment.gif" alt="" width="100%" height="202" /></p>
<p>&nbsp;</p>
<h2>The Black, White, and Gray Markets for CVEs</h2>
<p>Every vulnerability has a price. The question is who is paying, and for what?</p>
<p>There has always been a black market for exploits, so a white market emerged to balance it out (shout out to <a href="https://www.linkedin.com/in/bryanboreham/">Bryan Boreham</a>, Distinguished Engineer at Grafana Labs, for framing it this way in our recent conversation at Monki Gras). The result is a sophisticated, tiered global economy that is willing to pay for exploits.</p>
<p>At the bottom are the outright black market forums where weaponized exploits and stolen credentials trade hands with no pretense of legality. A threat actor claiming affiliation with ShinyHunters <a href="https://www.ox.security/blog/vercel-context-ai-supply-chain-attack-breachforums/">posted Vercel&#8217;s internal database</a> on BreachForums, including customer API keys, environment variables, and portions of source code, at an asking price of $2 million. When suspected North Korean hackers from UNC1069 compromised a maintainer account for Axios and <a href="https://cloud.google.com/blog/topics/threat-intelligence/north-korea-threat-actor-targets-axios-npm-package">shipped a remote access trojan</a> to every developer who ran <code>npm install</code> during a three-hour window, that was a state-sponsored actor treating the open-source supply chain as an ATM for the North Korean <a href="https://www.cnn.com/2026/03/31/politics/north-korea-hacking-crypto">missile program</a>.</p>
<p>Above these sit the gray market brokers that purportedly sell to governments and intelligence agencies. Crowdfense <a href="https://www.crowdfense.com/exploit-acquisition-program/">offers</a> “rewards ranging from USD 10,000 to USD 7 million for full exploit chains or previously unreported capabilities.” A UAE-based startup called Advanced Security Solutions <a href="https://techcrunch.com/2025/08/20/new-zero-day-startup-offers-20-million-for-tools-that-can-hack-any-smartphone/">launched in 2025</a> offering $20 million for &#8220;tools that can hack any smartphone.&#8221; <a href="https://en.wikipedia.org/wiki/Zerodium">Zerodium</a>, arguably the most notorious broker in the space, spent years publishing a public price list for zero-days—up to $2.5 million for a full-chain iOS exploit—before quietly <a href="https://en.wikipedia.org/wiki/Zerodium">going dark in early 2025</a>.</p>
<p>The bug bounty, by contrast, is the white market, or maybe more accurately, the counter-market. This time-tested resource at many companies exists to outbid the alternative. The entire premise is that if you pay researchers enough to disclose responsibly, they won&#8217;t sell to Crowdfense or post on BreachForums instead. Last year, HackerOne<a href="https://cdn.pathfactory.com/assets/preprocessed/11231/eaed9b28-8578-44cc-835c-b62c88f259c8/eaed9b28-8578-44cc-835c-b62c88f259c8.pdf"> reported</a> that 83% of surveyed organizations now use bug bounties, with total payouts reaching $81 million across all programs.</p>
<p>But bug bounties have become unsustainable at the exact moment they&#8217;re most needed, and even, in some cases, legally required. Bug bounty programs are being hit hard by AI—slop and non-slop. To begin, AI tools are finding all kinds of bugs.</p>
<p>&nbsp;</p>
<h2>The Arms Race</h2>
<p><img decoding="async" class="alignnone size-full wp-image-472" src="https://redmonk.com/kholterhoff/files/2026/05/epichandshakeaivulns.png" alt="" width="100%" height="698" /></p>
<p>Security people love to call things an &#8220;arms race,&#8221; but in the case of AI the metaphor unfortunately fits. AI is the best weapon in both the attackers&#8217; and the defenders&#8217; arsenals, and the balance of power shifts depending entirely on who&#8217;s wielding it and whether anyone is paying them.</p>
<p>On the attack side: the cost of generating exploit code from a published CVE has collapsed just as thoroughly as the cost of generating a junk vulnerability report. The<a href="https://www.wiz.io/blog/six-accounts-one-actor-inside-the-prt-scan-supply-chain-campaign"> prt-scan campaign</a> in March and April 2026 spent six weeks opening hundreds of pull requests against repositories with <code>pull_request_target</code> misconfigurations, rotating through throwaway accounts and using AI-generated, language-appropriate diffs to look like plausible contributions.</p>
<p>On the defense side: AISLE <a href="https://aisle.com/blog/what-ai-security-research-looks-like-when-it-works">reported</a> that their AI system discovered all twelve zero-day vulnerabilities announced in OpenSSL&#8217;s January 2026 security release, including bugs that had lurked undetected for 25 to 27 years, one of which predated OpenSSL itself, inherited from its 1990s predecessor SSLeay. Meanwhile, Anthropic&#8217;s Claude Opus 4.6<a href="https://red.anthropic.com/2026/zero-days/"> found over 500</a> high-severity zero-day vulnerabilities in well-tested open-source codebases without specialized tooling. These projects had fuzzers running against them for years, but the model found what the fuzzers missed. <a href="https://www.lesswrong.com/posts/7aJwgbMEiKq5egQbd/ai-found-12-of-12-openssl-zero-days-while-curl-cancelled-its">According</a> to <a href="https://www.linkedin.com/in/stanislav-fort/">Stanislav Fort</a>, AISLE&#8217;s founder and Chief Scientist:</p>
<blockquote>
<p>AI is simultaneously collapsing the median (&#8220;slop&#8221;) and raising the ceiling (real zero-days in critical infrastructure).</p>
</blockquote>
<p>Google, for its part, is betting that the answer is platform-level defense at machine speed. At <a href="https://cloud.google.com/blog/products/identity-security/next26-redefining-security-for-the-ai-era-with-google-cloud-and-wiz">Next &#8217;26</a>, the company unveiled what it&#8217;s calling &#8220;<a href="https://cloud.google.com/blog/products/identity-security/next26-redefining-security-for-the-ai-era-with-google-cloud-and-wiz">Agentic Defense</a>&#8221;: a cybersecurity platform that merges Google&#8217;s Threat Intelligence and Security Operations with <a href="https://www.wiz.io/blog/wiz-at-google-cloud-next">Wiz</a>, the cloud security firm it acquired earlier this year. The headline numbers are sobering context for the investment: Google&#8217;s own <a href="https://cloud.google.com/security/resources/m-trends">M-Trends 2026 report</a> found that the handoff time between an initial intrusion and a secondary threat actor has collapsed from eight hours to 22 seconds over the past three years (p. 55). At that velocity, human-speed triage is a rounding error.</p>
<p>Perhaps most relevant to the supply chain story, Wiz introduced an<a href="https://www.wiz.io/academy/ai-security/ai-bom-ai-bill-of-materials"> AI-Bill of Materials (AI-BOM)</a> that automatically inventories every AI framework, model, and IDE extension across an environment: a direct response to the shadow AI problem that made the Vercel breach possible, where a single employee&#8217;s use of an unsanctioned AI tool cascaded into a platform-wide compromise.</p>
<p>But the strategic question isn&#8217;t whether AI is good or bad for security. It&#8217;s the incentive structure. Currently, the incentives reward finding over fixing.</p>
<p>&nbsp;</p>
<h2>Project Glasswing and the Bounty Crisis</h2>
<p>When Anthropic<a href="https://www.anthropic.com/glasswing"> announced Project Glasswing</a> in April, they framed it as a response to a supply chain security crisis that had already arrived. Anthropic committed<a href="https://www.linuxfoundation.org/blog/project-glasswing-gives-maintainers-advanced-ai-to-secure-open-source"> $100 million in usage credits</a> for its Claude Mythos Preview model, along with $4 million in direct donations to open-source security organizations, explicitly to put frontier AI vulnerability detection into the hands of maintainers who otherwise couldn&#8217;t afford it. <a href="https://www.linuxfoundation.org/blog/project-glasswing-gives-maintainers-advanced-ai-to-secure-open-source">According</a> to <a href="https://www.linkedin.com/in/zemlin/">Jim Zemlin</a>, CEO of the Linux Foundation:</p>
<blockquote>
<p>I am optimistic but the urgency is real. We are in the most dangerous period, the transition when attackers might gain a significant advantage as the technology ecosystem digests the impact of AI.</p>
</blockquote>
<p>Assuming we don’t write off Mythos as an elaborate marketing stunt, the subtext of Project Glasswing is that the white market for vulnerabilities—bug bounties, responsible disclosure, coordinated patching—will lose the economic argument without external incentivization. And the evidence for that subtext is already here.</p>
<p>cURL&#8217;s bug bounty program, running since 2019,<a href="https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/"> found 87 confirmed vulnerabilities</a> and paid out over $100,000. It worked until AI collapsed the cost of submitting garbage while leaving the cost of evaluating it unchanged. Daniel Stenberg, founder and lead developer of cURL, was forced to kill the program this January because his team was spending more time debunking AI-generated reports than writing code.</p>
<p>When I interviewed <a href="https://www.linkedin.com/in/vpetersson/">Viktor Petersson</a>, co-founder of Screenly, he described the same dynamic from the other side. His company has received 331 vulnerability reports since launching its program less than six months ago. Thirty-nine were confirmed vulnerabilities. A huge proportion were duplicates. The volume got heavy enough that they had to build custom internal triage tooling and distribute review across multiple teams because no single person could keep up. After debating whether to kill the program, they decided to keep it—&#8220;it&#8217;s essentially an ongoing free pen test&#8221;—but Viktor was blunt about what he&#8217;s hearing from peers: more programs are going to be shut down. The teams simply can&#8217;t keep up.</p>
<p>The Node.js project tried to address this by imposing a <a href="https://nodejs.org/en/blog/announcements/hackerone-signal-requirement">minimum HackerOne Signal</a> score to submit reports, eventually requiring new researchers to show up in the OpenJS Foundation Slack and talk to a human. It&#8217;s a reasonable filter, but every gate you build to keep slop out also risks keeping legitimate newcomers away. There is no clean solution here, only trade-offs.</p>
<p>Here is the core economic dysfunction: generating a plausible-sounding vulnerability report now costs pennies in tokens. Evaluating whether it&#8217;s real still costs an hour of expert time reproducing the steps, which are probably garbage, but which you can&#8217;t write off because <i>what if</i>.</p>
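<p>To put rough numbers on that asymmetry, the sketch below applies the one-hour-per-report figure to the Screenly counts cited earlier (331 reports, 39 confirmed). Applying that figure uniformly is an assumption made for illustration: duplicates are likely cheaper to dismiss, and real triage costs will differ. The function name is hypothetical.</p>
<pre><code class="language-python">def triage_hours_per_confirmed(reports, confirmed, hours_per_report=1.0):
    """Expert hours spent on triage for each report that proves to be real."""
    return reports * hours_per_report / confirmed

# Screenly figures from the interview above; one hour per report is assumed.
hours = triage_hours_per_confirmed(reports=331, confirmed=39)

print(round(hours, 1))
</code></pre>
<p>Under that assumption, each confirmed vulnerability costs roughly 8.5 hours of expert triage before any time is spent on the fix itself.</p>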
<p>&nbsp;</p>
<h2>You Can&#8217;t Just Shut It Down</h2>
<p><img decoding="async" class="alignnone size-full wp-image-465" src="https://redmonk.com/kholterhoff/files/2026/05/HopperBug.jpeg" alt="" width="100%" height="600" /><br />
<sub>Grace Hopper&#8217;s &#8220;First actual case of bug being found&#8221;</sub></p>
<p>Bug bounty programs are becoming unsustainable at the exact moment that reporting vulnerabilities is becoming legally required.</p>
<p>The EU<a href="https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act"> Cyber Resilience Act</a> takes effect in stages, and the first enforced milestone hits<a href="https://digital-strategy.ec.europa.eu/en/policies/cra-reporting"> September 11, 2026</a>. From that date, all manufacturers of products with digital elements sold into the EU must report actively exploited vulnerabilities to ENISA within 24 hours. You need a vulnerability disclosure program. You need SBOMs. You need continuous monitoring. You need all of this even for legacy products already on the market.</p>
<p>Stenberg could pull the plug on cURL’s bounty because it is a volunteer-maintained project with no fiduciary obligation to a regulator. A company selling software into the EU will not have that luxury come September. They&#8217;ll be legally obligated to maintain exactly the kind of intake mechanism that&#8217;s currently being firehosed with AI slop, and the<a href="https://orcwg.org/cra/"> penalty for non-compliance</a> can run up to €15 million or 2.5% of global annual revenue.</p>
<p>And the CVE database itself? Although roughly <a href="https://www.first.org/blog/20251229-Vulnerability-Forecast-Review">50,000 CVEs</a> were published in 2025, up 22% from 2024, some security teams aren&#8217;t relying on lists of CVEs anymore (NIST CVE database, GitHub archived CVE records, vendor-specific security advisories, OSV.dev, OpenCVE, VulnDB, CISA KEV Catalog) owing to (<a href="https://devansh.bearblog.dev/ai-slop/#the-cve-system-is-collapsing">among</a> <a href="https://techcrunch.com/2026/04/07/cisa-budget-cuts-700-million-cybersecurity-agency-trump/">other</a> <a href="https://www.cisa.gov/known-exploited-vulnerabilities-catalog">things</a>) AI acceleration. By the time a CVE is published, the assumption is that it&#8217;s already being exploited in the wild. All an attacker needs to do is point an LLM at it to write an exploit.</p>
<p>Some companies have responded by scanning package repositories like PyPI and npm in near real-time, using pattern analysis on code commits to detect compromises before a CVE is even assigned.</p>
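<p>As one illustration of what such pattern analysis might look like, the sketch below compares two versions of an npm <code>package.json</code> and flags install-time lifecycle scripts (<code>preinstall</code>, <code>install</code>, <code>postinstall</code>) that are new or changed. This is a hypothetical heuristic, not a description of any vendor&#8217;s product; the function name and the sample manifests are made up.</p>
<pre><code class="language-python">import json

# npm runs these lifecycle scripts automatically during installation.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

def changed_install_hooks(old_manifest, new_manifest):
    """Return install hooks that are new or modified in the newer manifest."""
    old_scripts = json.loads(old_manifest).get("scripts", {})
    new_scripts = json.loads(new_manifest).get("scripts", {})
    return [
        hook for hook in INSTALL_HOOKS
        if hook in new_scripts and new_scripts[hook] != old_scripts.get(hook)
    ]

# Hypothetical manifests for two consecutive releases of the same package.
old = '{"name": "example-pkg", "version": "1.0.0", "scripts": {"test": "node test.js"}}'
new = '{"name": "example-pkg", "version": "1.0.1", "scripts": {"test": "node test.js", "postinstall": "node setup.js"}}'

flagged = changed_install_hooks(old, new)
print(flagged)
</code></pre>
<p>A flag like this is only a prompt for human review, since plenty of legitimate packages add install scripts, but it can be raised the moment a version is published rather than after a CVE is assigned.</p>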
<p>The implications for the VEX (Vulnerability Exploitability eXchange) and Common Security Advisory Framework (CSAF) infrastructure are uncomfortable. If the CVE is the lagging indicator everyone assumes it is, then the entire system of databases, scanners, and compliance checklists built on top of it is measuring yesterday&#8217;s weather. And the CRA is about to mandate that companies build their security programs around exactly these lagging indicators. That&#8217;s a perverse incentive of a different sort: regulatory compliance that optimizes for the appearance of security rather than its substance.</p>
<p>&nbsp;</p>
<h2>Paying for the Fix</h2>
<p>The gap between &#8220;finding&#8221; and &#8220;fixing&#8221; is where money actually needs to flow, and a few models are trying to make that happen.</p>
<p>Sonar’s Tidelift pays maintainers directly to implement enterprise-grade secure development practices. Their 2024 survey data found that <a href="https://www.businesswire.com/news/home/20240917030299/en/Tidelift-Study-Reveals-Paid-Open-Source-Maintainers-Do-Significantly-More-Critical-Security-and-Maintenance-Work-Than-Unpaid-Maintainers">paid maintainers are 55% more likely</a> to implement critical security and maintenance practices than unpaid ones. The mechanism isn&#8217;t complicated: pay people to do the work and more of the work gets done. Less clear is how &#8220;pay the maintainers&#8221; is going to intersect with &#8220;now my open source project has a fiduciary obligation under the CRA.&#8221;</p>
<p>Germany&#8217;s <a href="https://www.sovereign.tech/">Sovereign Tech Agency</a> has invested <a href="https://www.phoronix.com/news/STF-Two-Years-24.9M-USD">over €23 million</a> in 60-plus open-source projects since 2022. Its <a href="https://www.sovereign.tech/programs/bug-resilience">Resilience program</a> takes a notably different approach than traditional bounties: it reduces technical debt first through engineering contributions, then runs bug bounties, and crucially pays bounties to the maintainers who resolve the reported issues. The design was informed by <a href="https://www.sovereign.tech/public/files/Bug-Bounties-and-FOSS-EN.pdf">Dr. Ryan Ellis&#8217;s research</a> at Northeastern University, which found that bounties can actually undermine security for under-maintained projects by drawing attention and creating financial burdens the project can&#8217;t absorb. A 2025 <a href="https://eu-stf.openforumeurope.org/">feasibility study</a> proposes a pan-European Sovereign Tech Fund, building on Germany’s model, with a minimum budget of €350 million.</p>
<p>Although models exist and work, they cover only a fraction of the ecosystem while the CRA is about to mandate vulnerability programs for everyone selling into the EU. Who pays for the assessment burden when the report-to-assessment cost ratio is this broken?</p>
<p>&nbsp;</p>
<h2>The Treadmill</h2>
<p><img decoding="async" class="alignnone size-full wp-image-469" src="https://redmonk.com/kholterhoff/files/2026/05/DucklingTreadmill.gif" alt="" width="100%" height="500" /><br />
Here&#8217;s where we are. Reports are cheap. Assessments are expensive. Bug bounty programs are simultaneously being killed and legally mandated. The CVE database may already be too slow to matter. The supply chain has consolidated around a platform whose defaults were designed for a different era. And AI is the best tool on both sides of the fight, with the outcome determined not by the technology but by whether anyone has the budget and the organizational will to use it for defense rather than noise generation.</p>
<p>The answer probably starts with flipping the ratio: making assessment as cheap as generation, paying for fixes instead of just finds, and treating supply chain security as the board-level priority it has been pretending to be. Until then, we&#8217;re on a treadmill that&#8217;s speeding up without an emergency stop button.</p>
<p><b>Disclaimer:</b> GitHub/Microsoft and Google are RedMonk clients.</p>
]]></content>
            </entry>
                        <entry>
                <title><![CDATA[Infrastructure Spend in the AI Era]]></title>
                <link href="https://redmonk.com/sogrady/2026/04/29/infrastructure-spend-in-the-ai-era/" />
                <published>2026-04-29T20:48:11Z</published>
                <content type="html"><![CDATA[<p>At a recent industry event, conversation turned &#8211; as it always seems to these days &#8211; to the economics of the AI buildout. It’s no secret that the AI race has involved massive and escalating capital investments in datacenters and other infrastructure, hardware, power and related cost centers.</p>
<p>For most of the conversation’s participants, however, it had been some time since anyone &#8211; us <a href="https://redmonk.com/rstephens/2016/06/16/infrastructure-investments-by-cloud-service-providers/">included</a> &#8211; had examined the numbers in detail, both for the slope of the trajectory and for the context around the spending itself.</p>
<p>While any analysis of this type is limited by the data that’s available &#8211; large, important players like Anthropic and OpenAI, for example, are for now private companies and therefore don’t report their metrics publicly &#8211; it is nevertheless worth looking at the investments large cloud players have made in recent years, and how they might compare to non-infrastructure-centric technology vendors like Apple.</p>
<p>For starters, then, here is the Property, Plant and Equipment (PP&amp;E) spend for the selected vendors over the past decade. As a side note, these PP&amp;E figures exclude operating lease right-of-use (ROU) assets because the point of interest here is actual capital buildout.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/04/ppe_absolute_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/04/ppe_absolute_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6151" /></a></p>
<p>Of note here are the relative rankings of the vendors by PP&amp;E spend, as well as the slopes pre- and post-ChatGPT release. Additionally, as has been well documented elsewhere, Apple has not felt compelled to respond to this cycle’s frenzied wave of datacenter construction, and its PP&amp;E spend has remained static while that of its industry peers has soared.</p>
<p>Next, here’s a look at the changes in PP&amp;E spend per company year on year.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/04/ppe_yoy_growth_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/04/ppe_yoy_growth_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6152" /></a></p>
<p>The most important note here is to ignore the 2023 spike in Oracle’s PP&amp;E; that’s an artifact of its 2022 $28B acquisition of the electronic health records company Cerner, with its datacenter, infrastructure and facilities hitting Oracle’s books in the following calendar year.</p>
<p>Other than random spikes in investments from Amazon, Meta and others, the only significant takeaway from this chart is, again, the slopes pre- and post-ChatGPT. Year on year growth in PP&amp;E spend had plateaued and arguably declined heading into 2022, which is appropriate as the cloud market was maturing six years in. But these trends did an about-face in the wake of ChatGPT’s breakout success, and increases in PP&amp;E spend immediately accelerated.</p>
<p>Arguably the most startling chart, however, is that of PP&amp;E spend as a percentage of annual revenue.</p>
<p><a href="http://redmonk.com/sogrady/files/2026/04/ppe_pct_revenue_wm.png"><img loading="lazy" decoding="async" src="https://redmonk.com/sogrady/files/2026/04/ppe_pct_revenue_wm-1024x809.png" alt="" width="1024" height="809" class="aligncenter size-large wp-image-6153" /></a></p>
<p>Apple and Meta bookend this chart as the opposite ends of the spending spectrum. But it’s notable how close the trajectories of Amazon and Google, and later Microsoft and, much later, Oracle, are as a percentage of revenue. It is eye-opening that all but one of these companies &#8211; Apple being the notable exception &#8211; are spending at least half of their revenue figure, and most well north of that, on new infrastructure.</p>
<p>That level of investment would have been unthinkable a decade ago. Today, the chart suggests it’s table stakes unless you’re a commercial device retailer.</p>
<p>While PP&amp;E spend is too blunt an instrument to perform a much more detailed analysis, it both points to the extreme level of investment required to be regarded as a credible player and raises significant questions about where, when and how the return on these outsized investments will arrive.</p>
<p><strong>Disclosure</strong>: Amazon, Google, Microsoft and Oracle are RedMonk customers. Apple and Meta are not currently customers.</p>
]]></content>
            </entry>
                    </feed>
        