{"id":877,"date":"2024-12-23T11:42:05","date_gmt":"2024-12-23T18:42:05","guid":{"rendered":"https:\/\/redmonk.com\/rstephens\/?p=877"},"modified":"2024-12-23T11:42:05","modified_gmt":"2024-12-23T18:42:05","slug":"hypothesis-around-the-pricing-of-agentic-ai","status":"publish","type":"post","link":"https:\/\/redmonk.com\/rstephens\/2024\/12\/23\/hypothesis-around-the-pricing-of-agentic-ai\/","title":{"rendered":"Hypothesis Around the Pricing of Agentic AI"},"content":{"rendered":"<p>One of the buzziest buzzwords of 2024 has been <strong>agentic AI.<\/strong> The concept of an AI agent can be nebulous: sometimes a human is in the loop along the way, sometimes not. Sometimes the input will be based on an LLM, sometimes it will be a smaller, more tightly trained model.<\/p>\n<p>But generally, the philosophy of agentic AI moves away from \u201ca person prompting an AI assistant question-by-question\u201d toward a world of systems and workflows. This <a href=\"https:\/\/simonwillison.net\/2024\/Dec\/20\/building-effective-agents\/#atom-everything\">post from Simon Willison<\/a> is well worth reading for those interested in the topic.<\/p>\n<p>The pricing shift from assistant-based to agent-based is going to be interesting to watch.<\/p>\n<p>Many of the AI assistant tools are priced at least partially on a per-seat basis. In a world of agentic AI, when the output of the system is not necessarily driven by human activity, I suspect usage-based pricing will reign. And in particular, I suspect output tokens will be the key driver.<\/p>\n<p>For existing models, cost\/million output tokens is already higher than cost\/million input tokens. (See <a href=\"https:\/\/simonwillison.net\/2024\/Dec\/4\/amazon-nova\/\">Simon Willison\u2019s updated comparison of model pricing after the release of the Nova models<\/a> as a reference point for how large providers are pricing as of December 2024.) 
The price differences between providers and models are less important for this analysis than the consistency with which output tokens are priced 4-5x higher than input tokens.<\/p>\n<figure id=\"attachment_878\" aria-describedby=\"caption-attachment-878\" style=\"width: 520px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/simonwillison.net\/2024\/Dec\/4\/amazon-nova\/\"><img decoding=\"async\" src=\"http:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024-520x181.png\" alt=\"Model pricing of GPT-4o Mini, Claude 3 Haiku, Claude 3.5 Haiku, Gemini 1.5 Flash-8B, Gemini 1.5 Flash, Nova Micro, Nova Lite\" width=100% class=\"size-medium wp-image-878\" srcset=\"https:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024-520x181.png 520w, https:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024-1024x357.png 1024w, https:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024-768x267.png 768w, https:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024-480x167.png 480w, https:\/\/redmonk.com\/rstephens\/files\/2024\/12\/Model-Pricing-2024.png 1114w\" sizes=\"(max-width: 520px) 100vw, 520px\" \/><\/a><figcaption id=\"caption-attachment-878\" class=\"wp-caption-text\">(All data courtesy of Simon Willison; all I did was division)<\/figcaption><\/figure>\n<p>This explanation of the <a href=\"https:\/\/www.greaterwrong.com\/posts\/g7H2sSGHAeYxCHzrz\/how-much-ai-inference-can-we-do#comment-RXnfe2ojyqmhLTXJm\">differences in input vs. output memory usage<\/a> is the best rationale I have found for why output tokens are priced 4-5x higher than input tokens. That said, this is a fast-moving field and this comment is from May 2024. If anyone has other explanations I&#8217;d love to hear them in the comments.<\/p>\n<p>The technical reason why input and output pricing vary is interesting, but in the end the pricing is its own reality, at least for the moment. 
As such:<\/p>\n<p>The shift from user-based to workflow-based systems, combined with output-driven resource usage, drives my hypothesis: output tokens are going to be the primary price lever of agentic AI systems.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the buzziest buzzwords of 2024 has been agentic AI. The concept of an AI agent can be nebulous: sometimes a human is in the loop along the way, sometimes not. Sometimes the input will be based on an LLM, sometimes it will be a smaller, more tightly trained model. But generally, the philosophy of<\/p>\n","protected":false},"author":45,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":"","jetpack_publicize_message":"Hypothesis Around the Pricing of Agentic AI","jetpack_is_tweetstorm":false},"categories":[40,4],"tags":[],"class_list":["post-877","post","type-post","status-publish","format-standard","hentry","category-ai","category-finance"],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/posts\/877","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/users\/45"}],"replies":[{"embeddable":true,"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/comments?post=877"}],"version-history":[{"count":0,"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/posts\/877\/revisions"}],"wp:attachment":[{"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/media?parent=877"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/categories?post=877"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/redmonk.com\/rstephens\/wp-json\/wp\/v2\/tags?post=877"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}