
Why Message Queues Endure: A History


As I grapple with the question of why message queues endure in an era of rabidly hyped alternatives like streaming, it is worthwhile to look back at messaging’s past in order to better understand the present. When I began researching message queues I struggled to assemble a coherent history, a gap this post begins to address. I have already published an opening salvo for this series which establishes message queues as “a dominant paradigm in distributed systems, [which] are essential, well understood, and therefore increasingly ho-hum.” In this follow-up, I shift my attention to how we got to this point of queueing’s dominance and whether it will continue in the Kafka era, paying special attention to which major technology players and industry verticals have been responsible for heralding the message queue’s rise.

I recently discussed the history of message queues with two legends in the messaging space and long-time friends of RedMonk: Andy Stanford-Clark, distinguished engineer and innovation leader at IBM Research, and Clemens Vasters, principal architect focused on messaging and real-time intelligence services at Microsoft. Our conversation is worth listening to in its entirety, but I wanted to pull this quote from Stanford-Clark to kick off my own post on the history of queue-based messaging because I think it gets to the question of why software engineers developed message queues in the first place:

So I think the key thing to focus on is asynchronous messaging. … Obviously people have been wiring computers together and wiring applications together for a long time. But it was always in a synchronous way. So in other words, one application had to be ready to receive the message before it could start sending it. They had to receive the whole message and acknowledge it before either could carry on. And we used to compare that to the wings … of a jumbo jet when it was going down the runway to take off, the wings actually flap slightly … because if they were rigid they’d snap off, so they have to have that flex in them, and by putting a queue or a springy connection in between two applications you stop them being rigid and then one of them snapping off, and the whole thing breaking, you allow them a little bit of flex. This one’s not quite finished with the previous job yet. Message goes into the queue. This one can carry on with what it was doing. This one will pick it up when it’s ready. And it just, that give-and-take in the system allows the whole thing to carry on without it all having to be in strict synchronous lockstep.

And it turns out that was the only real way to proceed with the complex situations, applications, people, sensors, all the things that were in an enterprise. So this idea of queue-based messaging came to pass and IBM had MQ series which later became WebSphere MQ which was literally point-to-point queue based messaging so you put a message into a queue and at that point application A could consider it actually delivered to application B because technically it was because we had this assured once and once only delivery so once you put it into the queue that was it you could you could safely forget about it there’d be no retries there’d be no comeback it’s gone, it would at some point be delivered to application B. And that was a very powerful concept.

It is in large part because message queues are asynchronous that they meet the demands for flexibility posed by modern distributed systems. Fair enough. But before digging into a more nuanced (if still necessarily incomplete) history of how we arrived at this paradigm, it is useful to draw clear lines separating message queues and messaging from other patterns. If you are already well-versed in the finer technical distinctions of these domains, feel free to skip this section.


Definitions of Messaging Types

In computer science, there are four core patterns in messaging:

  1. 1-1 and request-reply/request-response: a (typically) synchronous communication method where one system sends a request and another system replies with the requested data. This two-way message exchange pattern is similar to a phone call, requiring the sender to wait for a response, and is widely used in client-server architectures such as HTTP calls. “Sync over async” or “sync/async” refers to communication between a synchronous and an asynchronous system, which is often used in implementations of enterprise application integration (EAI) where responses require slow aggregate functions or human workflows.
  2. 1-many Pub/Sub: the Publish–Subscribe (Pub/Sub) pattern, sometimes called push technology, is closely related to the message queue model and is usually a component of a broader message-oriented middleware (MOM) system. It allows publishers to categorize messages into classes that subscribers receive. Unlike direct messaging, subscribers receive only the messages they are interested in without knowing which publishers are involved. This pattern, commonly used in message-oriented middleware systems like Java Message Service (JMS), offers better scalability but can introduce complexities in terms of data structure and format coupling.
  3. Data stream: transmits information through a sequence of digitally encoded signals, usually organized into packets. This paradigm has become extremely popular in recent years because by streamlining the flow of data for processing it removes the time and resource overhead incurred by extracting data from a database. In the post-pandemic era, data streaming has seen a boom because it enables the real-time collaboration necessary for successful remote and distributed teams.
  4. Message queue: A message queue enables asynchronous communication between services in serverless and microservices architectures. It temporarily holds messages until they are processed and removed. Message queues help separate complex processing, handle buffering or batching, and manage fluctuating workloads more efficiently. The message queue model is closely related to the Pub/Sub pattern and often forms part of a broader message-oriented middleware system, with many systems, like JMS, supporting both models in their APIs (a toy sketch contrasting the queue and Pub/Sub patterns follows this list).
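To make the queue/Pub/Sub distinction concrete, here is a toy in-process sketch in Python (standard library only, no real broker): a queue hands each message to exactly one consumer, while Pub/Sub fans every message out to all subscribers of a topic.

```python
from collections import defaultdict
from queue import Queue

# Point-to-point queue: consumers share one queue; each message is
# delivered to exactly one consumer, whichever takes it first.
jobs: Queue = Queue()
jobs.put("resize-image-42")
print("worker A got:", jobs.get())  # the message is now gone for worker B

# Pub/Sub: every callback subscribed to a topic sees every message.
subscribers = defaultdict(list)

def subscribe(topic, callback):
    subscribers[topic].append(callback)

def publish(topic, message):
    for callback in subscribers[topic]:
        callback(message)

subscribe("orders", lambda m: print("billing saw:", m))
subscribe("orders", lambda m: print("shipping saw:", m))
publish("orders", "order-1001 created")  # both subscribers receive it
```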


NASA, Banks, and Airlines

All right: we now have a high-level explanation of what message queues accomplish and some definitions to delineate the major messaging patterns, so we are prepared to move on to a discussion of the historical progression of this technology.

Stanford-Clark is right to point to IBM as a pioneer of the message queue, with the earliest example being IBM’s Queued Telecommunications Access Method (QTAM), which was announced in 1965 (see also the 1969 ACM paper by IBM’s Robert M. Winick entitled “QTAM-control and processing in a telecommunication environment”). QTAM, which was superseded by the Telecommunications Access Method (TCAM) and then the Virtual Telecommunications Access Method (VTAM), was developed for the IBM System/360 (S/360) mainframe as a built-in communications access method (see header image above). NASA’s Apollo program is in large part responsible for funding S/360, meaning that the US space program played a not insignificant part in advancing an early queue-based telecommunications subsystem. But of all the industries that contributed to the domain of message-oriented middleware, banks and airlines had an outsized role in the ascendance of asynchronous messaging, and in the paradigm of guaranteed transactional messaging systems in particular.

Banks and airlines uniquely require the core functionality of guaranteed message delivery, typically enforced by means of a two-phase commit protocol, in order to ensure atomicity, because disaster results from waylaid, corrupt, duplicate, and otherwise compromised financial and reservation transactions. IBM pioneered an enduring version of this technology with its Transaction Processing Facility (TPF) mainframe operating system, which runs the TPF Passenger Reservation Application (PARS, or IPARS for the international version). TPF, PARS, and IPARS were all developed in the 60s and 70s, all remain proprietary to IBM, and these messaging applications, especially IBM z/TPF, continue to be actively used by enterprise customers.
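Though systems like TPF are proprietary, the two-phase commit idea behind guaranteed delivery is simple enough to sketch. The following toy Python illustration shows the control flow only (real implementations add write-ahead logging and crash recovery); all names here are hypothetical:

```python
class Participant:
    def __init__(self, name):
        self.name = name

    def prepare(self, txn) -> bool:
        # Persist enough state to be able to commit later, then vote.
        print(f"{self.name}: prepared {txn}")
        return True

    def commit(self, txn):
        print(f"{self.name}: committed {txn}")

    def rollback(self, txn):
        print(f"{self.name}: rolled back {txn}")

def two_phase_commit(txn, participants):
    # Phase 1 (prepare): every participant must vote yes.
    if all(p.prepare(txn) for p in participants):
        # Phase 2 (commit): now it is safe to commit everywhere.
        for p in participants:
            p.commit(txn)
        return True
    # Any "no" vote aborts the whole transaction atomically.
    for p in participants:
        p.rollback(txn)
    return False

two_phase_commit("debit-A-credit-B", [Participant("bank-A"), Participant("bank-B")])
```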

Another, perhaps more established, use case for guaranteed messaging systems is that of financial institutions. Brick-and-mortar banks and exchanges have long operated in a distributed way, spread out across large geographic regions. They need to communicate and integrate in a coordinated manner across institutions, all while staying up to date everywhere simultaneously. A number of networking technologies have facilitated this form of communication over the decades and centuries. Examples are rife, but two prominent instances include the telegraphic printing ticker tape technology, which played such a visible role in the mythology of the 1929 stock market crash, as well as the fiber-optic lines used for high-frequency trading chronicled in Michael Lewis’s book Flash Boys. Shuttling information from place to place quickly and accurately is a legacy of the financial industry that has had consequences for networking systems in every business vertical.

Image credit: “Finis” Life. 1929.

The computerized message systems that banks use today need to ensure beyond any doubt that a message broker has received and processed all data that goes out and comes in. Although the technical minutiae of how this is accomplished lie outside the scope of this post, the measures adopted at a business level are notable. The financial industry came together to adopt open standards that sit on top of messaging to ensure performance, including the Society for Worldwide Interbank Financial Telecommunication (SWIFT), the Financial Information eXchange (FIX) protocol, and the Financial products Markup Language (FpML). In addition to establishing business protocols and formats for messages, through the decades a number of software vendors have also stepped up to ensure these institutions’ messaging processes run without a hitch. Significantly, in the 90s, many banks were using IBM MQ for integration and TIBCO Rendezvous (RV) for Pub/Sub, so let’s pause on these players.
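To give a flavor of what these standards look like on the wire, classic FIX encodes a message as a flat sequence of tag=value fields separated by the SOH (0x01) byte. A minimal hand-rolled parsing sketch in Python follows; the message below is shortened and illustrative, not a complete, valid FIX message:

```python
SOH = "\x01"  # FIX field delimiter

raw = SOH.join([
    "8=FIX.4.2",    # BeginString: protocol version
    "35=D",         # MsgType: D = New Order - Single
    "49=BUYSIDE",   # SenderCompID (illustrative)
    "56=EXCHANGE",  # TargetCompID (illustrative)
    "55=IBM",       # Symbol
    "38=100",       # OrderQty
]) + SOH

# Split on SOH and index fields by tag number.
fields = dict(f.split("=", 1) for f in raw.split(SOH) if f)
print(fields["35"], fields["55"], fields["38"])  # -> D IBM 100
```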

IBM’s MQSeries, later renamed WebSphere MQ (2002) and then IBM MQ (2014), was first introduced in 1993 to facilitate secure point-to-point and Pub/Sub messaging for distributed systems. As Brian Wilson, Americas automation technical sales leader for integration at IBM, elaborates in a 2023 post:

IBM MQ is used across all industries, though arguably may have the most critical use cases in the banking and financial services markets… IBM MQ is used by an unbelievable 98 of the top global 100 banks. You will find IBM MQ in use at 85% of the fortune 100, and is used heavily in transportation, logistics, retail, insurance, telecommunication, healthcare, and more. For one major airline, for example, airplanes do not move, period, if IBM MQ is down.

IBM MQ continues to dominate the enterprise Pub/Sub messaging space, but 25 years ago it faced major competition from an up-and-comer: TIBCO. Tugrul Firatli, Vivek Ranadivé, and Vijay Tella founded TIBCO (The Information Bus Company), a real-time message service and early pioneer of Pub/Sub, in 1997. Ranadivé created the underlying technology while at Teknekron Software Systems, part of the Teknekron Corporation business incubator, prior to its being acquired by Reuters in 1993. TIBCO was instrumental to the financial sector in the late 90s, particularly in automating Wall Street’s trading floors.

Since the 2000s, TIBCO has transitioned from a message broker to an enterprise application integration (EAI) platform in order to meet customer demand to accomplish necessary business tasks such as, for instance, integrating an Oracle backend with a retail banking application without compromising any data. TIBCO RV utilizes a subject-based messaging model for EAI in which systems subscribe to specific RV subjects by topic, with the Rendezvous Daemon (RVD) responsible for routing messages to those subscribers. Interestingly, Derek Collison, founder & CEO at Synadia, derived many of the ideas that inspired NATS, an open source MOM, from his experience working on TIBCO RV (see “What does ‘NATS’ stand for“) and later Cloud Foundry at Pivotal/VMware.
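As a rough illustration of subject-based addressing (a toy matcher in Python, not TIBCO’s actual API), subjects are dot-delimited strings and a subscription pattern can use “*” as a single-element wildcard:

```python
def matches(pattern: str, subject: str) -> bool:
    # '*' matches exactly one element of a dot-delimited subject.
    p_parts, s_parts = pattern.split("."), subject.split(".")
    if len(p_parts) != len(s_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(p_parts, s_parts))

# Hypothetical subscriptions: pattern -> subscribing application.
subscriptions = {"trades.NYSE.*": "floor-app", "trades.*.IBM": "ibm-desk"}

for pattern, app in subscriptions.items():
    if matches(pattern, "trades.NYSE.IBM"):
        print(app, "receives the message")  # both apps match here
```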

The rise of EAI is an important touchpoint in the history of message queues, and the popularity of these platforms has continued into the present. Today, consumers can select from among several EAI platforms, including APPSeCONNECT, Azure Integration Services, Boomi, IBM App Connect Enterprise, Informatica, Workato, and Oracle Cloud Infrastructure Integration Services.


Open Source & Open Protocols

Which brings me to the subject of openness in messaging, which Redis’s recent relicensing and the Valkey fork have certainly made top of mind for us here at RedMonk. The promise of open source message queue systems and open protocols has excited developers, IT operations, and administrators from the early days. The complexity of the messaging problem makes the success of any system uniquely dependent upon ecosystems, integrations, and standardization. One particularly successful open standard is the Advanced Message Queuing Protocol (AMQP), which gained popularity following its introduction in the mid-2000s. John O’Hara initially developed AMQP at JPMorgan Chase in London in 2003, and a number of financial institutions joined the working group to advance the protocol. Pieter Hintjens, who authored over 30 protocols during his lifetime, also contributed to the design of AMQP before going on to create the asynchronous messaging library ZeroMQ.

RabbitMQ, a popular implementation of AMQP, was developed by Rabbit Technologies (later acquired by VMware) in 2007, and is responsible for generating significant adoption of the protocol. While VMware’s role in fostering messaging is probably worth a stand-alone history, suffice it to say that folks like Alexis Richardson, founder & CEO of Rabbit Technologies and later Senior Director at VMware, and the entire Spring team have been fundamental to advancing queueing.
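For a sense of what AMQP-style queueing looks like in practice, here is a minimal point-to-point sketch against RabbitMQ using the Python pika client; the host, queue name, and payload are my own illustrative assumptions, not anything prescribed by the protocol:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="payments", durable=True)  # survive broker restarts

# Producer: once basic_publish returns, the message is the broker's problem.
channel.basic_publish(
    exchange="",                  # default exchange routes by queue name
    routing_key="payments",
    body=b'{"amount": 100}',
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)

# Consumer: acknowledge after processing so the broker can safely forget it.
def handle(ch, method, properties, body):
    print("processed", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="payments", on_message_callback=handle)
channel.start_consuming()
```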

Let’s pause on RabbitMQ, as it offers a compelling argument for the needs and challenges around openness when it comes to messaging. To this day RabbitMQ is absolutely everywhere because it works well and it’s open source. RabbitMQ was packaged in Debian Linux, and then became a fundamental backbone of NASA’s computing platform Nebula, one of the two founding projects comprising OpenStack. It has been used by Heroku and Cloud Foundry (developed by VMware). Much of RabbitMQ’s popularity can be attributed to the strength of its community. The RabbitMQ team ran a successful developer marketing campaign that has paid dividends in terms of adoption. In fact, developer enthusiasm is often credited with precipitating RabbitMQ’s acquisition in 2010.

In the past twenty-five years messaging has moved to the edge, so that the same fail-proof functionality that banks demanded in the 90s is now required everywhere from oil pipelines to pacemakers. In 1999, Stanford-Clark developed a lightweight, open message queuing protocol called MQTT for just this use case (Happy 25th Birthday MQTT!). The “MQ” in “MQTT” came from IBM’s MQSeries, since renamed IBM MQ. While IBM MQ is widely used in industrial automation, MQTT is tailor-made for remote IoT. HiveMQ, which sells a managed MQTT platform, leverages this use case in its own marketing targeted at industrial, transportation, and automotive customers.
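A minimal MQTT sketch using the Python paho-mqtt client (v2 callback API) shows why the protocol suits constrained devices: connect, subscribe with a topic wildcard, publish with a delivery guarantee, done. The broker host and topic names below are my own placeholders:

```python
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, reason_code, properties):
    # Subscribe once the broker acknowledges the connection.
    client.subscribe("sensors/+/temperature")  # '+' matches one topic level

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883)  # default unencrypted MQTT port
client.publish("sensors/office/temperature", "21.5", qos=1)  # QoS 1: at-least-once
client.loop_forever()
```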

The history of open messaging protocols, exemplified by AMQP and MQTT, reveals how openness, interoperability, community support, and developer enthusiasm have driven widespread adoption of queueing across diverse industries—from finance to IoT—that continues into the present. Their success, which is built upon the engineering community’s evolving needs and preferences, sets the stage for the next crucial page in the story of queueing in reliable, scalable systems: databases.


Databases Like Queues

If the guaranteed messaging demanded by banks and airlines sounds a lot like ACID, it should. Not only are databases increasingly adopting MQ-type features; the properties of atomicity, consistency, isolation, and durability are shared by both domains. Databases can write, delete, update, and move records. Similarly, in messaging, an application needs to be able to write, delete, update, and move a message. We’ve seen that Redis is becoming a popular message queue because of this overlap. FoundationDB likewise foregrounds ACID compliance as core to its distributed database—a capability that demands bug-free operation, which Antithesis has pivoted into the domains of QA and testing. If we think about databases in terms of the CAP theorem, the concerns of consistency, availability, and partition tolerance map directly onto messaging and the complex problem that MOM was designed to handle.
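That overlap is visible in how little code a Redis-backed queue takes. A minimal sketch with the redis-py client, where the list name and job payload are my own illustrative choices:

```python
import json
import redis

r = redis.Redis()

# Producer: append a job to a Redis list acting as the queue.
r.lpush("jobs", json.dumps({"task": "send_email", "to": "a@example.com"}))

# Consumer: BRPOP blocks until a job arrives, then removes it atomically,
# so each job is handed to exactly one worker.
_key, raw = r.brpop("jobs")
job = json.loads(raw)
print("processing", job["task"])
```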

I underscore databases that act like message queues because an increasingly vocal subset of distributed systems engineers complain that not every system requires the complexity or scalability that comes with message queues, and they are seeking simpler alternatives. Microservice-based implementations, which began to dominate the DevOps space with the introduction of Kubernetes in 2014, were once tightly associated with message queues because they generally require decoupling and asynchronous communication between services. The initial hype around microservices led many companies to adopt message broker systems like Kafka, NATS, and AMQP implementations as the backbone for inter-service communication. Brokers are useful for facilitating messaging in microservices and ensuring fault tolerance because microservices, particularly in environments like Kubernetes, are designed to fail regularly. Distributed systems practitioners now recognize that monolithic architectures or smaller, tightly coupled systems may be more appropriate solutions, ones that avoid both the overhead (asynchronous processing, error handling, and retry logic) of managing distributed systems and the expense of paying a third party to manage these for them. As one Hacker News user comments:

My bar for “Is there a reason we can’t just do this all in Postgres?” is much, much higher than it was a decade ago.

These engineers argue that, in many circumstances, the KISS principle favors direct HTTP calls, database-backed queues, Redis for in-memory message handling, or distributed logging over message queues.
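The database-backed queue these engineers have in mind is frequently the Postgres SELECT ... FOR UPDATE SKIP LOCKED pattern, which lets multiple workers pull jobs without blocking one another. A sketch using psycopg2, with an assumed jobs table (id, payload, enqueued_at):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        DELETE FROM jobs
        WHERE id = (
            SELECT id FROM jobs
            ORDER BY enqueued_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED  -- skip rows already claimed by other workers
        )
        RETURNING id, payload
        """
    )
    row = cur.fetchone()
    if row:
        job_id, payload = row
        print("processing job", job_id, payload)
# Committing the transaction (on leaving the `with conn` block) removes the
# job; a crash before commit returns it to the queue automatically.
```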

Logging has become a particularly popular paradigm within the remit of databases that act like queues. Twitter’s distributed key-value database, Manhattan, for instance, utilizes distributed logs to provide a durable, ordered record of events that can be replayed and processed in different ways. Another popular option is the Apache BookKeeper project, which in 2017 merged with DistributedLog (DL). Engineers at Yahoo developed BookKeeper as a high-availability solution for Hadoop’s HDFS NameNode. These services are intended to maintain sequences of records (called logs or log streams). Logs are well-suited to applications where event streams need to be processed in real time, which makes it perhaps unsurprising that the unchallenged giant of streaming services, Apache Kafka, also stores messages in immutable, distributed logs called topics.
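At its core, the log abstraction behind BookKeeper and Kafka topics is just an append-only sequence with replayable offsets. A deliberately tiny Python sketch of the idea:

```python
# A minimal in-memory append-only log: records get monotonically
# increasing offsets, and any consumer can replay from any offset
# without destroying the record (unlike a queue's destructive read).
log = []

def append(record: bytes) -> int:
    log.append(record)
    return len(log) - 1  # the record's offset

def replay(from_offset: int = 0):
    yield from enumerate(log[from_offset:], start=from_offset)

append(b"user-42 signed up")
append(b"user-42 upgraded plan")
for offset, record in replay(0):  # a second consumer could replay again
    print(offset, record)
```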


Queues in the Kafka Era

The 2010s saw the rise of services possessing database-like properties (durable, consistent, can retain data indefinitely) that do messaging and call it streaming, with Apache Kafka being the most prominent. Kafka, with its “distributed commit log” design (4), is a Pub/Sub messaging system developed by Jay Kreps, Neha Narkhede, and Jun Rao while at LinkedIn, and open sourced as part of the Apache Incubator program in early 2011. Kafka is truly the elephant of the streaming space. It is leveraged by a number of companies such as Confluent (founded by Kreps, Narkhede, and Rao in 2014), Redpanda, and WarpStream (acquired by Confluent this year). In fact, those tracking the messaging space would probably argue we are in the Kafka era—not so much because other types of messaging have become less popular, far from it, but instead because there is money to be made with streaming. More on this in a later post.

Debates around whether Kafka fits the definition of a database have become the equivalent of asking software engineers how many microservices can stand on the head of a pin. Practitioners generally agree that for the majority of use cases using Kafka (vanilla or managed) as a database is unwise. More interesting than this academic question is how messaging has evolved to accommodate what Neha Narkhede, Gwen Shapira, and Todd Palino characterize in Kafka: The Definitive Guide as the need to accommodate “the continuous flow of data”:

Our idea was that instead of focusing on holding piles of data like our relational databases, key-value stores, search indexes, or caches, we would focus on treating data as a continually evolving and ever-growing stream, and build a data system—and indeed a data architecture—oriented around that idea. (xiii)

The use cases for streaming services, which include ride-sharing apps like Uber and video-on-demand services like Netflix, are well recognized today, particularly for those of us who benefitted from these services transforming our day-to-day lives (shakes fist at cloud shouting “I remember when Netflix arrived as DVDs in the mail!”). According to Alexis Richardson, “there’s a lot of contention about whether streaming is a separate use case, and today it is becoming table-stakes with RabbitMQ and NATS supporting streaming.” Less controversial is the idea that services for streaming to our devices and real-time data processing are growing by leaps and bounds. The rise of streaming data coincides with a move away from messaging guarantees. Businesses have realized there is a whole set of problems where atomicity is less important than low latency. It is impossible and unnecessary to transactionally guarantee the huge volumes of data Netflix shuttles from its servers to my television pixel-by-pixel. While banks have created the FAST protocol (FIX Adapted for STreaming), this is only one use case among many. In fact, the WebSocket protocol, which appeared in 2011, is showing some momentum in the byte streaming space owing to its bidirectionality, particularly for real-time video games and feeds.


Developers are sometimes confused about whether Kafka supports queueing, such as this Redditor who asks “Is it appropriate to use Kafka as a message queue?” As with the decision to use Kafka as a database, the consensus among practitioners is that it depends on the use case. Some seasoned distributed systems engineers argue it is not a good solution at scale. These Kafka-as-MQ skeptics recommend alternative messaging systems like Apache Pulsar or RabbitMQ, pointing to the “Queues for Kafka” Kafka Improvement Proposal (KIP-932), authored by Andrew Schofield, software engineer at Confluent and expert in messaging, event-streaming, and transaction processing technologies, as evidence for their position. These nay-sayers correctly note that, unlike queues, with Kafka:

The way that consumer groups assign partitions to members of the group gives a powerful combination of ordering and scalability, but it does introduce coupling between the number of consumers in a consumer group and the number of partitions. Users of Kafka often have to “over-partition” simply to ensure they can have sufficient parallel consumption to cope with peak loads.
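That coupling is easy to see from the consumer side. In this sketch with the kafka-python client (the broker address, topic, and group id are illustrative), every consumer started with the same group_id splits the topic’s partitions among the group, so a topic with four partitions caps useful parallelism at four consumers; a fifth member is assigned nothing and idles:

```python
from kafka import KafkaConsumer

# Each process running this joins the "billing" consumer group; Kafka
# assigns each group member a subset of the topic's partitions.
consumer = KafkaConsumer(
    "orders",                            # e.g. a topic with 4 partitions
    group_id="billing",
    bootstrap_servers="localhost:9092",
    enable_auto_commit=False,            # commit only after processing
)
for record in consumer:
    print(record.partition, record.offset, record.value)
    consumer.commit()                    # at-least-once processing
```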

If implemented, “Queues for Kafka” bypasses the need for a queue by using Kafka topics to introduce queue-like behavior through cooperative consumption. But does KIP-932 signify a disjunction at a protocol level between queueing and Kafka? Considering the KIP’s suggestive title, I was eager to learn more about Kafka’s relationship to queuing from Schofield, who admits to being “a little bit cheeky” in naming this KIP, but ultimately explains:

Kafka is a log, and what we’re effectively doing is providing an alternative access pattern over the top of a Kafka topic, which behaves much more like a queue. And the reason is essentially that, prior to this KIP, people writing applications consuming from Kafka would use consumer groups. And consumer groups have an opinion about how many consumers you can run and how many partitions there are because you get strong assignment between each consumer and one more partition. So it’s very good for high scale event streaming, where you want applications to pull records very, very fast and process them all. But nowadays, many people treat Kafka as a more general-purpose messaging technology. With this KIP, we’re going to provide a less opinionated way for applications to consume, which is more message by message, rather than give me the whole partition, right? It just makes it easier to share, essentially. So yeah, it’s not, it’s not really a queue, but it lets you write queuing style applications, and that’s the point.

The line separating queues from queue-like implementations is fuzzy, but there is a strong sentiment among engineers that both strict queues and those on the margins have value. The Kafka era has tremendous momentum, and event streaming has seen adoption from a growing number of enterprise customers, but despite the hype queues will continue to endure.


Conclusion

Message queues have come a long way, baby, and they aren’t going anywhere soon. When I asked Stanford-Clark about the continuing relevance of MQTT he was optimistic:

we’re raising a glass to the next 25 years of MQTT because its popularity shows no sign of slowing down. In fact, in many ways, [that’s the] nature of an exponential curve.

Queues remain a valuable tool for solving many distributed systems problems, and any perceived decline in the popularity of queueing is not an indictment but rather a reflection of this paradigm’s ascendance to a reliable, mature technology. While streaming’s distributed logs and event-driven architectures have gained prominence as a viable alternative, in many ways the reduced hype around message queues is a sign of their success in quietly underpinning some of the largest systems in the world.

Disclaimer: IBM, Google, Microsoft, VMware, Synadia, and Oracle are all RedMonk clients.

Acknowledgements: I want to thank James Governor, Alexis Richardson, and Clemens Vasters for reviewing and providing feedback on earlier drafts of all the posts in this series.

Header Image Credit: “IBM System/360 Model 91 operator’s console at NASA, sometime in the late 1960s.” Wikipedia.
