
Hark Episode 2, “The Software Paradox”: Guest, Kent Beck

As newsletter subscribers are aware, Episode 2 of my new podcast Hark, “The Software Paradox,” dropped last week. In this month’s episode, Kent Beck – yes, that Kent Beck – stopped by to discuss The Software Paradox. What does someone responsible for some very popular software think about the thesis that software’s up-front commercial value is headed in the opposite direction from its strategic importance? What were his experiences trying to monetize data telemetry with JUnit Max? We covered that and more – I can’t speak for Kent, but from my end it was a really good conversation.

For those of you who don’t do podcasts, however, we offer a transcription of the episode below. Enjoy.

Steve: A little while back I wrote a book called The Software Paradox. Because the book claims, among other things, that software markets that were performing well at the time would be performing less and less well over time, some people were unsurprisingly less than thrilled, which is fine of course; you’re not going to win too many popularity contests as an analyst. It’s certainly not why you get into this business. What I was always curious about, however, was what authors of popular software thought about the idea, the core idea that software would be less valuable over time, at least from the commercial standpoint. I talked to a great many developers over the course of writing the book and certainly over the course of our day-to-day jobs, but still, there was always that curiosity, there was always that question as to what they thought about the idea. Could they challenge the perspective or even refute the base thesis of The Software Paradox?

Enter Kent Beck. Many of you may know Kent for JUnit, maybe you know him for Extreme Programming, maybe as a signer of the Agile Manifesto, or most recently from his time at Facebook. However you know him, it is very likely that you do know him, because Kent has had a very successful career as a software developer. He was kind enough to join the show for a discussion of his thoughts and feelings on The Software Paradox, both as a software developer and as the author of some very, very popular software projects. Welcome to episode two, “The Software Paradox.” So, excellent. Welcome to the show, Kent. I wanted to start with a quick background. As we like to do when we open the show, can you tell me who you are and what you do?

Kent: My name is Kent Beck. I am currently employed as a technical coach at Facebook. I run a variety of education programs for younger engineers, and at this point, they’re almost all younger engineers. Before that, I was an independent consultant, probably best known for Extreme Programming, test-driven development, the use of patterns in software development, and JUnit, the family of testing frameworks, which was something that I developed with Erich Gamma. Am I leaving anything out, Steve?

Steve: That’s the background as I know it. As is true with all of us, I’m sure there are things that are going to fall by the wayside, but no, I think that sums it up pretty well.

Kent: Okay.

Steve: So to get the conversational ball rolling, so to speak, this little podcast started out of sort of an interchange where you were kind enough to point people at a book I wrote called The Software Paradox. Just for the background of people who are unfamiliar with that concept, this is how I would sum it up, and we’ll see if you agree or disagree, Kent. The Software Paradox, from my perspective, is essentially the idea that the value of traditional on-premise, paid-up-front commercial software is in decline, and in decline broadly, across a wide range of categories, consumer to enterprise. But this is occurring even as the strategic importance of software is actually going up by the day. Is that your understanding? Does that definition work for you?

Kent: So I probably would use slightly different words. The value is increasing, the value of the software is increasing, but the revenue pool available for people who create that software is shrinking.

Steve: Yeah, I’d say that’s fair. So what is it about that idea that resonated with you? In other words, you had a response and you were, as I said, good enough to point people over to it. So what was it about that idea that sort of piqued your interest?

Kent: Well, it’s just so backwards. In any other case, it defies experience: if something becomes more valuable, its price goes up. It sends a signal out. Yeah, I’m not an economist, but I’m a, what would you call me?

Steve: You know enough to be dangerous.

Kent: I do know enough to be dangerous. Thank you. Yes, yes, that’s good enough. So prices rise to send out signals to make more of this stuff that’s valuable. Except in the software world, sometimes more of it makes it less valuable, like when you’ve got too many integrated development environments out there. You like to have the cliché “more wood behind fewer arrows,” except then every once in a while everything turns topsy-turvy, and you get these old incumbents that have slowed down, and then you want Kuhn’s extraordinary science, you want ferment and lots of little things growing up. But in general, the way the market works is that if something’s valuable, prices rise so that there’s more of it. And here’s a case where it’s just the opposite. So, I don’t know if you can point to other examples in world history where this kind of inversion took place; I guess that’s my first question.

Steve: That’s a great question actually. I don’t know. I’m trying to think if I came across any in the background, because it’s one of those things people always ask us: how do things like The Software Paradox or, before that, The New Kingmakers come up? And really, they’re born out of conversations that we have, lots of them, right? Just by function of being an analyst, you have a lot of conversations with a lot of different people, and you begin to see patterns, and then you begin to see trends over time, and The Software Paradox was exactly that. I kept talking to businesses that were struggling to monetize, and this was true across a variety of sectors in my experience. And yet at the same time, you have essays like Marc Andreessen’s “Software Is Eating the World,” which is largely true, I think, at least from my perspective, that suggest that a lot of companies, even in traditional industries, are going to be increasingly defined by software, which means software is playing this more and more important role. But it’s a great question. I don’t know that there’s another historical example that I can point to.

Kent: Okay, I assume this is a thinking on our feet kind of conversation?

Steve: Oh, very much so.

Kent: Okay. So what came to my mind was the Model T, which became more valuable as there were more roads and more gas stations; having a car became more valuable, but the price dropped. But at the same time, I think the revenue pool didn’t shrink. I think the revenue pool expanded dramatically.

Steve: Yeah, I think there are a lot of examples of products that basically made a transition from what I would call a margin opportunity to a volume opportunity, right? That’s something that we see over and over and over again. I think there are a couple of things that are different in this case, and you pointed out one of them, right, which is that I think the total revenue pool is in many cases actually shrinking. There are fundamentally fewer dollars to go around, which is again a change, because in the case of the Model T, or any number of other examples we could point to from history, the actual size of those industries grew dramatically as they transitioned from margin to volume. The other interesting twist that software adds, at least in my view, is that it takes the floor out, right? So in other words, when you produce something like a Model T, there’s a certain cost of goods, a certain cost of manufacturing that goes into that. Whatever the cost of the materials is to put it together and whatever the cost of labor is, there’s a reasonable floor, right, that goes into putting that together. Which is not to say that software is inherently without some of those costs, but obviously the cost of distribution is effectively zero. The cost of replication is zero.

So in many cases, it takes out the bottom of that market. The price of a physical good may go down to near cost. In many cases, what we see in software is that it doesn’t go down to near cost, it goes down to zero. And that’s a big change, and that’s a big issue for many of the vendors from a software standpoint to grapple with, because it’s one thing to sell near cost; if you have to sell for zero dollars, what is your business? What is your market?

Kent: Right. And there’s this argument about, well, the price drops to the cost of replication, which is essentially zero. But there’s also the amortized cost of development that needs to go in there. And so if I’m a…here I am, a 55-year-old man. If I think, “Oh, let me start this new open-source project,” how is that ever going to contribute to my retirement? Because if I took, say, 6 months out of my life, which I did about 10 years ago now: I built a JUnit add-on called JUnit Max, and I couldn’t figure out how to turn it into a business. Now partly that’s because I’m a lousy businessperson. I’ve read too many books and I have too little talent or something like that.

Steve: Yeah, I know the feeling.

Kent: But it’s also because, like, how am I going to get paid for that six months? And eventually I just decided there’s not a way to get paid for that time, so that leads me to not build tools that might be really useful. That’s another part of the paradox: by dropping the revenue available for software, the market sends this signal that this stuff shouldn’t be created. And then we don’t create it, and then the value available to everybody drops, and yet somehow it doesn’t go to zero. Facebook, for example, spends a lot of investment dollars on open-source software for which we receive zero revenue, but we got React and React Native and all the changes we made to memcached, and there’s just a huge laundry list of things which make sense for Facebook to invest in. But where does the…I don’t see where the engine is that starts the next thing.

Steve: Yeah, and I think that’s an interesting segue to one of the questions that I wanted to get at, which is: do the underlying economics, and I’m a big believer that economics are one of the most powerful if not the most powerful change agents, do the economics here not necessarily stop the flow of open-source software, or other software in general, but change the nature of the creation? And to get to that point, your example I think is perfect, right? Because on an individual level, it’s very difficult for a lot of developers, I think, to justify the effort, the resources that go into producing a given piece of software if there’s not a clear financial return for them. Now in many cases, we can all think of exceptions, I’m sure, where it’s, “I’m going to create this project because I want to get hired by this other company. I’m going to create this project because it solves a problem that I have,” or what have you. But one of the key use cases historically has been, “I’m going to create a piece of software because I want to make a living and I want to make money off of that.” And The Software Paradox would suggest that on an individual level, that’s difficult to maintain. And yet, as you know, we have large entities, Facebook certainly is one of them, Google is another, Apple surprisingly is now one with Swift, and so on.

We have all these large entities that are now producing software, and they are essentially releasing it for free, which again, from an economic perspective, suggests that they’ve looked at it and essentially determined that they have a higher return from releasing it than they would from selling it, which makes sense, because none of those companies are in the business of selling infrastructure software, at least. So I guess the question I’m curious about from your end, having been an individual software developer and working for Facebook now: do you expect or anticipate a shift in terms of where the software’s coming from and who it’s produced by?

Kent: My next question is, for whom does it make sense? It doesn’t have to make economic sense, but it has to not be fatal. I had kids in college and a mortgage to pay, and I had to get a payoff some place. So for whom does it make sense to get a software project started? Well, I think young people with low net worth and kind of nothing to lose and nobody relying on them. For them it makes sense to start that snowball rolling downhill. Most of the snowballs are going to go two feet and then stop, but every once in a while, one of them is going to start an avalanche.

My metaphors are getting a little mixed up here, but what I would predict from my understanding of the model is that the innovation you’re going to see, the beginnings of innovation, is going to come from people with nothing to lose and lots to gain. The refinement of those innovations into something enterprisey, something that a company like Facebook can deploy on 100,000 servers, that refinement is going to come from those bigger companies, because there’s no way that Jane Grad Student knows how to prioritize some piece of infrastructure at Facebook’s scale. So Facebook’s happy to pay for it, but it isn’t going to be money that triggers the beginnings of all these new pieces of infrastructure. Who I feel sorry for in this whole thing is a company that wants to sell infrastructure. You’re just getting squeezed hard.

Steve: Yeah, it’s hard because…we work with a lot of software companies, companies that are huge and companies that are just a couple of people. And again, a large part of the impetus for creating the book in the first place was that in many cases, they are getting squeezed. They’re getting squeezed at the bottom by open-source software: for many of the commercial products, there are free-as-in-beer alternatives that will do the job credibly and exert, at the very least, a downward price pressure. And in many cases, you’re getting squeezed at the top end by businesses that have things that still matter in today’s economy, things like account control or relationships with the CIO and so on. It’s a difficult position to be in, there’s no doubt. And a lot of them, frankly…so we have, as an example, I think I mentioned this in the book, I can’t remember: one of the businesses that we’ve spoken with basically looked at this and said, “We don’t necessarily agree across the board with the idea of The Software Paradox.” And I discussed some exceptions with them, but essentially they looked at it and said, “For our business specifically, we see our long term revenue from an upfront licensing standpoint going to zero, and we plan for that.” And that’s a conversation that you wouldn’t have had four or five years ago, right?

Kent: Right.

Steve: That’s a conversation that’s new, and frankly, again, it’s surprising, because you go to industry conferences today that have nothing to do with technology and they’re all talking about technology, because technology in so many industries is now the differentiator. So the fact that the strategic value is headed straight up while the sort of realizable commercial return is in many cases cratering, that’s a hard thing for a lot of companies to deal with.

Kent: So I’m a musician too, and a writer, so I’m interested in the evolution of those markets. The analogy is not perfect, though, because if you’ve got lots of bands putting out music for free, it’s not like the value of that music…I’ve only got so many hours and I’ve only got two ears. The value of that music isn’t skyrocketing. The economic leverage: you can’t take that music and turn it into a ten billion dollar company instead of a one billion dollar company the way that you can with software. So I’ve been scouring those markets for quite a while, and it didn’t occur to me until just now that the analogy’s not perfect. So even if I figured out how to make money as a musician, or figured out how to make money as a writer, it wouldn’t necessarily inform the software situation.

Steve: Well, yeah. I think that there are definitely parallels though, right, because one of the things that you see in a lot of markets is that the experience as a whole is subsidized by some portion of it. What do I mean by that? In other words, if, as a software business, you can sell one product for a tremendously outsized return, right? So for example, prior to the introduction of open-source databases or open-source application servers, where businesses felt they had no other choice, businesses would spend lots and lots of money on enterprise-class application servers or databases or what have you. And those returns and those outsized margins can fuel investments in other products, right? They can fuel essentially experimentation.

They basically can subsidize a lot of other areas of the business. And clearly, I think, from a writing standpoint, you see that at least in journalism, right, where that whole business was for years and years subsidized by classified ads, if nothing else, and in many cases the sale of print editions and so on. And as both of those have fallen off, journalism and writers in general are left searching for, “All right, what’s the economic model here?” And I would say the same thing is true to some degree of music, where if you go back, certainly when I was growing up, you didn’t have downloadable singles for 99 cents or 79 cents or 89 cents or whatever they might be; you had to go out and buy a record for 16 or 17 bucks. And that, again, subsidized a lot of other investments and a lot of other economic opportunities for artists, theoretically. A lot of that money obviously went into the hands of the recording business owners, but in other words, everything, in a sense, is a subsidy for something else; I’ve said this before. And the challenge in a lot of today’s markets is that the original subsidy is gone and we need to find a new one. And I think in the case of software, I have ideas certainly in terms of what those are. Data is one of them, services is another. But a lot of those original subsidies, “Hey, I’m going to charge some outlandish fee over and over and over again, and have 90%, 100%, 110% margins.” Those days, they’re not gone, there are certainly businesses that are still realizing those kinds of returns, but they’re increasingly few and far between. Yeah.

Kent: Yeah, it’s such a puzzle. I think that’s the thing. When I get into a puzzling situation, first I look for analogies and I don’t have an analogy. And then I look for principles, like economic principles in this case. And I can’t even find economic principles where this makes sense.

Steve: Well, and it’s a difficult thing to me. You see this all the time with software, right? Software is just a fundamentally different animal, and it’s really difficult for people to grasp that. Not to go down a whole rat hole, but the most obvious example of this recently is the FBI versus Apple case, right, where you saw a lot of people making the argument that hey, this is no different than essentially the FBI having the ability to go and search somebody’s house. And you basically have to step back and say, “If you think that, you don’t understand software, because software is inherently scalable and inherently sort of theftable or stealable in a way that a house key is not.” In other words, it’s not practical. I can’t go out and search a million houses in 10 minutes. I can do that theoretically with a phone and with some of these vulnerabilities, because software is just a fundamentally different animal. And it’s true economically as well, right? We see this over and over and over again, where people are trying to apply the economic rules for physical goods to software and to digital goods, and it just doesn’t work that way.

Kent: And it did for a long time. I think that’s the confusing thing. I was involved in a startup called Agitar and we had a good product, a product that 10 years earlier would have, I don’t know, turned into a billion dollar software company. And because it was based on license revenue, poof, just gone; it’s just not a viable model. So that model worked for quite a while and now it’s stopped working. So here’s why I wanted to talk with you: we could probably list four or five classes of people, new graduates with technical abilities, investors, MBAs who want to go into technology, aging programmers (one that strikes near and dear to my heart), kind of mid-career programmers who are being squeezed by this.

What can we say? That was the piece of the book that was missing for me: what can you say to them, to any of those or all of those classes of people? And I don’t have any answer, because I don’t have a model. Usually I’ve got some kind of model, and if I project forward in time far enough, then I sound outrageous and visionary. But it’s just a trick, because I project 20 years where other people are projecting 10 years, and then they go, “Oh yeah, you’re saying crazy stuff.” But here I just don’t even have a model. So what do I say to my daughter who’s just getting started in a software career? What do I say to myself?

Steve: Yeah. Yeah, I think that there are a couple of different things, and a lot of the answer to that question, to me, depends on time frame, all right? So for the short to medium term, I’ll say out to four or five years, because, well, one of the things that we find at RedMonk is that we tend to predict things and then they end up happening five years later than we think. I think in this case, for the foreseeable future, you’ll definitely have commercial opportunities, and a fair number of them, within the sort of commercial infrastructure software space. Will those opportunities be the size or the nature of what they would have been 10 years ago? No. You won’t have as many of them necessarily. But the fact of the matter is that Microsoft, for example, is still making tens of billions of dollars annually on each of their twin revenue engines in Office and Windows, and that’s not going to change. And there are a lot of businesses that are in that same boat, where they may be in decline, they may not be seeing the growth that they would have a couple of years ago, but they’re still making money and businesses are still paying for software. And for conversing…

Kent: Sorry to interrupt. As long as you can cut costs faster than revenue drops, you’ve got a viable business. So there’s a market out there, a skill, both business and technical, for a software hospice. You know it’s going to die, it’s a matter of time, but you can stretch it out, give it some quality of life for a few more years. So yeah, I think there’s a book to be written there, and I’m not going to write it because it’s the wrong side of…

Steve: Well I was going to say, I don’t think I’ll write a book called The Software Hospice. I don’t think that’s going to win me many fans.

Kent: [inaudible 00:26:54] software.

Steve: Yeah, people are already upset enough about The Software Paradox. Software Hospice might push them off the ledge.

Kent: It’s a natural follow up.

Steve: Yeah, yeah, exactly. As a follow-on to some of these businesses that are in decline, I think that there are absolutely opportunities, and good opportunities, to build smaller businesses around open-source products. And the way that you do that typically is by saying, “Hey look, there’s a software product. We know it better than you do, Mr. and Mrs. Business. Do you really want to manage this on an ongoing basis? Do you want to patch it, do you want to keep it up to date?” All those kinds of things. The answer is probably not. “So look, this is what we do. We can sell that to you.”

To sweeten the pot, a lot of businesses are turning to sort of “open core,” right, which is: you have the core of a product, all of the core features are available for free as open source, and then you have some proprietary layer that sits on top of that, that you have to pay for, whether that’s management or some other feature that isn’t necessarily integral to the experience but is a valuable add-on and something that a business might pay for. So you can definitely build those kinds of businesses. We see that over and over today.

Kent: So here’s my problem with the open core model: it’s a question of internal moral hazard. Part of the business is motivated to add features to the core because they want more adoption in general, and part of the business is motivated to exclude features from the core in order to make the commercial opportunity more attractive. I’m not saying it’s not navigable, but I think that there’s kind of this…

Steve: There’s a constant…

Kent: …seeds of a destruction.

Steve: There is a constant tension, there’s no doubt. Absolutely no doubt. Now, the way around that from my end: as you say, you can navigate that in the short term, right? In other words, it’s never a comfortable discussion. We’ve worked with many, many clients on just that subject, in terms of, “Look, you can’t close that feature. You’re going to get killed.” So yes, it’s absolutely a tension. It can be navigated in the short term. To me, the best solution to that particular problem over time is to go the services route, because all of a sudden, from a services standpoint, one of the things that tends to happen is that all of your incentives begin to align with customers in ways that they don’t otherwise.

So as an example, if you have an open-source business and you’re selling support and service, well, theoretically if you do a great job of manufacturing a product, you’ve just put yourself out of business, because why would I pay for support for a product that works well basically all the time? Now of course, we all know that’s not how things work, and there are always bugs in software and so on, but there is a fundamental misalignment, if you will, of needs: the vendor needs customers to have problems, but the customer wants a product that has as few problems as possible, right? So that’s a fundamental misalignment. So what do you do? One of the things that you end up doing is that you offer these software assets as services. And all of a sudden, a lot of those questions go away.

So for example, the tension between releasing features or keeping them private? Gone. In other words, you want people to sign up, you want to retain customers, so what do you do? You continue to innovate and iterate on that product. So all of a sudden, that dries up and blows away. The concern, for example, of okay, what is the purchasing trigger, right? How do I get somebody to actually buy this software? Again, that goes away, right, because all of a sudden there are very few people who are looking for infrastructure software in a hosted setting for free, the way that they are for on-premise, sort of open-source software.

Kent: I wouldn’t trust it if it was.

Steve: Exactly. In other words, even tiny services that we use, everything is paid, because otherwise, where is that going? You want the service to be around. It’s easier to justify paying for something where you know that there are costs baked in, versus software, where a lot of people will sort of make the excuse to themselves, “Well, look, they’ve already paid to develop the software, and if I use it, there’s no additional cost to them to develop it again, because the cost of replication is zero and so on.” You don’t have that with services. There’s none of that sort of internal moral discussion.

So a lot of those issues go away if you go the services route. And the difficulty that we have in terms of talking to a lot of open-source businesses is that they’re aware of this, right? They’re aware of a lot of the issues that they have with The Software Paradox, they’re aware of some of the opportunities that exist in a services business, but that is a much different business to start and run than your traditional open-source support and service business, right? All of a sudden, you need to hire a different caliber of person, a different type of person, because you’re not just developing a piece of software and releasing it and testing it and so on; you need to keep infrastructure up and running 24 hours a day, 365 days a year. So how do you do that? Where do you do that? Your capital costs are entirely different. Your cost of customer acquisition is all upfront, whereas the return from that is amortized over time. So as simple as it seems on paper for a lot of analysts, people like myself, that’s a hard question for a lot of businesses in terms of, “How do I get into that?”

Kent: Sure, it’s easy to put the spreadsheet together though.

Steve: Yeah, oh yeah. Yeah, we do it all the time.

Kent: For 20 years I’ve been on the board of a company in Switzerland called Lifeware that does life insurance contract management. It was one of the earliest software-as-a-service businesses. They basically run the whole backend of life insurance for medium-sized insurers. And because they’re really good software developers, their costs are a fraction of what a big life insurer pays for managing a contract for a year. They only get paid by the contract, so their incentives are really closely aligned with their customers’ incentives. And it’s a tough business to run. As you say, you’ve got all these upfront costs, and then in the insurance business, you’re talking about a revenue stream that’s going to last 30, 40, 50 years. So you have to have access to capital, thank goodness, Switzerland, but you also have to have a lot of patience, in a way that you don’t see in a lot of recent MBA wannabes starting up a software business, when you’re talking about a 50-year timeline.

Steve: Yeah. And again, it’s just that things have changed, right? In other words, if you go back 10 years, you could start with a relatively small piece of software, grow that into a sizable business quickly, and either become a huge entity in and of yourself, or sell out, exit for some large sum of money. The mechanics of those businesses have changed, right? And that’s something that a lot of the businesses that we talk to are struggling with.

Kent: Yeah. And with JUnit, I managed to completely miss that whole exit thing.

Steve: Yeah. So actually that was interesting. Talk to me about The Software Paradox lens as applied to JUnit. So what was your experience like, what would you have done differently if you had thought about it?

Kent: We couldn’t. We very explicitly had that conversation, Erich Gamma and I: “If we charge for this, no one’s going to use it. If we don’t charge for it, there’s no revenue. Duh, what are we going to do?” And we both had day jobs, so we were the kids at the top of the mountain kicking a snowball down, because we were willing to just throw away that investment; we didn’t care if it paid off or not. Initially we had three hours on an airplane before the batteries ran out, flying from Zurich to Atlanta, and so yeah, why not? And then this particular snowball rolled and rolled and picked up size and speed, and it turned out there was a lot of loose snow, and so it ended up being very successful.

But very early on, we realized that if we tried to monetize it in any kind of way, that stops the snowball dead in its tracks, and it’s the end of the story. And we wanted people, we wanted programmers, to have the benefits of automated testing, so we gave it to them. And I always get that argument, “Well, it’s going to pay off in other ways.” The last time I was doing consulting, my daily rate was half of what it had been 10 years previously, so I don’t see the payoff. And that’s part of the emotional trigger of your book for me: I faced this paradox in a very explicit way, and I didn’t get a result that, I don’t know, that made sense to me. I have a farm in southern Oregon and I have my goats and I live a really nice life. I still have to work for a living. So I did okay, but I’m not financially independent out of it. In global terms, I’m a ridiculous number of zeros, the .001%, but at the same time, if a success at that scale had happened 10 years earlier, it would have been Rational Software; that was kind of the previous generation’s version of that, which turned into a whole bunch of money for a bunch of people.

Steve: Yeah. Yeah, it’s interesting, because the question of return is always an interesting one, right? So in other words, we just hired a new analyst. So I did, oh I don’t know, 25, 26 phone interviews, and one of the questions that we got a high percentage of the time, we don’t get it as much these days, but certainly we get it from time to time, is: why do you release your content, your research, for free? James and I had almost exactly the same conversation that you and Erich did, I think it was year one actually, which was, “All right, we have this research. If we charge for it, it’s going to be difficult,” because basically one of the biggest drivers for research isn’t the research itself, but somebody wants a name on it to say, “Somebody told me to buy this, so if it blows up, I’m not going to get fired.”

And if you’re a small firm, your name doesn’t count for anything in that analysis. But conversely, if we try to charge for it, nobody’s ever going to read it. And we made the decision at the time to say, “All right, you know what, the best economic decision for us is to release this as essentially open source,” if there were a term for that; certainly Creative Commons is the closest. So we release it under a Creative Commons license. And the return for us has been great, because basically what ends up happening is we’ll put out a piece of research and businesses will say, “Hey, this is great, but how does this apply to my business?” Okay, awesome. That’s a consulting project, yeah.

So the return for us, it’s not always direct of course. It’s not like every piece of research produces something like that, but the attachment is close enough that it makes sense, right? It’s a justifiable economic decision for us. I think in a lot of cases for open source, there are real questions as to whether or not it will be, because one of the things we haven’t talked about, which is certainly a factor these days, is that there is so much open-source software that standing out, sort of achieving that “hey, we’re going to become the snowball that starts an avalanche,” is increasingly difficult. Standing out from the rest of the projects is more problematic than it was 5 or 10 years ago, when there was just less software to compete with.

Kent: So I’m going to go ahead and disagree with you on that one, Steve. I think that it’s always been difficult to get attention. The success of software projects is always going to be distributed along some kind of power law distribution, which means that the vast majority of projects are going to get zero attention. And if you’ve never had a hit, if you’ve had a thousand zero-attentions in a row, then that feels pretty unfair to you, but that’s actually a pretty small sampling. I’ve probably started 2, 3, 400 programs like JUnit, put in that amount of initial effort just because I was curious and I wanted to see. And that one’s by far the one that paid off the most. And I think, given how much energy I’ve given to programming, I’m above, I’m exceeding variance. I got more than my share if I measure by people using and finding valuable the work that I’ve done.

Steve: Yeah, yeah. And to be clear, I don’t mean to imply that 5 or 10 years ago it was just simple, you throw something out there and it’s successful. I guess what I’m thinking of more is this: take the database space in particular, right? If you go back 5 or 10 years, really what are we talking about? We’re talking about a small handful of projects, right? You’re talking about probably three from the commercial standpoint in terms of the most successful, Oracle, DB2 and SQL Server, and then you’re talking about two effectively from an open-source standpoint, MySQL and Postgres.

So that’s a relatively small sample size of projects. If you fast-forward to today, depending on how you define the different categories, you probably have four or five different categories: relational obviously is still around, it’s still a big deal; you have graph, you have key-value, you have sort of larger-scale data operations in the [inaudible 00:42:37] type category and so on. And there are probably two or three more that we could list that are popular enough that they’re sort of quasi-mainstream. And in every single one of those categories you now have, oh I don’t know, anywhere from say three to six legitimate contenders, legitimate projects vying for attention. So it’s a crowded marketplace, right? There’s a lot going on, and that’s a factor when you start thinking about, “Okay, if I want to make my mark, where am I going to do that? Where am I going to invest my time that hasn’t been done over and over and over again, done to death, and where I don’t have tons and tons of competition?” The number of those areas is getting smaller.

Kent: Sure. If you were putting money in, expecting to get money out, you’d have to be nuts to start a competitor, to start, “Okay, I’m going to have another relational database and it’ll be the next MySQL,” because that’s just such a stupid bet. So I would say if your calculus is money in and money out, you’re going to have to wait. With kind of seed funding, the cost of planting the seed is so far below the transaction costs for a seed round that it just doesn’t make any sense to put money in to try and get money out.

Steve: Yeah. Well, I was just going to say, I think honestly for me, the biggest opportunity, the best return, I think, for a lot of businesses today is going to come from data. And the difficulty is that it’s a very fraught conversation to have with vendors, right, because there are all sorts of sensitivities in terms of, “Hey, if I tell customers that I’m collecting their data, they’re going to go nuts and not use my product and so on.” But here’s the thing: if you talk to any of these customers who supposedly won’t use software that spies on them or watches what they do, the next logical question to ask them is, do you use any software-as-a-service offerings? And the answer of course, in every case, is yes, they use it somewhere for something. And in that case, then everything they do is being watched, right? Everything they do is being monitored, everything they do is…

Kent: Better be.

Steve: Exactly. You’re not doing your job, if you’re a software-as-a-service vendor, if you’re not paying attention to things like usage patterns, right, or what customers are struggling with, what queries are taking longer and so on. So that to me is the…when I talk to businesses today, and a lot of them are actually beginning to take steps in this direction, the way out, and this goes back to what you might tell your daughter or what I tell the startups that we speak with, is to begin to think about it, not necessarily short term. You don’t have to pivot your business overnight, but begin to make preparations to treat data as an asset, because if we look at your employer, Facebook, or if we look at businesses like a Google or a Twitter and so on, a lot of the value that they have at this point isn’t in the software, right, it’s in the data, it’s in the data that they generate. I use this example all the time: if you give me Google’s software from two years from now, and you give me all the people and all the resources necessary to run that software, all the data centers and so on, just magically grant that to me, it doesn’t matter, because I don’t have the corpus of data to give you the returns that they can. In other words, the data that’s been built up over time.

And obviously, it’s not a linear comparison, it’s not a one-to-one comparison, comparing Google’s business to, say, your traditional infrastructure provider. But there are comparisons to be made, because look, infrastructure software generates tons and tons of telemetry, we all know this. Anybody who’s used Splunk or any of the other sort of logging tools is aware of how much data it generates. That data has value, and that data can be put to work.

Kent: Right, especially if you’re going to aggregate. That was the idea behind JUnit Max, this product that I worked on: we would log all the test run results from everybody, every programmer everywhere in the world, and then you can answer questions like, do my tests fail more often than other people’s? Or which languages seem to encourage better testing or worse testing, and what can we find out about that? It is a hard flywheel to get started, because you need a lot of data before it starts to be valuable enough that you attract more data and you get the positive feedback going. But I agree with you.
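
For readers who want to see the shape of the hook Kent is describing: JUnit 4 exposes a RunListener extension point, which is the natural place a JUnit Max-style product could tap to collect per-run telemetry. What follows is a minimal sketch of that kind of collection; the summary format and the idea of a central collector are illustrative assumptions, not JUnit Max’s actual design.

```java
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

// A minimal telemetry hook: counts failures as they happen and emits a
// per-run summary when the run completes.
public class TelemetryListener extends RunListener {
    private int failures;

    @Override
    public void testFailure(Failure failure) {
        failures++;
        System.err.println("failed: " + failure.getDescription().getDisplayName());
    }

    @Override
    public void testRunFinished(Result result) {
        // A JUnit Max-style service would ship a summary like this to an
        // aggregation endpoint (hypothetical here) rather than printing it,
        // so runs could be compared across programmers and projects.
        System.out.printf("tests=%d failures=%d ms=%d%n",
                result.getRunCount(), failures, result.getRunTime());
    }
}
```

A listener like this is registered via JUnitCore’s addListener method before a run; once enough users opt in, the aggregate is exactly the flywheel Kent describes.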

I think if you’re looking for money in, money out, that…so, okay. I live on a farm, and this whole John Deere anti-hacking stuff, that’s a topic of conversation where I live, and what idiots John Deere are. Okay, so I’m just going to say it the way I would say it here in Oregon: if John Deere was really smart, they would let people do anything they want with their tractors, as long as John Deere gets to collect the telemetry data from those tractors…

Steve: Totally agree.

Kent: …[inaudible 00:48:24]. And they could turn that around and they could run the most profitable commodity arbitrage business in the world. They could, oh god, there’s just a million ways they could sell that data. Anyway, they don’t have that vision.

Steve: No, no, they don’t. And that’s one of the things I really struggle with around The Software Paradox and the conversations we have with our clients: trying to get customers to have that kind of vision, to see some of those opportunities, right? Because the difficulty is that when you have these conversations, it’s kind of like an analogy I’ve used in the past, it’s kind of like trying to sell a relational database, right, in the sense that a relational database is a fantastic piece of software, it’s tremendously versatile, it’s behind basically every application you touch on some level. But if you’re just trying to sell it to somebody in a vacuum, it’s difficult, right? Because okay, well, what can you do with a relational database?

Kent: It’s like selling an engine.

Steve: Yeah.

Kent: Which only works if you’re selling to a Formula One team.

Steve: Exactly.

Kent: If you’re selling to me, an engine isn’t doing me any good.

Steve: Yeah, I mean, great. “Okay, what can I do with that?” “Lots of things.” “Well, okay, what kinds of things?” And then you have to basically try to find a way to scale that conversation, so it’s difficult, right? It’s a difficult conversation to have. Now, the thing that’s helping is that you begin to see little examples here and there of people doing interesting things with their data. So in other words, they’ve cancelled it now, but I thought what New Relic was doing with the App Speed Index was fascinating, right, because New Relic has access to whatever it is, tens of thousands or hundreds of thousands of nodes all over the world. And they can begin to give you a baseline in terms…

Kent: Oh, is that all?

Steve: Whatever it is, yeah. It’s not quite the Facebook experience.

Kent: I am spoiled at Facebook [inaudible 00:50:18].

Steve: No, no, no, I know.

Kent: Had to throw my little snotty snark in.

Steve: Of course, of course. No, it’s really appreciated.

Kent: Hundreds of thousands, okay.

Steve: Yeah, who knows, maybe it’s millions. I don’t have the actual number for them. But I think the point is that whether you’re talking about a New Relic or a Facebook, once you get to even a modest level of traction, you can begin to make some interesting assumptions, you can begin to draw some interesting conclusions in terms of, “Hey, what’s going on? How do I compare to a baseline? How do I compare to people in my industry? How do I compare to businesses of a similar size?”

Kent: How do I compare to six months ago?

Steve: Exactly. And the advantage of those kinds of things is that they really only become more valuable, and more defensible from a market standpoint, over time, because we saw this in the case of Apple Maps, where Apple goes out and drops 200-and-some-odd million dollars on six or seven different startups, comes out with a really nice, aesthetically pleasing, well-designed mapping product, and it’s an absolute disaster, because they hadn’t spent the last 10 or 15 years collecting data. And you can’t make up that ground inorganically. So yeah, I don’t know that The Software Paradox is necessarily easily answered in every case by data- or telemetry-based models, but I think particularly in infrastructure software, it’s going to be a common answer, and I think it’s going to be a good one.

Kent: Yeah, I was just trying to think of what’s changed to make that true. As bandwidth gets cheaper and cheaper, collecting the data becomes cheaper; there are fewer barriers.

Steve: Yeah. And also you allow customers to acclimate, right? So for example, when I was a systems integrator in the ’90s, we ran around and talked to lots of different businesses. And one of the things that we would talk to them about was, “Hey, you guys are not good at implementing CRM software. Basically half of the implementations fail. Why don’t you let us run this stuff for you in a data center? There will be dedicated hardware, nobody else has access, etc.” All those businesses came back and said, “Yeah, you guys are nuts. The customer data’s the most valuable data we have. It’s never leaving our firewall, over my dead body.”

And five years later, every single one of them is running in Salesforce, because your initial reaction, your initial apprehension and so on, will give way over time if value is demonstrated. And that, I think, is going to be the trick with a lot of these businesses: you need to give customers time to get used to the idea. “Okay, look, I’m not giving away my customer data,” for example, because none of the vendors want that; that’s more liability than it’s worth. In many cases, if not all cases, they just want the telemetry: “I want the data about how you’re operating. The actual data itself, that’s actually toxic. I don’t want anything to do with that.”

So as customers get used to that, and more importantly as you can begin to show them value, which is, “Okay, look, if you share this data with us, this is an example of the kind of data that we’ll give you in return.” “Oh, okay.” I’ve used the example of Gmail. If you walk up to somebody in a vacuum and say, “Hey, do you want an email client that’s going to scan your email and sort of mine it to present you with better ads?” everyone says, “Absolutely not.” If you put Gmail in front of them and then say, “Oh, by the way, this is the cost of that,” everyone says, “Oh, okay. Yeah, that’s fine. I can do that.” So a lot of it is just how you present it.

Kent: Yeah. So can we wrap this up?

Steve: Yes.

Kent: Here I am, driving your podcast. Wrap it up with concrete suggestions for those classes of people that I mentioned. That’s why I wanted to talk with you, like, “Oh, what do I do, Stephen?”

Steve: Yeah, so let’s see. So the classes of people were…

Kent: Recent graduate.

Steve: Recent grad, MBA…

Kent: MBA, mid-career.

Steve: Mid career and what was the last one?

Kent: Geezers like me.

Steve: Geezer, okay.

Kent: And then investors.

Steve: Okay, and then investors. So I would say for the recent grads, I think open source is a great way to build your visibility. In other words, you need to think about return not necessarily in financial terms, and not necessarily expecting to be the runaway success of a JUnit, but a lot of the businesses we speak with are looking for profiles, they’re looking for contributions, they’re looking for demonstrated capabilities, and open source is a fantastic way to do that. So your return as a recent grad, in terms of releasing projects of your own or contributing to other projects, is going to be, I think, reasonably high from a career standpoint. For folks mid-career, I think you need to think more about: okay, presumably at that point you’ve made your name, you have some reputation and so on, and you probably have more responsibilities in terms of your life, dependents and so on, in which case your concern shifts, right? You need to think about, all right, maybe I don’t release this project as open-source software, or if I’m going to release it as open-source software, it’s only going to be as an incentive to essentially move along to other forms of business, like a services arm.

And likewise for an MBA: if I’m an MBA, I would look at the numbers across the board. I would look at The Software Paradox and basically say, “I’m bullish on services, I’m bullish on data, and I’m…” A software business that wanted to recruit me out of an MBA program would need to really prove to me that they have an answer for this. In other words, again, it’s not impossible, there are businesses that are exceptions to the rule, but I want to see you prove it and I want to be convinced. And then for the geezers, basically I think a lot of it comes down to what your goals are, right? In other words, if you’re a Kent Beck and your reputation’s assured, then I think that’s not a big deal. I think that there are lots of different things that you can contribute to, whether that’s open source, whether that’s businesses of all shapes and sizes. Again, if I was going to get hired, I wouldn’t want to buy a software company, I [inaudible 00:56:40] understand and be sure of what their answers were.

Kent: I wonder if this hospice model is kind of a natural landing ground?

Steve: It could be. It certainly could be, because if you’re comfortable not necessarily being in a high-growth market, a lot of those businesses are going to be ones that are familiar to you. A lot of those products are going to be ones you’ve probably used or worked on or built competitors to. So yeah, I think there are opportunities there.

Kent: Okay. Cool.

Steve: All right, well I have one last question, one quick last question for you.

Kent: Sure.

Steve: It’s a fun one that we’d like to close on, which is, what animal are you most frightened of?

Kent: Okay, we had a ram here, kind of a silvery gray color, who figured out how to open the latch to his house. So you’d go out and feed at night, and he’d be literally lurking behind a tree where you couldn’t see him, and the first thing you knew, you were flying through the air. His name was Smiles, because it looked like the Joker’s face had been painted somehow on his face, especially after he had just sent you flying. And Smiles sadly is no longer with us, but he really scared the crap out of me.

Steve: That’s fantastic. Well, with that I think we can bring it to a close. Thanks so much, Kent, for the conversation. It’s been a lot of fun.

Kent: Thank you. Oh, it’s been my pleasure. Thanks, Stephen. Bye-bye.

Steve: Thanks again for listening to Hark. As a reminder, you can find us on Google Play, iTunes, Pocket Casts and Stitcher. You can also listen directly or find links to all of the above by heading over to, which will take you to SoundCloud. If you have questions, feedback, or suggestions, you can hit us up on Twitter, @harkpodcast, or via email at [email protected]. We’ll be back next month with episode three, and until then, enjoy your time.

Categories: Economics, Open Source, Software-as-a-Service.

What is the Future of the PaaS Term?

In the beginning, Platform-as-a-Service (PaaS) was an easy-to-understand category of software, even if it wasn’t called Platform-as-a-Service initially. In its earliest incarnations, PaaS was a seamless if tightly constrained fabric which abstracted and made opaque the infrastructure running underneath it, from database to operating system. The promise to developers was, in strictly functional terms, serverless. No longer would developers have to concern themselves with operations minutiae like server instances. Instead, they deployed applications against a given platform, and from there on out operations were, at least theoretically, the platform’s problem.
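
To make that contract concrete, here is a minimal sketch of a PaaS-style application, assuming the common Heroku-style convention in which the platform injects the listening port through a PORT environment variable; beyond that convention, the details are illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class App {
    public static void main(String[] args) throws Exception {
        // The platform, not the developer, decides where the app listens;
        // by convention it injects the port via the PORT environment variable.
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello from the platform".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        // Routing, scaling, patching, and the operating system beneath are
        // all, in theory, the platform's concern once this is deployed.
        server.start();
    }
}
```

Push that to the platform, and the developer’s operational involvement, at least in theory, ends there.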

Since those first tentative releases in early 2007, PaaS has become more complicated to explain, both because the category itself has expanded its ambitions and because other, competitive layers of abstraction have emerged.

Most obviously, there is the container-led DIY infrastructure phenomenon, a phenomenon that has so far not been gifted with its own PaaS-like acronym.

This chart represents worldwide searches for Docker, the computer technology. Of note here is the timing: the technology began taking off in 2013, which follows, as the initial release of the project dates to March of that year. Docker neatly packaged up, and thus popularized, container technologies, helping to drive widespread interest in and adoption of the technology, at rates faster than those of almost every technology tracked over the history of RedMonk.

PaaS as a category was roughly six years old at the time containers burst onto the scene; Cloud Foundry and OpenShift were around two years of age. For reasons of execution more than model, the growth of platform technologies was anemic relative to its slightly older infrastructure competition. Nor did interest in PaaS ever spike in the way that it has for containers.

This chart, in turn, represents another emerging high-visibility technology trend. The blue line represents searches for the containerization search topic; red is web searches for serverless. Originally a market limited to AWS’s Lambda service, serverless has since seen competitive services from Google to IBM to Microsoft expand the category, and interest in it, dramatically. Time will tell whether or not it proves to be a force on par with containers, but there’s little question that at present serverless is capturing developer interest.

The question moving forward is what the future holds for the PaaS term. Containers are clearly distinct from traditional PaaS offerings, both functionally and in terms of ambition. Serverless has some similarities in terms of high-level positioning, but takes a materially different approach again than traditional PaaS offerings.
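
The difference is visible even at “hello world” scale: where the PaaS sketch above deploys a long-running application that binds a port, a serverless deployment is a single function the platform invokes, scales, and bills on demand. Here is a minimal sketch against AWS Lambda’s Java interface; the handler name and string payload are illustrative choices.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// No server is created or started anywhere: the platform owns the process
// lifecycle entirely and calls handleRequest once per invocation.
public class HelloHandler implements RequestHandler<String, String> {
    @Override
    public String handleRequest(String input, Context context) {
        return "hello, " + input;
    }
}
```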

All three, however, and their attendant ecosystem members (e.g. Kubernetes), represent layers of abstraction that enterprises are increasingly evaluating, if not exactly side by side, then relative to one another. “What are the virtues of serverless versus a traditional PaaS?” is a question that is being asked more and more. The same question, but for container-led DIY infrastructure, is even more common.

There would appear to be two paths for PaaS moving forward. Down one, it becomes a catchall, and thus compromised, term for a host of unlike technologies. This will lead to PaaS evaluations that attempt to compare, in apples-to-apples fashion, say, Cloud Foundry vs Docker/Kubernetes vs Lambda. On the one hand, all can serve in some capacity as foundations for applications. On the other, the way they go about this and the tradeoffs they imply are entirely distinct.

Down a second path, however, PaaS gradually becomes deprecated as a term. The Cloud Foundry ecosystem, for example, has seemingly retired that term in favor of messaging around Cloud Native. Google App Engine, for its part, mentions Platform-as-a-Service in its page title but nowhere else; Heroku does not use the term at all. Apprenda and OpenShift, however, use it far more prominently.

In either case, PaaS will persist as part of the industry lexicon for the foreseeable future. How it’s used, however, and by whom, will go a long way towards determining what utility it retains.

Categories: Platform-as-a-Service.

The Future of Open Source

Last week in New York, the venture firm Accel held a ninety-minute lunch for an audience of financial analysts and equity professionals, a reporter or two and at least a few industry analysts. The ostensible subject for the event was Accel’s Open Adoption Software (OAS) model, but the wider focus was what the future held for open source in general. Accel’s view on this subject, as well as that of the panelists at the event from Cloudera, Cockroach Labs and Sysdig, was that open source has essentially gone mainstream. As Jake Flomenberg, an Accel partner, put it, “There is a massive shift going on in the ways technology is bought. Open source has gone from the exception to the rule.”

Setting the OAS model to the side for the time being, the larger message that open source has become the default choice in a wide array of infrastructure software categories isn’t difficult to sell. It is a message that was consistent with attendee sentiment at the most recent OSCON in Austin, the ApacheCon in Vancouver, and the Linux Foundation’s Collaboration Summit in Lake Tahoe before that. It is a message that Cloudera’s Mike Olson would presumably agree with; in 2013 in a piece entitled “The Cloudera Model,” he said simply, “you can no longer win with a closed-source platform.”

The idea that open source has effectively won within the enterprise is also consistent with RedMonk’s own views on open source. The only real difference is that this perspective, for my part, is better used to describe open source’s past than its future.

At the 2005 O’Reilly Open Source Conference, for example, I gave a keynote to a room full of developers entitled “so you took over the enterprise: what now?” Open source was not yet as common within the enterprise eleven years ago as it is today, but from our vantage point it had passed the tipping point, and its trajectory was assured. The decade since giving that presentation has done nothing but validate that original assertion, as open source projects and efforts to commercialize same have entered market after market, category after category, to the point where there are frequently more open source options available today than proprietary alternatives.

All of which has led to open source advocates taking a deserved bow for their success. It was by no means assured; certainly in the early days of RedMonk, there was major skepticism about the model in general, from its ability to sustain itself commercially to its vulnerability to everything from intellectual property violations to security exploits.

But just as open source is finally being recognized as the viable model we always believed it to be, it is facing competition that enjoys some of the same advantages over open source that open source had relative to proprietary software.

That competition is the cloud.

Competition is an interesting term to use, to be sure, because the cloud is built for the most part from open source software, and the cloud is such an important channel that it has elevated open source projects such as Ubuntu to first class citizen status. The presentation that Accel gave didn’t mention the cloud as a competitive threat, and the competition most frequently discussed by both Accel and its open source participants was proprietary software companies.

But if we step back, public market activity suggests that more concern is warranted. We know, for example, that Oracle, one of the standard bearers for proprietary software, derives less of its revenue from the sale of new software licenses every year. We also know that Amazon Web Services, which conflates open source, proprietary software and hardware, is growing quickly. Correlation may not prove causation, but it’s difficult to build the case that these two facts are unrelated.

Consider some of the differences in user experience when comparing cloud services to traditional on premise open source alternatives.

  • Convenience:
    Open source was used over proprietary software in many cases not because it was functionally superior or even because the source was available, but simply because it was easier to obtain. To get a closed source product, best case you needed to fill out long, involved registration forms; worst case you needed to talk to a salesperson and find budget. With open source, you simply downloaded what you needed and were on your way. What could be simpler? How about not downloading anything at all, but instead standing up a given piece of software, already combined with hardware, in seconds? Where open source once held the title of most convenient, it has long since ceded that title to the cloud.

  • Complexity:
    Open source has long prided itself on representing choice: in any given category of software, users are free to pick from multiple, credible open source implementations. But that choice is an overhead, overhead that is multiplied with each additional choice a user has to make. By comparison, cloud platforms typically have a default service available in a given category: one monitoring tool, one container engine, one storage array, one CDN and so on. Users that require more control have the ability to run the software of their choice on infrastructure they maintain, of course, but they can also follow the path of least resistance and simply accept the default – which they don’t have to run.

    As an aside, this problem is one reason foundations like Cloud Foundry or the Cloud Native Computing Foundation are interesting, given that their focus is integrating disparate parts and projects.

  • Operations:
    Because many commercial open source organizations were built to compete with proprietary alternatives, convergent evolution has led them to look and behave in similar ways. In many cases, for example, the burden of getting a given open source offering stood up and integrated is left to a customer, or their high priced systems integrator of choice. Relatively few commercial open source organizations have services capabilities extensive enough to assist with more than initial configuration and setup: integration is an exercise left to the buyer. While cloud services are not a panacea, many of their services are by comparison more easily integrated with one another than independent on premise open source alternatives.

    Just as with proprietary software, cloud services can be sold against open source alternatives on a CAPEX vs OPEX basis; rather than pay up front for support and service, for example, these expenses can be borne instead over time in the form of premiums above a base infrastructure cost. This may result in higher costs over time, of course, but the ability to amortize payments over time can be useful to cash-constrained business units or startups.

  • Data:
    While vendors both cloud and on premise have been reluctant to invest in and market, at least overtly, telemetry and data oriented models, it is, in my view, inevitable that they will. If this should come to pass, it’s another advantage for cloud over on premise open source, because collecting data from datacenters you maintain is far less complicated than retrieving it from individual user facilities.

It’s important to differentiate, of course, between the outlook for open source and commercial open source. The prospects for the former remain reasonable, as it remains a fundamentally more viable methodology for most classes of software. The rise of the cloud may even accelerate the availability of certain classes of open source software, either because its authors are not in the business of selling software (e.g. Facebook/Cassandra or Twitter/Heron) or because commercial vendors seek to reduce customer fear of lock-in via public cloud implementations of OSS (e.g. Cloud Foundry, MySQL).

For commercial open source vendors, however, it is important to recognize that the cloud is at least as much threat as opportunity. Many of our commercial open source customers, whose primary business is selling to enterprises, have acknowledged this, and are ramping into the public cloud as quickly as they can. This rapid flight is creating strange bedfellows, in fact: several open source vendors admitted at OSCON that the best cloud partner in their experience was Microsoft – an interesting turn of events for someone who remembers that Jason Matusow needed security for his first appearance at the conference, so hated was the vendor at the time.

Does the cloud represent opportunity for open source as well? Undoubtedly. But the future outlook for open source, particularly those who would commercialize it, seems counterintuitively far more murky today than it was in 2005. Open source is rightly being heralded as the default, and having “won.” The difficulty is that in this industry, victories tend to be very short lived.

Open source has learned how to compete, and compete very effectively, with closed source. That’s its past. Its future will be competing with the public cloud, and the first step towards doing that effectively is admitting the problem.

Categories: Cloud, Open Source.

Hark Episode 1, “Election Night”: Guest, Jeremy Bowers

As newsletter subscribers are aware, Episode 1 of my new podcast Hark, “Election Night,” is here and ready for your listening pleasure. In this month’s episode, Jeremy Bowers, a developer on the New York Times Interactive News desk (which is hiring, incidentally), takes us inside the Times’ newsroom on Election Night. From the technical stack to the schedule that night to the favorite catering choices, Jeremy provides a behind-the-scenes look into what life is like for the developers on the other side of the website you try to bring to its knees by hitting refresh over and over and over. And possibly swearing at. Jeremy and I also talk programming-as-journalism, AWS, WebSockets, how to get into elections as a technologist and more, so give it a listen.

For those of you who don’t do podcasts, however, we offer a transcription of the episode below. Enjoy.

Stephen: Well, excellent. Jeremy Bowers, welcome to Hark. If it’s okay with you, I’d like to start with a couple of basics. Who are you, and what do you do?

Jeremy: Sure. It’s really good to be here. My name is Jeremy Bowers, and I work for the interactive news team at The New York Times. We’re a weird little collective of programmers that sit in the newsroom and work really closely with reporters and editors on projects that sort of fall in the gaps between our graphics desk that does lots of charts and maps and the paper’s standard IT staff that would build a lot of platforms and our CMS, for example.

There’s a lot of these sort of Big Data projects, Olympics, elections, World Cup, that sort of fall in between the skill sets of these other groups. That’s where our team comes in. We jokingly refer to ourselves as Navy SEALs. Someone’s got to fix these things. Someone’s got to make them work, and that’s what we do.

Stephen: Nice. Getting into political news, elections seem to be something of a specialty for you. Was that a conscious plan, or is that something you kind of fell into? If I go through your background, beginning at the St. Petersburg Times, then NPR and now The Times, from the outside looking in, it seems as if your career followed a plan. Is that an accurate statement?

Jeremy: No, not at all. The best part about it is it’s a demonstration of just how the incentives for a lot of newsroom programming teams work. Elections are a particularly perfect case where it’s just a little too much data for most people’s graphics desks to handle all alone. Though, honestly, The Times’ graphics desk could probably do this in a heartbeat. They just need a janitor to help them clean up. It’s one of those projects where, if you have programming skills and a little bit of journalism sensibility, you will end up working on it everywhere that you work.

This will be my fourth news organization which I’ve worked on a general election. I did one at the St. Pete Times, one at the Washington Post, one at NPR, and then 2016 here.

Stephen: Nice. So the one at St. Pete, was that the first election you worked on?

Jeremy: It was, 2008, with a young gentleman named Matthew Waite. I don’t remember very much about it. I was working on it like part time, because I was also working on blogs. My first job was as a blog administrator writing PHP and Perl templates for trashy old Movable Type and pre-WordPress blogs. It was not good. There was a better world out there. There was a world in which we were pulling down CSVs and running Python scripts against them. It just looked so cool, and I really wanted to do that pretty badly.

Stephen: One of the things that comes up in your background as we look through it is that you’ve sort of been immersed in blending data-driven analysis and news coverage. The idea of programming as journalism has trended in recent years. What’s your take on that? Programming as journalism, is that a real thing? Where do you think we are in that process?

Jeremy: There’s really two parallel thoughts that I have on this. The first thought is that there’s definitely some beats where if you’re not a programmer, you’re going to have difficulty keeping up with places that have a programmer helping you cover that beat. A good example of this would be campaign finance. We’ve had campaign finance laws since Watergate, which forced PACs and campaign committees to release their data either quarterly or every month. But if you were a reporter working on campaign finance in the ’80s or ’90s, you had days to write your story.

You could just leaf through the filings. It would take you really a long time, and you may not be able to find really useful stories, because people are really bad at finding small changes in data over time. Computers are super good at that.

Campaign finance is one of these examples where if you are a programmer, you can write scripts that will do aggregates, that will do comparisons, and it will let you write a story that’s just better than the story that you would have written before. We’re not even talking about something like a chart or a graph that’s a nice visual presentation. That’s just the standard story that you are going to write. You can just write that story better if you have programming skills.

That’s one part, is beats that require a programmer. Then, there are other things where it’s an older beat, but programmers could make it better. I like to think about things like police reporting or places where there aren’t data sets already, things like drone strikes or Guantanamo detainees, things where having a more structured understanding of how the world works, accumulating data sets that maybe don’t already exist can be a form of reporting in their own right. In particular, I really enjoy those.

Our team at The Times maintains this database of people who’ve been detained at Guantanamo, and I just don’t know of anything else that’s quite like it. It’s a fascinating data set and a really neat project. It only exists because someone bothered to sit down. Marco Williams and a team of interactive news developers sat down and decided to start tracking this and make a website out of it.

Stephen: That’s interesting. Certainly as someone in the industry, I’ve always found this fascinating, going back to some of the early days with sites like chicagocrime.org by Adrian Holovaty and basically taking raw crime dumps, putting them on a map, making them interactive, and making them useful in ways that we hadn’t seen before.

I’m curious, though, because you hear different things about the traction behind sites like FiveThirtyEight in terms of trying to cover things other than elections. As somebody who does this work, do you think it’s something that’s more outlet driven, or do you think it’s something that is in demand from the audience? Is this something that is actually in…is it a pull thing, or is it a push thing, do you think?

Jeremy: Yeah, that’s actually a really good question. My personal bias on this feeling is that people aren’t in it for any particular form or presentation, but they’re in it for a story or some information, right? Our audience has always wanted to know things about the election, and they would love to have a better take on it. If the better take happens to be more visual or if we can tell them a story that we only can tell well in audio, then we probably ought to tell it on audio. If we have a story that comes out better as a chart, then we probably ought to tell them that story as a chart.

The thing I think that we miss if we don’t do data journalism particularly well is that we miss out on the stories that are told better when they’re told less anecdotally and with more rigor. That doesn’t mean that every story that we tell ought to be that way. There are many stories that are just much better as anecdotes. Some stories are good as a good mix. I really am fascinated at trying to find places where there’s an intersection of good story and anecdote but also lots of good structured data.

Stephen: Yeah, it’s interesting, because it’s one of those things that I…for example, with baseball. I’m a big baseball fan. I consume a lot of the analytical content and so on. The interesting thing, though, is that some of the writers, I think, get too far removed from the actual reader. Even as somebody of an analytical bent and somebody who’s technical, when you begin just throwing up charts because you have charts, you kind of lose the plot at times. Is that something you guys worry about? How much input do you have from the editorial side in terms of trying to make sure, “Hey, look. Let’s not just throw numbers up. Let’s try to make sure this is a story”?

Jeremy: Entirely. This is sort of the problem, right? Is that it’s so easy if you’re doing data journalism to stumble into a pile of data and then say, “I will put this up on the Internet for someone to explore.” The truth is, that’s the reporter’s job, is to explore and find a story and then to tell the story, not just to put it up and let people struggle through it.

The other thing that strikes me about that is that I don’t think there’s a competition between anecdotal and structured story telling. There are really great stories, like listening to Vin Scully talk about baseball. I don’t want to hear Vin Scully talk about someone’s wins above replacement. I’m sure he would be great, but he has a strength, and he should stick to his strength.

There are other people like Harry Pavlidis working on whatever random model he’s working on. I love to read about that too, but all of it should tell me a story. That stuff about catcher framing I feel like it’s some of the best analysis that I’ve seen lately, because it basically told you more about baseball. You learned all the stuff that you already sort of suspected, right, is that catcher is really valuable. Game calling is valuable. Turning balls into strikes is valuable.

But there’s something more than just that. It’s being quantitative about it, being able to say, “This is how much more valuable that player is than they used to be.” It opens up a ton of room for a standard reporter to just go out and ask a bunch of questions. Nobody was going to ask Matt LeCroy about how he framed pitches beforehand, because no one realized that it was super important, you know?

Stephen: Yeah, yeah. All right, let’s get back to The New York Times then.

Jeremy: Yes.

Stephen: When you look at elections, what is the agenda? How do you drive evidence or facts based reporting into coverage of an election? Where do you get the raw data? How does the process work?

Jeremy: Yeah, absolutely. We get our real time data from the Associated Press, who maintain a huge staff of reporters and stringers who go to precincts all around the country and report back data that they are watching officials hand enter. As a result, we get the data from the AP much more quickly than we would get it from the Secretary of State for that state, which is where the ultimate election results end up. But the real election results don’t show up for several months. The primaries that we’re watching right now, those are largely unofficial. They’re a project of the state’s party.

So the AP really becomes the de facto standard for a lot of this data, especially in real time. We pay them for access to that data. Up until 2014, that data was provided through an FTP dump, a series of oddly delimited files in an FTP folder where you had to know how to find the correct file and load it up. They would update it every couple of minutes, every five to seven minutes or so. You could hit it once every minute looking for an update.

Well, in 2014 the AP released an HTTP API. For the 2014 general election, we didn’t use it. But for 2016, we decided we wanted to, because it’s a little faster. It gets the data to us in as little as three minutes, and race calls are instantaneous. An AP editor will hit a button to call a race for Bernie Sanders, and it will almost immediately show up in the API. So we want that speed more than we want almost anything.
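[Ed note: for readers curious what the poll-and-update pattern Jeremy describes looks like in practice, a minimal sketch follows. The endpoint, header and payload shape are hypothetical, for illustration only; they are not the AP’s actual API.]

```python
# Hypothetical sketch of a results-polling loop. Real election APIs
# differ; only the pattern -- poll on a budget, act on fresh data -- is
# the point here.
import time
import requests

API_URL = "https://api.example.com/elections/results"  # hypothetical
API_KEY = "..."  # elided

def poll_results(interval=60):
    """Yield fresh result payloads, polling no more than once per interval."""
    last_etag = None
    while True:
        headers = {"apikey": API_KEY}
        if last_etag:
            headers["If-None-Match"] = last_etag
        resp = requests.get(API_URL, headers=headers)
        if resp.status_code == 200:
            last_etag = resp.headers.get("ETag")
            yield resp.json()  # new votes and any fresh race calls
        # a 304 means nothing has changed; wait and try again
        time.sleep(interval)
```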

That meant that we had to rebuild our election rig this year. It’s an old app, actually I think the oldest continuously updated application in our portfolio. It’s a 2006-era Ruby on Rails app that is not modular. It’s a very large app that parses all the data, loads it into a database, does sanity checks, does difference calculation, and bakes out HTML and JSON for all the pages. I think it was like 200 total files/URLs for every election, which is a lot of things for a single application to do.

This year, we decided that we were going to break that down into a series of more modular little pieces, which was very exciting to make a big change like that in advance of such a big election cycle. We decided that that was really important. It would also give us a chance to rewrite it in Python and some old magic style Bash scripts, make it a little easier for us to maintain, and make it a lot easier for other people on our team to be involved in it as well.
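[Ed note: the monolith-to-modules rewrite Jeremy describes amounts to giving each stage of the old app a narrow contract. A rough Python sketch of that decomposition, with illustrative names that are not the Times’ actual code:]

```python
# Each stage of the old monolith -- parse, check, diff, bake -- becomes a
# small function that can be tested, rerun, or replaced on its own.
import json

def parse(raw_feed):
    """Turn the upstream feed into plain Python dicts."""
    return json.loads(raw_feed)

def sanity_check(races):
    """Fail loudly on impossible data rather than publishing it."""
    for race in races:
        assert race["precincts_reporting"] <= race["precincts_total"]
    return races

def diff(old_races, new_races):
    """Compute only what changed, so downstream pages update minimally."""
    old_by_id = {r["id"]: r for r in old_races}
    return [r for r in new_races if r != old_by_id.get(r["id"])]

def bake(races, path):
    """Write static JSON that a CDN or object store can serve directly."""
    with open(path, "w") as f:
        json.dump(races, f)
```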

Stephen: Yeah. That’s a great segue. You mentioned Python. You mentioned Bash. What are the technologies The Times uses? What’s the stack look like?

Jeremy: Yeah, absolutely. My little team, we have some Rails developers and some Python developers on the back end. We write a ton of Bash. We write some Node on the server depending on the project. We have a ton of JavaScript on the front end.

This year we’ve decided that long polling for results is sort of an anti-pattern. It’s slow, especially on mobile devices, to have to pull down a ton of JSON every 30 seconds. This year, we’ve been using WebSockets. We’ll open a WebSocket connection when the client connects. The client gets the initial data from the page, then we can push fractional updates whenever an update comes through. Because you’re not polling, the browser doesn’t have to do nearly as much. We don’t rewrite the DOM as often. The result is it feels a lot better. A user can scroll up and down, and that doesn’t feel laggy. The updates are really small, so even over a slow connection they work pretty well.
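[Ed note: the server side of the push model Jeremy describes might look something like the following sketch, which uses the third-party Python websockets package. The message shapes and the current_results stub are illustrative, not the Times’ implementation.]

```python
# Hold open connections and push small diffs, rather than letting every
# client re-download full JSON on a timer.
import asyncio
import json
import websockets

CLIENTS = set()

def current_results():
    """Stub standing in for the real results store."""
    return {}

async def handler(ws):
    CLIENTS.add(ws)
    try:
        # Send the full picture once, on connect...
        await ws.send(json.dumps({"type": "initial", "data": current_results()}))
        await ws.wait_closed()
    finally:
        CLIENTS.discard(ws)

async def broadcast(update):
    """...then push only fractional updates as results change."""
    message = json.dumps({"type": "diff", "data": update})
    await asyncio.gather(*(ws.send(message) for ws in CLIENTS),
                         return_exceptions=True)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```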

Stephen: Do you have issues with a browser in terms of their relative support levels for WebSockets?

Jeremy: Oh my, yes, although, truthfully, most…We did not use WebSockets in 2014 for this reason. We tried. Asterisk: we wrote a library that will fall back to long polling in the event that your browser doesn’t support WebSockets. As a result, it’ll just be a little slower, a little laggier. In 2014, I think we defaulted to long polling in a lot of cases. This year, an overwhelming majority of the clients that have visited us have been using the sockets. It’s better for everybody that way. It’s better for us, because we’re moving less data over the wire. It’s better for the client. That’s actually been a really good thing this year.

Stephen: Okay. Let’s fast forward to election night. This could be primaries or general. Without getting into sort of raw numbers, which I’m sure The Times would prefer that you keep to yourself, what kinds of spikes in traffic do you expect to see? In other words, versus a normal day. Is it 2X, is it 10X? What are you looking at in terms of the relative demand?

Jeremy: I can tell you that on an election night our results page is like having a second home page. We’re getting as much traffic to that results page on average as our home page does even on that night, which is already a multiple of a normal night.

I can tell you a pair of anecdotes that I think you’ll find amusing. The general election in 2012 set traffic records that we didn’t break until 2014. One of the things that happened in 2012 is that we, as a result of a cache misconfiguration, pointed the fire hose of the home page, plus all the people looking at our results pages at an unwarmed Amazon elastic load balancer, which immediately crumpled under the load. It was the first time I’d ever heard of that happening. I didn’t even know that that was something that could happen.

Stephen: That’s funny.

Jeremy: This year, we got a phone call and an email from Amazon, because we had done a very similar thing. We’d pointed about 14,000 requests a second at an Amazon S3 bucket that had not previously taken any traffic. As a result, about 1 in 5 or 1 in 6 of the pages we returned were a 500, something I’d never seen before. So we got a nice phone call from them about that as well.

Stephen: There you go.

Jeremy: So we’ve gotten to know our Amazon service reps, so it’s been nice.

Stephen: I was going to say. Amazon must be crucial just in terms of being able to spin up and spin down, depending on the load.

Jeremy: Yeah. There’s actually a handful of confounding factors about a general election that make programming against it a little difficult. We have geographic issues going on, right? We can’t not serve election results just because there’s an outage in U.S. East 1. So we have to maintain duplicate versions of our whole infrastructure, our whole stack, in an East and a West availability zone. We have some issues with scale, which I kind of alluded to. It’s just thousands and thousands and thousands of requests per second on just a normal primary night. For the general, we’re pretty much expecting to set traffic records that we may not break for four more years.

There are a large number of contributors to our software. I’m working on sort of the core election stuff. But we also have three or four developers on our graphics desk who are working on a Node app that handles all the HTML and maps and stuff like that. We have two or three other coworkers of mine who are working on sort of back end pieces, and then a handful of site engineers. It’s like when you’ve got 10 or 12 people all contributing code to the same thing. That’s a confounding factor that you almost never run into on a normal project, especially a small one like this.

One thing that’s particularly hard about an election like this is that we have the staff at The Upshot, or we have other smart people inside the company, who would like to do something new this year, or they want to do something they haven’t done before. A great example of this would be those Upshot live models. In previous years, we would have had to write special software to get them a route that would produce the data that they need. Then we would have had to help them scale their app up. It really would have been very difficult to do what we’re doing this year in a previous year.

Because of the way we set this up, very modularly, The Upshot has access to all the same election data that everybody else has. So they can test. They have access to data to run tests on. As a result, they can build these live models, they can deploy them, and it just runs. No one has any questions. It makes it a lot easier to do things that you might say are “innovative” or things that are just new, things that are different to us and that normally we would have had to put a lot of development time into.

Stephen: Yeah, indeed. In terms of preparing for a major event, whether it’s Super Tuesday or a general or what have you, what are the kinds of things that you preemptively do, either on the people side or the hardware side? Obviously, you pre-warm hardware. Do you work in shifts? What can you tell me about the process rolling up to an election?

Jeremy: The one really great thing that I enjoy is that, because it’s a newspaper, we’re already people who are used to the idea that we’re all hands on deck for breaking news. We put together a little war room of the core folks that need to be around.

A great example of this would have been Super Tuesday. Big election night, so we have two or three of the core committers from the graphics desk, who work on the maps and who built the charts, sitting in the room. We’ve got me, who handles the election data, in the room. We’ve got two editors in there. We have a politics editor available to make race calls as necessary. A masthead editor occasionally drops by. It’s nice to have everybody in one place. That actually solves most of the problems that we have.

Stephen: Interesting. Okay.

Jeremy: I’m sure it won’t shock you that most of the problems that we have are not technological in nature, but human, right?

Stephen: Yeah, of course.

Jeremy: We’ll often have this case where something weird will happen, or it’ll look weird, but it’s actually completely legitimate, like the AP may call a race, and then we may not have any votes that come in for 30 minutes. It looks weird, but it’s totally legitimate. The AP can call. They have models for race calls that involve things like exit polls, votes that they have access to that haven’t been assigned to precincts. They can know in advance with some regularity that this is going to be a race that X candidate wins.

So they’ll call it, but we won’t have any votes in the system for a good 20 or 30 minutes while the secretaries of state prepare to deliver the first batch. It’s nice to have everybody in the same room so we can calm everyone down and let folks know this is totally legit, nothing’s broken, and this is correct. So it’s good. Other than that, we get a lot of sleep the day before.

Stephen: That’s right, yeah.

Jeremy: Try not to deploy anything new. We test like hell beforehand. That’s one thing I can say I really enjoy. My editor’s given me months in advance of these primaries to write software, and then write tests, and then write an entire rig for simulating an election, against which we can run our software as if we were in the middle of an election night. We went so far as to build in something that will simulate 500s or 403s from the AP, like that they’re down or that we’ve run out of API requests.

Stephen: Okay, so kind of like a chaos monkey for The Times?

Jeremy: You got it, exactly. Because we need to know that if something really horrible happens, we’ll be able to stay up and continue providing results.
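[Ed note: the failure-injection idea is simple enough to sketch. Wrap the real fetch so that, in test runs, some fraction of calls fails the way the upstream provider might; the names and rates below are illustrative.]

```python
# Simulate upstream outages (500s) and exhausted API quotas (403s) so the
# rig can be exercised under failure before election night.
import random
import requests

class SimulatedUpstreamError(Exception):
    """Raised in place of a real HTTP error during simulations."""

def chaotic_get(url, failure_rate=0.1):
    """Fetch url, but fail randomly at the configured rate."""
    if random.random() < failure_rate:
        raise SimulatedUpstreamError(random.choice([500, 403]))
    return requests.get(url)
```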

Stephen: Right. What is a typical shift then? In other words, when do you come in that day? What time do people leave?

Jeremy: On a normal election night, I’ll usually take a train up to New York the day before and work like a half day on a Monday. Those elections are usually Tuesdays, the big ones. Tuesday morning, I’ll get in around 10:00 or so. The AP will push what they call live zeros around 11:00. This is to say they’ll have all the reporting units, the geographical areas that they expect to have results from, in the system, with zeros as their vote totals. This lets us warm up the database with all the candidate information and all the names of all the counties that we expect to see, and it gives us an opportunity to go and edit names. Times style would be Donald J. Trump instead of Donald Trump, for example. So we have a handful of overrides like that that we need to do.

Between 11:00 and noon, we’re loading that initialization data and baking out our first result pages, which are all empty. Then we basically go eat lunch, and then all get back together about 5:00. First results usually start coming in somewhere between 6:00 p.m. and 9:00 p.m. Then it’s basically just all hands on deck until about 1:00 a.m., when the last handful of results come in, sometimes even later if it’s like Hawaii or Alaska. But yeah, that’s what the night looks like.
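[Ed note: the live-zeros initialization Jeremy walks through reduces to two steps: load the zeroed structure, then apply house-style overrides. A sketch, with an illustrative record shape:]

```python
# Warm the database with zeroed results and Times-style candidate names
# before any real votes arrive.
STYLE_OVERRIDES = {
    "Donald Trump": "Donald J. Trump",  # Times style, per the transcript
}

def initialize(reporting_units):
    """Apply name overrides and confirm the zeroed starting state."""
    for unit in reporting_units:
        for candidate in unit["candidates"]:
            name = candidate["name"]
            candidate["name"] = STYLE_OVERRIDES.get(name, name)
            candidate["votes"] = 0  # live zeros: structure, no results yet
    return reporting_units
```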

Really, the big push is for the next morning, when we have to write the day-after story. We’ll have a politics editor asking us which county Donald Trump performed best in in Massachusetts, or where Hillary outperformed her 2008 numbers in South Carolina. We go pull data for that. It’s nice. Actually, I really enjoy the day after almost more than I enjoy actual election nights.

Stephen: That’s funny. Well, actually, that kind of does make sense. What does The Times do? Do they cater for you? Do they bring food in?

Jeremy: Oh, yeah, absolutely. As a matter of fact, one of The Wall Street Journal reporters, Byron Tau, makes a note of tweeting what everybody is having for dinner that night. He’ll have CNN, The Journal, the National Journal. The Times is pretty good. We normally have some combination of Szechuan or Thai. On Tuesday I was in New York for the New York primary. Of course, we had deli food. It was wonderful.

Stephen: Nice, nice, there you go. In terms of just wrapping things up then, for folks in the audience who are technologists and might want to follow in your footsteps, who might want to sort of get into the news side or the politics side or the election side, what are the suggestions that you would have? How did you…I mean, obviously, we talked a little bit about how you got into it. What would your recommendations be for somebody who’s just breaking in?

Jeremy: I would say if you’d like to follow directly in my footsteps, you should be a failure at the first two things that you try, and then fall back to programming as your third shot at it.

Stephen: There you go.

Jeremy: I was a political science student at first and was going to teach, but my grades were terrible. I took the LSAT, because I thought I wanted to be a lawyer, and then did poorly on the LSAT. Then, in a fit of displeasure, I took a job working late night tech support at the St. Petersburg Times and just got started.

I’d say really the best thing is to listen for people’s problems. Almost all of the best software I’ve written has come from talking to a reporter and hearing a problem that a reporter has on their beat or somewhere in the news gathering process. We’ll have a researcher who will say, “Man, I just know that there are these people who die in Afghanistan and Iraq.” This happened at The Washington Post. There are folks that die in Afghanistan and Iraq, and we get a fax every week with the names of every service member who dies.

But it’s really hard for us to answer any questions like, “How many people died in Afghanistan and Iraq this week?” Because we’re not sitting down and writing those together. You can do something as simple as set up a spreadsheet for someone or a little simple CRUD admin. It’s little problems like that that often turn into really cool stories, eventually.

I’d also say that your first project doesn’t have to be a big, award-winning, amazing data thing. There’s a lot of really easy low-hanging fruit. I think Chicago Crime is a great example of that, because it wasn’t necessarily, on its face, supposed to be a journalistic enterprise. It was just a good civic data project. It was just, as a citizen, you need to know about this.

I feel like some of our best recruits have come from the civic data world, people who are just personally interested in the workings of the government or in our society and worked on a data project around that. Those people almost always have got the same sort of built-in incentives and ethics that we’re looking for here in the Fourth Estate.

Stephen: Yeah. In other words, what you’re saying, to some degree then, is that you’re not just looking for the technical skill set. You’re looking for somebody who is really able to work, whether it’s with a reporter or with somebody sort of in a civic situation, but is able to listen and translate that into an actual requirement, as opposed to, “Hey, here’s some data. Let me show you something fancy I can do with it.”

Jeremy: Yeah, absolutely. Truth be told, so many of the projects that we work on, you would think of them as boring technologically. It’s not as much fun. We’re not moving around millions of rows of data, although some of our projects, we get lucky and get to do things like that. A lot of it is just solving what I would consider to be fairly simple technical problems but that take a lot of empathy to figure out that they even exist.
Yeah, there’s a world of easy low-hanging civic fruit to get started on if you’re really interested in this sort of thing.

Anybody can be a journalist, man. You can be a journalist if you want to keep track of what’s happening to airplane tail numbers, and you want to see what that plane is that keeps flying over your house. This is like a great story. One of these civic data groups was watching tail numbers and figured out that there are lots of fixed wing aircraft flying over the city that were all rented out by the same company in Virginia. It was really weird. It turns out that company is owned by the FBI.

Stephen: There you go, yeah.

Jeremy: This is where good stories come from, right, is observation and tracking.

Stephen: There really is so much data. It’s just a matter of having, in many cases, the desire, I guess, or intent to put it to work. Take, for example, something stupid that I end up doing every year. There’s no tide chart, right?

Jeremy: Oh, yeah.

Stephen: We live on a river that opens up into the ocean about a mile and a half down. There’s no tide chart. It turns out that if you hit…what is it? I think it’s NOAA. NOAA has all that information, but it’s in some terrible format. [Ed note: credit for this process belongs to Jeff Inglis, who tipped me to it.] You pull it down. It’s not that hard to translate it into an iCal file, and all of a sudden, it’s in your Google Calendar. Great, I know when the tide’s coming in. I know when the tide’s going out. Does that require any special technical skills? Not at all, but it is the kind of thing where you can take data in one format and make it useful to people. Yeah, I can definitely see that.
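[Ed note: for the curious, the tide-chart trick might look something like the sketch below: rows of timestamped highs and lows become a minimal .ics calendar. The input columns are assumed for illustration; NOAA’s actual formats differ.]

```python
# Convert a CSV of tide predictions (assumed columns: time, type) into a
# minimal iCalendar file that Google Calendar can import.
import csv
from datetime import datetime, timedelta

def tides_to_ics(csv_path, ics_path):
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//tides//EN"]
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            start = datetime.fromisoformat(row["time"])
            end = start + timedelta(minutes=15)  # nominal event length
            lines += [
                "BEGIN:VEVENT",
                f"DTSTART:{start:%Y%m%dT%H%M%S}",
                f"DTEND:{end:%Y%m%dT%H%M%S}",
                f"SUMMARY:{row['type'].title()} tide",
                "END:VEVENT",
            ]
    lines.append("END:VCALENDAR")
    with open(ics_path, "w") as f:
        f.write("\r\n".join(lines))
```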

In terms of working on elections, all the work that you’ve done that have gone into all the elections that we talked about, what, for you, is the most rewarding aspect? What do you enjoy most about the work you do?

Jeremy: Oh, man, I think far and away it’s getting data to the smart people who work on our graphics desk or at The Upshot so that they can put together some killer analysis that nobody else has got. I loved looking at the big precinct maps that we had for New York yesterday or when The Upshot does their live model. Those are things that would be really hard to pull off unless you have got your election software down to not having to think about it at all. Because there’s so much plumbing and so much manning the bellows that has to be done in order to keep your election rig up and going.

The thing I feel like is that if you can apply just a little bit of technical rigor, maybe not too much, but just a little bit, or maybe some engineering principles to this, then it’ll give you the free time to work on the cool and awesome projects that you’d like to be doing in the first place. I’d have to say, that’s by far the most rewarding thing. It’s like getting all the low-end plumbing stuff out of the way so we can do really cool stuff in the time that we saved.

Stephen: Yeah, no, I’m sure. All right, last and most important question on Hark. What animal are you most frightened of?

Jeremy: Amoebas.

Stephen: Amoebas?

Jeremy: Yep, absolutely. They’re small. They can give you brain disease. You may never even know you have it. It’s really hard to kill them.

Stephen: That’s fair.

Jeremy: Yeah.

Stephen: All right, with that, I just want to say thanks, Jeremy. This has been fantastic. We’ll hope to get you back on soon.

Jeremy: Thanks, Stephen.

Stephen: All right, cheers.

Jeremy: Cheers.

Categories: Cloud, Interviews, Journalism, Podcasts.

Divide et Impera

[Caesar’s] purpose always was to keep his barbarian forces well scattered. During all of his campaigns in Gaul, he had a comparatively small army. His only means of success, therefore, against the vast hordes of the Gauls was to ‘divide and conquer.’
– Harry F. Towle, “Caesar’s Gallic War”

Throughout Roman history, and indeed the history of every large empire the world has ever known, divide and conquer has been one of the most reliable strategies for expansion and conquest. A tactic that exploits human nature’s preference for independence and autonomy, it works because it is always more practical to engage with part of a whole than with the whole itself. Conversely, populations that cannot unite will remain vulnerable to those that can. As Abraham Lincoln put it in a speech given in Springfield, Illinois on June 16, 1858, nearly two millennia after Caesar’s campaign in Gaul, “a house divided against itself cannot stand.”

The question of who is playing the part of the Roman empire today is an interesting one for the technology industry. Typically, when you speak with those shipping traditional on premise software today, their acknowledged competition is other on premise software companies. Much as the Gauls’ enemies, at least until Vercingetorix, were other Gauls.

At the OpenStack Summit last week, when posed the simple question of who they regarded as their primary competition, an executive representing a technical platform answered succinctly: “AWS. No question.”

That candid answer, while correct, remains surprisingly rare. It may not be for much longer, as the contrast in financial fortunes between Amazon’s cloud business and its on premise competition attracts more and more notice, even amongst conservative buyers. For most of the past ten years, AWS has been hiding in plain sight, but its ability to sustain that below-the-radar success is being actively compromised by its success within the public market.

The story of Amazon’s ascent has been well chronicled at this point. As then Microsoft CTO Ray Ozzie acknowledged in 2008, Amazon had already by that point spent two years being the only ones taking the cloud seriously. Microsoft, to its credit, eventually followed fast, but for them and every other would-be player in the market, second place was the only realistic goal, at least in the short to medium term.

The less explored question is how those shipping on premise software might compete more effectively with the Amazons of the world. The answer of how not to compete, of course, is clear from both world history and recent financial history.

On a purely technical level, fragmentation is systemic at the moment, the new norm. Pick a technical category, and there are not just multiple technologies to select from, but increasingly multiple technical approaches. Each of these must first be deeply understood, even if they’re entirely new and thus difficult to conceptualize, before a technology choice can be made. And there are many such approaches to be studied and digested, at every level.

All of which is problematic from a vendor perspective. Making matters worse is that the commercial backers of these technologies are equally fragmented, which is to say they’re divided.

Consider the following chart on the microservices ecosystem from Sequoia:

Assuming one can properly understand the categories, and the differences in approaches between the technologies that populate them, and can evaluate the projects that match those approaches, what next? If you select NGINX, Kubernetes, Docker, Chef and Mongo, for example, what assurances do you have that these all work reliably together?

The answer, outside of rigid formal partnerships or broader packaging (e.g. CoreOS Tectonic), is very little. For all intents and purposes, each of the projects above and the commercial entities that back them are independent entities with different agendas and incentives. If you’ve ever tried to resolve a complicated failure involving multiple vendors, you know exactly what this means.

This is how the industry has done business for decades, of course. The majority of the incumbent technology suppliers, in fact, have revenue streams built on managing complexity on behalf of their clients. This complexity is also why strategies like Microsoft’s “integrated innovation” approach proved to be attractive. It’s not that each component of the stack was the indisputable technical leader in its field. It’s that they worked well enough together that long conversations about how to wire everything together were no longer necessary.

What if, in stark contrast to the industry’s history however, a competitive model emerged that abstracted traditional complexity away entirely? What if a growing number of difficult choices between specialized and esoteric software options gave way to a web page and a set of APIs for as-a-service implementations? All of a sudden, managing complexity – think large industry incumbents – becomes far less attractive as a business model. As does accelerating complexity by way of niche or specialized software – think point software products, which are now forced to compete with service-based implementations that are integrated out of the box.

With the advantages of cloud models clear, then, the obvious question is how alternative models compete. One obvious answer is to embrace, rather than fight, the tide. Many on premise software products will find themselves pivoting to become service based businesses.

But for those committed long term to an on premise model, new tactics are required. In a market that is struggling with fragmentation, solutions must become less fragmented. In some cases this will mean traditional formal partnerships, but these can be difficult to sustain as they require time and capital resource investments from companies that are certain to be short on one if not both. History suggests, however, that informal aggregations can be equally successful: the Linux, Apache, MySQL and PHP combination, as one example, achieved massive success – success that continues to this day.

The spectacular success of that particular combination is not likely to be neatly replicated today, of course, because the market has moved away from general purpose infrastructure to specialized, different-tools-for-different-jobs alternatives. There is no reason, however, that ad hoc stacks of complementary software offerings cannot be successful, even if some of the components bleed into one another functionally at the edges. If I were a runtime vendor, for example, I would be actively speaking with orchestration, container and database projects to try and find common ground and opportunity. Instead of pitching my runtime in a vacuum to customers considering public cloud options that offer a growing and far more complete suite of offerings, my offer would be a LAMP-like multi-function stack that customers could drop in rather than have to assemble piece by piece, by hand. E pluribus unum.

This is not the market reality at present. Currently, vendors are typically heads down, focused on their particular corner of the world, built to viciously battle with other vendors in the same space who are also heads down. They’re divided, in other words, and preoccupied. This approach worked poorly for the Gauls. If history is any guide, it’s not likely to be any more effective for on premise software vendors today facing unified public cloud alternatives.

Join or die.

Disclosure: Amazon, Chef, CoreOS, Docker, MongoDB and NGINX are current RedMonk customers, while Microsoft is not at this time.

Categories: Cloud, Open Source.

OpenStack and the Fragmenting Infrastructure Market

Austin Convention Center

The questions around OpenStack today are different than they were last year, just as they were different the year before that, and the year before that. Go back far enough, and the most common was: has anyone ever successfully installed this? These days, installation and even upgrades are more or less solved problems, particularly if you go the distribution route. Even questions of definition – which of the many individual projects under the OpenStack umbrella are required to actually be considered OpenStack? – have subsided, even if they’re not yet addressed to everyone’s satisfaction.

The real questions around OpenStack today, in fact, have very little to do with OpenStack. Instead, the key considerations for those using the technology or more importantly considering it have to do with the wider market context.

The most significant challenge facing most enterprises today isn’t technology, it’s choice. For every category of infrastructure software that an enterprise can conceive of today, and many that they can’t, there are multiple, credible software offerings that will satisfy their functional requirements. At least one, and likely more than one, of the options will be open source. If you enjoy the creativity of software as an endeavor, this is a golden age.

The problem then is not a lack of technology options, as it might have been ten years ago if a business wanted to, as an example, store information in something other than a relational database. The problem is rather the opposite: there are increasingly too many options to cope with.

Which means that OpenStack, like every other piece of infrastructure technology, is facing more competition. Not in the apples to apples sense, of course. OpenStack outlasted its closest open source functional analogues in CloudStack and Eucalyptus, and the 7500 attendees at this week’s summit would argue that it’s the most visible open source cloud technology today.

But if projects that are exactly equivalent functionally to OpenStack are not emerging at present, technologies with overlapping ambitions are.

Consider containers. By itself, of course, a container is, from an OpenStack perspective, little different from a virtual machine – an asset OpenStack was built to manage.

But if you’re making a Venn diagram of technical roles, the vast array of projects that are growing up around containers to orchestrate, schedule and otherwise manage them definitely overlaps with OpenStack today and will do so more in the future. Like other projects that predate the explosive ascent of containers, OpenStack has been required to incorporate them after the fact via projects like Magnum, and according to numbers cited at the Summit, 70% of users still want better support for what is, effectively, the new VM. Which OpenStack will ultimately provide, both directly and by serving as the substrate upon which users can run anything from Kubernetes to Mesos.

But what if projects like a Mesos herald a return to our distant past? From the earliest days of computing, the technology industry has steadily been turning larger machines into larger numbers of smaller machines. The mainframe singular was broken up into mini-computers plural, mini-computers gave way to more machines in the client-server era, and client-server in turn to the scale of virtual machines and eventually clouds. Where we once deployed workloads to a single computer, the mainframe, today we deploy them to fleets of machines – so many, in fact, that we require scheduling software to decide which computers a given workload should run on.

What if compute took the next logical step, much as Hadoop has in the data processing space before it, and abstracted the network of machines entirely to present not the fleets of machines we have become accustomed to in this cloudy world, but one big computer – a virtual mainframe? This is, of course, the goal of Mesos and the recently open sourced DC/OS, and while it’s hardly a household name at present, it will be interesting to see whether customers remain fixated on the discrete, familiar asset model that OpenStack represents or whether they are attracted, over time, to the heavy abstraction of something like Mesos. If the web world is any indication, the latter is more probable.

The real problem for OpenStack, however, is that even if users come to believe that OpenStack and container-based infrastructure are not competitive but purely complementary, discovering that fact will take time. Which means that whether container-based infrastructure is or is not technically competition for OpenStack, from a market perspective it will function as such. The reverse is true as well, of course. Container-based infrastructure players continually face questions about whether they require foundational platforms such as an OpenStack (as at Time Warner), or whether users are better off running them on top of bare metal and cutting out the middle man.

All of which again is more of a commentary on the market today than anything OpenStack has or has not done. Like virtually every other infrastructure project today, the primary challenge for OpenStack at present is helping customers make sense of a sea of different technology options. The real danger for OpenStack and its would-be competitors and partners, then, is that customers decide to make these choices someone else’s problem by advantaging public infrastructure. For OpenStack, then, and the vendors within its orbit, my message at the Summit was that the most important investments in the near term are not technology, but rather education and messaging.

Projects and the vendors that support them have a tendency to focus on their capabilities, leaving the wider context for users to work through on their own. In general, this is an unfortunate practice, but with the landscape as fragmented as it is today, it is a potentially fatal approach. If you don’t make things simple for users, they’ll find someone who will.

Categories: Cloud, Containers, Hardware-as-a-Service, Open Source.

Meet the New Monk: Rachel Stephens

Out of the thousand or so words that went into the job posting for our open analyst position, arguably the most important were eleven that were added by James. Under a heading entitled “The Role,” he wrote “We’re looking for someone that will make the role their own.” Some, maybe even most, of the candidates we interviewed asked about this, what it meant.

The short answer is that while the best candidates for us are those that prove to us that they can do this job, the ones that separate themselves are those that can do this job and do jobs that we’re not able to. We wanted our new analyst to bring something new to the table, not just skills we already have.

It would be easier, certainly, to simply filter our applicants to those that have prior analyst experience: we wouldn’t have to spend any time training new hires to be analysts. But as an analyst firm that sees the world through a very different lens than that employed by other firms in our industry we have to be more creative, which is why we deliberately cast as wide a net as possible. As has become typical when we hire, our inbox was full of resumes from all different educational and experiential backgrounds. DOD analysts. Major league baseball back office personnel. Nuclear technicians. Lawyers. Actuaries. Accountants. Professors. Developers.

And then there was this financial professional. Who’d been turned into a DBA. Who’d been trained in BI. Who was in the process of completing her MBA. Who knew how to use GitHub better than we did. Who had organized conferences. Who read through three years of the quarterly earnings transcripts from one of our clients as part of her application for the position, teaching us things we didn’t already know. Who was once offered a job on a flight by another of our clients. And most importantly, who demonstrated that she both understood and believed in what we do at RedMonk.

Hiring decisions are never easy, because serious applicants demand serious consideration, but the right candidates can make the decisions easier. When James and I were talking through the final set of candidates, we found ourselves getting excited talking about the different opportunities and potential projects for this one candidate in particular, and – remarkably diverse and capable set of candidates or no – our decision became clear.

Meet Rachel Stephens, the newest RedMonk analyst.

If you’ve been to the Monktoberfest, you have probably met Rachel already, as she has never missed one. She helped organize last year’s version, in fact. What you may or may not know about Rachel is that she has not just the skills to manipulate data but a true passion for using it to ask questions. Anyone who has opinions – strong opinions – on the differences in keyboard shortcuts between the Mac and Windows versions of Excel was likely to get our attention; the fact that Rachel also works comfortably in other tools from R to Tableau is gravy. As are contributions like her updates to Nathan Yau’s 2011 wunderground-scraping script from Visualize This (which I wish I’d seen before I blew an hour on that myself).

But apart from her hard core research skills and overpowering curiosity – attributes that we obviously prize highly – Rachel will also add an important new element to our work. With a long background in finance and with her MBA weeks away from being complete, one of the things Rachel will bring to RedMonk is a professional’s understanding of the balance sheet. This is an important skill in general, but it’s particularly relevant in emerging markets like cloud or Software-as-a-Service, where multi-billion dollar businesses are buried in filings under categories marked “Other.”

So whether she’s hard at work comparing, say, the relative trajectories of JavaScript frameworks based on Stack Overflow data or taking apart SEC filings for us, Rachel is a candidate that we know will make this role her own. We’re thrilled to have her on board.
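For the curious, the Stack Overflow half of that kind of comparison can be sketched in a few lines against the public Stack Exchange API. The tags and years below are illustrative placeholders rather than a description of our actual methodology:

```python
# A minimal sketch of a Stack Overflow trajectory comparison using the
# public Stack Exchange API. Tags and years are illustrative assumptions.
import datetime
import requests

API = "https://api.stackexchange.com/2.3/questions"

def question_count(tag, year):
    """Count Stack Overflow questions created for a tag in a given year."""
    start = int(datetime.datetime(year, 1, 1).timestamp())
    end = int(datetime.datetime(year + 1, 1, 1).timestamp())
    resp = requests.get(API, params={
        "site": "stackoverflow",
        "tagged": tag,
        "fromdate": start,
        "todate": end,
        "filter": "total",  # built-in filter: response is just {"total": n}
    })
    resp.raise_for_status()
    return resp.json()["total"]

# Question volume per year as a rough proxy for developer interest.
for tag in ("angularjs", "reactjs", "ember.js"):
    trajectory = [question_count(tag, year) for year in (2013, 2014, 2015)]
    print(tag, trajectory)
```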

While Rachel is based out of Denver, you can expect to see her around at the usual conferences in the Bay Area, Boston, Las Vegas, New York and so on. We’ve told her what a great group of people we at RedMonk know and work with, and she’s excited to meet all of you. Those of you who know Rachel and the quality of person she is know what you have to look forward to. For those of you who haven’t spoken with her yet, you’re going to enjoy working with her, I guarantee it.

Rachel’s first day with RedMonk will be May 23rd, but until then you should feel free to go say hello over on Twitter at @rstephensme.

Categories: People, RedMonk Miscellaneous.

Someone Else’s Problem

The idea that serverless literally means no servers is no more accurate than the argument that Salesforce does not sell software. There are servers behind serverless offerings such as AWS Lambda, just as there is software behind Salesforce. If you want to get pedantic, you might argue that both of these statements are lies. While the pedant may be technically correct, however, a larger and more important truth is obscured.
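To make that larger truth concrete, here is a minimal sketch of what an AWS Lambda function looks like from the author’s side; the payload shape and greeting are illustrative assumptions, but the (event, context) signature is Lambda’s own:

```python
# A minimal AWS Lambda handler, sketched in Python. The payload and
# greeting are illustrative assumptions. Note what's absent: no server
# to name, provision, patch or scale. Those servers exist; they're
# simply Amazon's problem rather than the author's.
def handler(event, context):
    # 'event' carries the invocation payload; 'context' carries runtime
    # metadata such as the request id and remaining execution time.
    name = event.get("name", "world")
    return {"message": "hello, %s" % name}
```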

Consider the challenge facing those who wish to compare private cloud to public cloud, as but one example. Even if you can counterfactually assume relative feature parity between private and public offerings, it remains a comparison of two entirely distinct product sets. The fact that they happen to attack the same functional area – base cloud infrastructure – should not be taken to mean that they can or should be directly compared. Private cloud solutions are about using combinations of software and hardware to replicate the most attractive features of public cloud infrastructure: dynamic provisioning, elasticity, and so on. Public cloud offerings, like any other as-a-service business, are as much about assuming the burden of someone else’s problems as they are about the underlying hardware or software.

Which is why it’s interesting that IaaS, PaaS and SaaS providers, for the most part, don’t emphasize this distinction.

To some extent, this is logical, because it’s an inherently IT-unfriendly message – if a buyer’s problems are made a seller’s problems, it follows that some people on the buyer side are no longer necessary – and making unnecessary enemies is rarely a profitable strategy. It’s quite evident, however, that buyers – if perhaps not unanimously – are putting an increasing emphasis on making their problems someone else’s problem.

As they should. Because many of the problems that have traditionally been the province of the enterprise shouldn’t be. In a world in which solving technology problems means material upfront capital expenditures on hardware and software, having high quality resources that can address those problems efficiently is important, if not differentiating. In the post-cloud context, however, this is far less important, because you can select between multiple providers in any given category that are likely to be able to deliver a given service with greater uptime and a higher level of security than you can. There’s a reason, for example, that the two fastest growing services in the history of AWS are Redshift and Aurora: database and warehousing infrastructure is as expensive to maintain as it is tedious. Or put differently, what is the business value of having in-house the skills necessary to keep complex and scalable database infrastructure up and running? Is that value greater or less than the premium you’ll pay to third parties such as an Amazon, Google, Heroku, IBM or Microsoft to maintain it for you?
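To put that question in concrete terms, a back-of-the-envelope version of the math might look like the following sketch, in which every figure is a hypothetical assumption for illustration rather than a number from any filing:

```python
# Purely hypothetical back-of-the-envelope comparison; every number below
# is an assumption for illustration, not data from this post.
inhouse_staff_cost = 150_000 * 2     # two database specialists, fully loaded
inhouse_hw_amortized = 40_000        # annual share of hardware/datacenter spend
managed_service_premium = 120_000    # annual premium over raw capacity costs

inhouse_total = inhouse_staff_cost + inhouse_hw_amortized
print("in-house:", inhouse_total, "| managed premium:", managed_service_premium)
print("managed wins" if managed_service_premium < inhouse_total else "in-house wins")
```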

At which point the question becomes not whether these offerings are literally server- or software-less, but rather whether you want servers and software to be your problem or someone else’s. Increasingly, the market is favoring the latter, which is one reason the commercial value of on premise software is in decline while service-based alternatives are seeing rapid growth. It is also why *aaS providers should be explicitly and heavily emphasizing their greatest value to customers: the ability to take on someone else’s problems.

Disclosure: AWS, IBM, Salesforce (Heroku) are RedMonk customers, Google and Microsoft are not currently customers.

Categories: Cloud, Hardware-as-a-Service, Platform-as-a-Service, Services, Software-as-a-Service.

Yes, It is Harder to Monetize Open Source. So?


Four years ago this month, Red Hat became the first pure play commercial open source vendor to cross the billion dollar revenue mark – beating my back-of-the-envelope forecast in the process. This was rightfully greeted with much fanfare at the time, given that if you go back far enough, a great many people in the industry thought that open source could never be commercialized. Enthusiasm amongst open source advocates was necessarily tempered, however, by the realization that Red Hat was, in the financial sense, an outlier. There were no more Red Hats looming, no other pure play commercial open source vendors poised to follow the open source pioneer across the billion dollar finish line.

Four years later, there still aren’t. Looking around the industry, Red Hat remains the sole example of a pure play open source organization matching the revenue generated by even modest-sized proprietary alternatives, and as was the case four years ago, there are no obvious candidates to replicate Red Hat’s particular success.

Which has, understandably, led to assertions that – in the non-literal sense – open source can’t make money and is difficult to build a business around. Assertions for which there are exceptions like Red Hat, but that are generally defensible based on the available facts.

What these discussions typically omit, however, is that – as we’re reminded by Adrian Cockcroft – it’s also getting harder to make money from proprietary software. As has been covered in this space for years (for example), and in book form in The Software Paradox, sales of software generally have been on a downward trajectory over the past decade or more. Notably, this is true across software categories, consumer to enterprise. From large software providers such as IBM, Microsoft or Oracle seeing systemic declines in software margins, revenue or both to consumer companies like Apple taking the price of their operating system from just under $200 to zero, the simple fact is that it’s getting more difficult to monetize software as a standalone asset.

It’s far from impossible, obviously: Microsoft’s revenue stream from software, as but one example, is measured in the tens of billions. But when you look across industries, at company after company, the overall trendline is clear: it’s harder to make money from software than it used to be – regardless of whether the model employed is volume or margin, open or closed. Smart companies realize this, and are already hedging themselves against these declines with alternative revenue models. There is a reason we’re having far more Software Paradox-related conversations with our clients today than we would have even a few years ago: the writing is on the wall.

So yes, we are no more likely to see another Red Hat today than we were four years ago. But that says a lot less about the merits of open source as a model than it does about commercial valuations of software in general.

Categories: Business Models, Open Source.

Ubuntu and ZFS: Possibly Illegal, Definitely Exciting

The project originally known as the Zettabyte File System was born the same year that Windows XP began shipping. Conceived and originally written by Bill Moore, Jeff Bonwick and Matthew Ahrens among others, it was a true next generation project – designed for needs that could not be imagined at the time. It was a filesystem built for the future.

Fifteen years later, it’s the future. Though it’s a teenager now, ZFS’s features remain attractive enough that Canonical – the company behind the Ubuntu distribution – wants to ship ZFS as a default. Which wouldn’t seem terribly controversial as it’s an open source project, except for the issue of its licensing.

Questions about open source licensing, once common, have thankfully subsided in recent years as projects have tended to coalesce around standard, understood models – project (e.g. GPL), file (e.g. MPL) or permissive (e.g. Apache). The steady rise in share of the latter category has further throttled licensing controversy, as permissive licenses impose few if any restrictions on the consumption of open source, so potential complications are minimized.

ZFS, and the original OpenSolaris codebase it was included with, were not permissively licensed, however. When Sun made its Solaris codebase available for the first time in 2005, it was offered under the CDDL (Common Development and Distribution License), an MPL (Mozilla Public License) derivative written by Sun and later approved by the OSI. Why this license was selected for Solaris remains a matter of some debate, but one of the more plausible explanations centered around questions of compatibility with the GPL – or the lack thereof.

At the time of its release, and indeed still to this day as examples like ZFS suggest, Solaris was technically differentiated from the far more popular Linux, offering features that were unavailable on operating system alternatives. For this reason, the theory went, Sun chose the CDDL at least in part to avoid its operating system being strip-mined, with its best features poached and ported to Linux specifically.

Whether this was actually the intent or whether the license was selected entirely on its merits, the perceived incompatibility between the licenses (verbal permission from Sun’s CEO notwithstanding) – along with healthy doses of antagonism and NIH between the communities – kept Solaris’ most distinctive features out of Linux codebases. There were experimental ports in the early days, and these have improved over the years and been made available as on-demand packages, but no major Linux distribution has ever shipped CDDL-licensed features by default.

That may change soon, however. In February, Canonical announced its intent to include ZFS in its next Long Term Support version, 16.04. This prompted a wide range of reactions.

Many Linux users who have eyed ZFS’ distinctive featureset with envy were excited by the prospect of having official, theoretically legitimate access to the technology in a mainstream distribution. Even some of the original Solaris authors were enthusiastic about the move. Observers with an interest in licensing issues, however, were left with questions, principally: aren’t these two licenses incompatible? That had, after all, been the prevailing assumption for over a decade.

The answer is, perhaps unsurprisingly, not clear. Canonical, for its part, was unequivocal, saying:

We at Canonical have conducted a legal review, including discussion with the industry’s leading software freedom legal counsel, of the licenses that apply to the Linux kernel and to ZFS.

And in doing so, we have concluded that we are acting within the rights granted and in compliance with their terms of both of those licenses. Others have independently achieved the same conclusion.

The Software Freedom Conservancy, for its part, was equally straightforward:

We are sympathetic to Canonical’s frustration in this desire to easily support more features for their users. However, as set out below, we have concluded that their distribution of zfs.ko violates the GPL.

If those contradictory opinions weren’t confusing enough, the Software Freedom Law Center’s position is dependent on a specific interpretation of the intent of the GPL:

Canonical, in its Ubuntu distribution, has chosen to provide kernel and module binaries bundling ZFS with the kernel, while providing the source tree in full, with the relevant ZFS filesystem code in files licensed as required by CDDL.

If there exists a consensus among the licensing copyright holders to prefer the literal meaning to the equity of the license, the copyright holders can, at their discretion, object to the distribution of such combinations

The one thing that seems certain here, then, is that very little is certain about Canonical’s decision to ship ZFS by default.

The evidence suggests that Canonical either believes its legal position is defensible, believes that none of the actors would be interested or willing to pursue litigation on the matter, or both. As stated elsewhere, this is if nothing else a testament to the quality of the original ZFS engineering. The fact that, on the evidence, Canonical perceives the benefits of this fifteen-year-old technology to outweigh its potential legal overhead is remarkable.

But if there are questions for Canonical, there are for their users as well. Not about the technology, for the most part: it has withstood impressive amounts of technical scrutiny, and remains in demand. But as much as it would be nice for questions of its licensing to give way before its attractive features, it will be surprising if conservative enterprises consider Ubuntu ZFS a viable option.

If ZFS were a technology less fundamental than a filesystem, reactions might be less binary. As valuable as DTrace is, for example, it is optional for a system in a way that a filesystem is not. With technologies like filesystems or databases, however, enterprises will build the risk of having to migrate into their estimates of support costs, making the economics problematic. Even if we assume the legal risks to end users of the ZFS version distributed with Ubuntu to be negligible, concerns about support will persist.

According to the SFLC, for example, the remedy for an objection from “licensing copyright holders” would be for distributors to “cease distributing such combinations.” End users could certainly roll their own versions of the distribution including ZFS, and Canonical would not be legally restricted from supporting the software, but it’s difficult to imagine conservative buyers being willing to invest long term in a platform that their support vendor may not legally distribute. Oracle could, as has been pointed out, remove the uncertainty surrounding ZFS by relicensing the asset, but the chances of this occurring are near zero.

The uncertainty around the legality of shipping ZFS notwithstanding, this announcement is likely to be a net win for both Canonical and Ubuntu. If we assume that the SFLC’s analysis is correct, the company’s economic downside is relatively limited as long as it complies promptly with objections from copyright holders. Even in such a scenario, meanwhile, developers are at least reminded that ZFS is an available option for the distribution, regardless of whether the distribution’s sponsor is able to provide it directly. It’s also worth noting that the majority of Ubuntu in use today is commercially unsupported, and therefore unlikely to be particularly concerned with questions of commercial support. If you browse various developer threads on the ZFS announcement, in fact, you’ll find notable developers from high profile web properties who are already using Ubuntu and ZFS in production.

Providing developers with interesting and innovative tools – which most certainly describes ZFS – is in general an approach we recommend. While this announcement is not without its share of controversy, then, and may ultimately not be significant in the commercial sense, it’s exciting news for a lot of developers. As one developer put it in a Slack message to me, “i’d really like native zfs.”

One way or another, they’ll be getting it soon.

Categories: Open Source, Operating Systems.