
The Future of Open Source

Last week in New York, the venture firm Accel held a ninety-minute lunch for an audience of financial analysts and equity professionals, a reporter or two and at least a few industry analysts. The ostensible subject for the event was Accel’s Open Adoption Software (OAS) model, but the wider focus was what the future held for open source in general. Accel’s view on this subject, as well as that of the panelists at the event from Cloudera, Cockroach Labs and Sysdig, was that open source essentially has gone mainstream. As Jake Flomenberg, an Accel partner put it, “There is a massive shift going on in the ways technology is bought. Open source has gone from the exception to the rule.”

Setting the OAS model to the side for the time being, the larger message that open source has become the default choice in a wide array of infrastructure software categories isn’t difficult to sell. It is a message that was consistent with attendee sentiment at the most recent OSCON in Austin, the ApacheCon in Vancouver, and the Linux Foundation’s Collaboration Summit in Lake Tahoe before that. It is a message that Cloudera’s Mike Olson would presumably agree with; in 2013 in a piece entitled “The Cloudera Model,” he said simply, “you can no longer win with a closed-source platform.”

The idea that open source has effectively won within the enterprise is also consistent with RedMonk’s own views on open source. The only real difference is that this perspective, for my part, is better used to describe open source’s past than its future.

At the 2005 O’Reilly Open Source Conference, for example, I gave a keynote to the room full of developers entitled “so you took over the enterprise: what now?” Open source was not yet as common within the enterprise eleven years ago as it is today, but from our vantage point it had passed the tipping point, and its trajectory was assured. The decade since giving that presentation has done nothing but validate that original assertion, as open source projects and efforts to commercialize them have entered market after market, category after category, to the point where there are frequently more open source options available today than proprietary alternatives.

All of which has led to open source advocates taking a deserved bow for their success. It was by no means assured; certainly in the early days of RedMonk, there was major skepticism about the model in general, from its ability to sustain itself commercially to its vulnerability to everything from intellectual property violations to security exploits.

But just as open source is finally being recognized as the viable model we always believed it to be, it is facing competition that enjoys some of the same advantages over open source that open source had relative to proprietary software.

That competition is the cloud.

Competition is an interesting term to use, to be sure, because the cloud is built for the most part from open source software, and the cloud is such an important channel that it has elevated open source projects such as Ubuntu to first-class citizen status. The presentation that Accel gave didn’t mention the cloud as a competitive threat, and the competition most frequently discussed by both Accel and its open source participants was proprietary software companies.

But if we step back, public market activity suggests that more concern is warranted. We know, for example, that Oracle, one of the standard bearers for proprietary software, derives less of its revenue from the sale of new software licenses every year. We also know that Amazon Web Services, which conflates open source, proprietary software and hardware, is growing quickly. Correlation may not prove causation, but it’s difficult to build the case that these two facts are unrelated.

Consider some of the differences in user experience when comparing cloud services to traditional on premise open source alternatives.

  • Convenience:
    Open source was used over proprietary software in many cases not because it was functionally superior or even because the source was available, but simply because it was easier to obtain. To get a closed source product, best case you needed to fill out long, involved registration forms; worst case you needed to talk to a salesperson and find budget. With open source, you simply downloaded what you needed and were on your way. What could be simpler? How about not downloading anything, but standing up a given piece of software already combined with hardware in seconds. Where open source once held the title of most convenient, it has long since ceded that to the cloud.

  • Complexity:
    Open source has long prided itself on representing choice: in any given category of software, users are free to pick from multiple, credible open source implementations. But that choice is an overhead, overhead that is multiplied with each additional choice a user has to make. By comparison, cloud platforms typically have a default service available in a given category: one monitoring tool, one container engine, one storage array, one CDN and so on. Users that require more control have the ability to run the software of their choice on infrastructure they maintain, of course, but also can follow the path of least resistance and simply accept the default – which they don’t have to run.

    As an aside, this problem is one reason foundations like Cloud Foundry or the Cloud Native Computing Foundation are interesting, given that their focus is integrating disparate parts and projects.

  • Operations:
    Because many commercial open source organizations were built to compete with proprietary alternatives, convergent evolution has led them to look and behave in similar ways. In many cases, for example, the burden of getting a given open source offering stood up and integrated is left to a customer, or their high priced systems integrator of choice. Relatively few commercial open source organizations have services capabilities extensive enough to assist with more than initial configuration and setup: integration is an exercise left to the buyer. While cloud services are not a panacea, many of their services are by comparison more easily integrated with one another than independent on premise open source alternatives.

    Just as with proprietary software, cloud services can be sold against open source alternatives on a CAPEX vs OPEX basis; rather than pay up front for support and service, for example, these expenses can be borne instead over time in the form of premiums above a base infrastructure cost. This may result in higher costs over time, of course, but the ability to amortize payments over time can be useful to cash-constrained business units or startups.

  • Data:
    While vendors both cloud and on-premise have been reluctant to invest in and market telemetry and data oriented models, at least overtly, it is inevitable, in my view, that they will. If this should come to pass, it’s another advantage for cloud over on premise open source, because collecting data from datacenters you maintain is far less complicated than retrieving it from individual user facilities.

It’s important to differentiate, of course, between the outlook for open source and commercial open source. The prospects for the former remain reasonable, as it remains a fundamentally more viable methodology for most classes of software. The rise of the cloud may even accelerate the availability of certain classes of open source software, either because its authors are not in the business of selling software (e.g. Facebook/Cassandra or Twitter/Heron) or because commercial vendors seek to reduce customer fear of lock-in via public cloud implementations of OSS (e.g. Cloud Foundry, MySQL).

For commercial open source vendors, however, it is important to recognize that the cloud is at least as much threat as opportunity. Many of our commercial open source customers, whose primary business is selling to enterprises, have acknowledged this, and are ramping into the public cloud as quickly as they can. This rapid flight is creating strange bedfellows, in fact: several open source vendors admitted at OSCON that the best cloud partner in their experience was Microsoft – an interesting turn of events for someone who remembers that Jason Matusow needed security for his first appearance at the conference, so hated was the vendor at the time.

Does the cloud represent opportunity for open source as well? Undoubtedly. But the future outlook for open source, particularly those who would commercialize it, seems counterintuitively far more murky today than it was in 2005. Open source is rightly being heralded as the default, and having “won.” The difficulty is that in this industry, victories tend to be very short lived.

Open source has learned how to compete, and compete very effectively, with closed source. That’s its past. Its future will be competing with the public cloud, and the first step towards doing that effectively is admitting the problem.


Hark Episode 1, “Election Night”: Guest, Jeremy Bowers

As newsletter subscribers are aware, Episode 1 of my new podcast Hark, “Election Night,” is here and ready for your listening pleasure. In this month’s episode, Jeremy Bowers, a developer on the New York Times Interactive News desk (which is hiring, incidentally), takes us inside the Times’ newsroom on Election Night. From the technical stack to the schedule that night to what the favorite catering choices are, Jeremy provides a behind the scenes look into what life is like for the developers on the other side of the website you try to bring to its knees by hitting refresh over and over and over. And possibly swearing at. Jeremy and I also talk programming-as-journalism, AWS, WebSockets, how to get into elections as a technologist and more, so give it a listen.

For those of you who don’t do podcasts, however, we offer a transcription of the episode below. Enjoy.

Stephen: Well, excellent. Jeremy Bowers, welcome to Hark. If it’s okay with you, I’d like to start with a couple of basics. Who are you, and what do you do?

Jeremy: Sure. It’s really good to be here. My name is Jeremy Bowers, and I work for the interactive news team at The New York Times. We’re a weird little collective of programmers that sit in the newsroom and work really closely with reporters and editors on projects that sort of fall in the gaps between our graphics desk that does lots of charts and maps and the paper’s standard IT staff that would build a lot of platforms and our CMS, for example.

There’s a lot of these sort of Big Data projects, Olympics, elections, World Cup, that sort of fall in between the skill sets of these other groups. That’s where our team comes in. We jokingly refer to ourselves as Navy SEALs. Someone’s got to fix these things. Someone’s got to make them work, and that’s what we do.

Stephen: Nice. With getting into political news, elections seem to be something of a specialty for you. Was that a conscious plan, or is that something you kind of fell into? If I go through your background, beginning at the St. Petersburg Times to NPR and now with The Times, from the outside looking in, it seems as if your career followed a plan. Is that an accurate statement?

Jeremy: No, not at all. The best part about it is it’s a demonstration of just how the incentives for a lot of newsroom programming teams work. Elections are like a particularly perfect case where it’s just a little too much data for most people’s graphics desks to handle all alone. Though, honestly, The Times’ graphics desk could probably do this in a heartbeat. They just need a janitor to help them clean up. It’s one of those projects where, if you have programming skills and a little bit of journalism sensibility, you will end up working on it everywhere that you work.

This will be my fourth news organization at which I’ve worked on a general election. I did one at the St. Pete Times, one at the Washington Post, one at NPR, and then 2016 here.

Stephen: Nice. So the one at St. Pete, was that the first election you worked on?

Jeremy: It was, 2008, with a young gentleman named Matthew Waite. I don’t remember very much about it. I was working on it like part time, because I was also working on blogs. My first job was as a blog administrator writing PHP and Perl templates for trashy old Movable Type and pre-WordPress blogs. It was not good. There was a better world out there. There was a world in which we were pulling down CSVs and running Python scripts against them. It just looked so cool, and I really wanted to do that pretty badly.

Stephen: One of the things that comes up in your background as we look through it is that you’ve sort of been immersed in blending data-driven analysis and news coverage. The idea of programming as journalism has trended in recent years. What’s your take on that? Programming as journalism, is that a real thing? Where do you think we are in that process?

Jeremy: There’s really two parallel thoughts that I have on this. The first thought is that there’s definitely some beats where if you’re not a programmer, you’re going to have difficulty keeping up with places that have a programmer helping you cover that beat. A good example of this would be campaign finance. We’ve had campaign finance laws since Watergate, which forced PACs and campaign committees to release their data either quarterly or every month. But if you were a reporter working on campaign finance in the ’80s or ’90s, you had days to write your story.

You could just leaf through the filings. It would take you really a long time, and you may not be able to find really useful stories, because people are really bad at finding small changes in data over time. Computers are super good at that.

Campaign finance is one of these examples where if you are a programmer, you can write scripts that will do aggregates, that will do comparisons, and it will let you write a story that’s just better than the story that you would have written before. We’re not even talking about something like a chart or a graph that’s a nice visual presentation. That’s just the standard story that you are going to write. You can just write that story better if you have programming skills.

That’s one part, is beats that require a programmer. Then, there are other things where it’s an older beat, but programmers could make it better. I like to think about things like police reporting or places where there aren’t data sets already, things like drone strikes or Guantanamo detainees, things where having a more structured understanding of how the world works, accumulating data sets that maybe don’t already exist can be a form of reporting in their own right. In particular, I really enjoy those.

Our team at The Times maintains this database of people who’ve been detained at Guantanamo, and I just don’t know of anything else that’s quite like it. It’s a fascinating data set and a really neat project. It only exists because someone bothered to sit down. Marco Williams and a team of interactive news developers sat down and decided to start tracking this and make a website out of it.

Stephen: That’s interesting. Certainly as someone in the industry, I’ve always found this fascinating in going back to some of the early days with sites by Adrian Holovaty and basically taking raw crime dumps, putting them on a map, making them interactive, and making them useful in ways that we haven’t seen before.

I’m curious, though, because you hear different things about the traction behind sites like FiveThirtyEight in terms of trying to cover things other than elections. As somebody who does this work, do you think it’s something that’s more outlet driven, or do you think it’s something that is in demand from the audience? Is this something that is actually in…is it a pull thing, or is it a push thing, do you think?

Jeremy: Yeah, that’s actually a really good question. My personal bias on this feeling is that people aren’t in it for any particular form or presentation, but they’re in it for a story or some information, right? Our audience has always wanted to know things about the election, and they would love to have a better take on it. If the better take happens to be more visual or if we can tell them a story that we only can tell well in audio, then we probably ought to tell it on audio. If we have a story that comes out better as a chart, then we probably ought to tell them that story as a chart.

The thing I think that we miss if we don’t do data journalism particularly well is that we miss out on the stories that are told better when they’re told less anecdotally and with more rigor. That doesn’t mean that every story that we tell ought to be that way. There are many stories that are just much better as anecdotes. Some stories are good as a good mix. I really am fascinated at trying to find places where there’s an intersection of good story and anecdote but also lots of good structured data.

Stephen: Yeah, it’s interesting, because it’s one of those things that I…for example, with baseball. I’m a big baseball fan. I consume a lot of the analytical content and so on. The interesting thing, though, is that some of the writers, I think, get too far removed from the actual reader. Even as somebody of an analytical bent and somebody who’s technical, when you begin just throwing up charts because you have charts, you kind of lose the plot at times. Is that something you guys worry about? How much input do you have from the editorial side in terms of trying to make sure, “Hey, look. Let’s not just throw numbers up. Let’s try to make sure this is a story”?

Jeremy: Entirely. This is sort of the problem, right? Is that it’s so easy if you’re doing data journalism to stumble into a pile of data and then say, “I will put this up on the Internet for someone to explore.” The truth is, that’s the reporter’s job, is to explore and find a story and then to tell the story, not just to put it up and let people struggle through it.

The other thing that strikes me about that is that I don’t think there’s a competition between anecdotal and structured storytelling. There are really great stories, like listening to Vin Scully talk about baseball. I don’t want to hear Vin Scully talk about someone’s wins above replacement. I’m sure he would be great, but he has a strength, and he should stick to his strength.

There are other people like Harry Pavlidis working on whatever random model he’s working on. I love to read about that too, but all of it should tell me a story. That stuff about catcher framing I feel like it’s some of the best analysis that I’ve seen lately, because it basically told you more about baseball. You learned all the stuff that you already sort of suspected, right, is that catcher is really valuable. Game calling is valuable. Turning balls into strikes is valuable.

But there’s something more than just that. It’s being quantitative about it, being able to say, “This is how much more valuable that player is than they used to be.” It opens up a ton of room for a standard reporter to just go out and ask a bunch of questions. Nobody was going to ask Matt LeCroy about how he framed pitches beforehand, because no one realized that it was super important, you know?

Stephen: Yeah, yeah. All right, let’s get back to The New York Times then.

Jeremy: Yes.

Stephen: When you look at elections, what is the agenda? How do you drive evidence or facts based reporting into coverage of an election? Where do you get the raw data? How does the process work?

Jeremy: Yeah, absolutely. We get our real time data from the Associated Press, who maintain a huge stack of reporters and stringers who go to precincts all around the country and will report back data that they are watching officials hand enter. As a result, we get the data from the AP much more quickly than we would get it from the Secretary of State for that state, which is where the ultimate election results end up. But the real election results don’t show up for several months. The primaries that we’re watching right now, those are largely unofficial. They’re a project of the state’s party.

So the AP really becomes the de facto standard for a lot of this data, especially in real time. We pay them for access to that data. Up until 2014, that data was provided through an FTP dump, a series of oddly delimited files in an FTP folder; you had to know how to find the correct file and load it up. They would update it every couple of minutes, every five to seven minutes or so. You could hit it once every minute looking for an update.

Well, in 2014 the AP released an HTTP API. For the 2014 general election, we didn’t use it. But for 2016, we decided we wanted to, because it’s a little faster. It gets the data to us in as little as three minutes, and race calls are instantaneous. An AP editor will hit a button to call a race for Bernie Sanders, and it will almost immediately show up in the API. So we want that speed more than we want almost anything.
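The polling pattern Jeremy describes – hit the API, process only what has changed since the last request – can be sketched in a few lines of Python. The field names here (“nextrequest”, “races”) and the payload shape are illustrative assumptions, not the actual AP Elections API schema, and the network layer is replaced by a plain callable so the idea stands alone.

```python
# A minimal sketch of change-aware polling against an elections API.
# The "nextrequest"/"races" field names are hypothetical, not the
# real AP schema; `fetch` stands in for the HTTP call.

def poll_once(fetch, last_token):
    """Fetch the latest payload; return (token, races) when the payload
    is new, or (last_token, None) when nothing has changed."""
    payload = fetch()
    token = payload.get("nextrequest")
    if token == last_token:          # nothing new since our last poll
        return last_token, None
    return token, payload.get("races", [])

# Simulated responses standing in for successive HTTP calls:
responses = iter([
    {"nextrequest": "t1", "races": [{"id": "NY-P", "call": None}]},
    {"nextrequest": "t1", "races": [{"id": "NY-P", "call": None}]},       # unchanged
    {"nextrequest": "t2", "races": [{"id": "NY-P", "call": "Sanders"}]},  # race called
])
fetch = lambda: next(responses)

token, first = poll_once(fetch, None)        # first poll: new data
token, unchanged = poll_once(fetch, token)   # second poll: no change
token, called = poll_once(fetch, token)      # third poll: race call arrives
```

Injecting `fetch` as a callable also makes the rig trivial to test against canned data, which matters later in the conversation when failure simulation comes up.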

That meant that we had to rebuild our election rig this year. It’s an old app, actually I think the oldest continuously updated application in our portfolio. It’s a 2006 era Ruby on Rails app that is not modular. It’s a very large app that parses all the data, loads it into a database, does sanity checks, does difference calculation, bakes out HTML and JSON of all the pages. I think it was like 200 total files/URLs for every election, which is a lot of things for a single application to do.

This year, we decided that we were going to break that down into a series of more modular little pieces, which was very exciting to make a big change like that in advance of such a big election cycle. We decided that that was really important. It would also give us a chance to rewrite it in Python and some old magic style Bash scripts, make it a little easier for us to maintain, and make it a lot easier for other people on our team to be involved in it as well.
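One of the jobs the old monolith did – “baking out” static HTML and JSON so election-night traffic hits flat files rather than a database – makes a natural standalone module in the rewrite Jeremy describes. The sketch below is a guess at what that step looks like in Python; the file layout and result schema are invented for illustration, not the Times’ actual output.

```python
# A minimal sketch of the "bake out JSON" step: render results to
# static files once per update, so readers are served flat files.
# Paths and schema are illustrative, not the Times' real layout.
import json
import pathlib
import tempfile

def bake(results, outdir):
    """Write one JSON file per race, plus an index of all race ids."""
    outdir = pathlib.Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for race_id, race in results.items():
        (outdir / f"{race_id}.json").write_text(json.dumps(race))
    (outdir / "index.json").write_text(json.dumps(sorted(results)))

# Example data; the numbers are made up.
results = {
    "ny-primary": {"trump": 100, "kasich": 40},
    "ny-primary-dem": {"clinton": 120, "sanders": 90},
}
outdir = tempfile.mkdtemp()
bake(results, outdir)
```

Because each bake is a pure function of the current results, it can be rerun idempotently every time new data lands, which is part of what makes the modular pipeline easier to reason about than one large app doing everything.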

Stephen: Yeah. That’s a great segue. You mentioned Python. You mentioned Bash. What are the technologies The Times uses? What’s the stack look like?

Jeremy: Yeah, absolutely. My little team, we have some Rails developers and some Python developers on the back end. We write a ton of Bash. We write some Node on the server depending on the project. We have a ton of JavaScript on the front end.

This year we’ve decided that long polling for results is sort of an anti-pattern. It’s slow, especially on mobile devices, to have to pull down a ton of JSON every 30 seconds. This year, we’ve been using WebSockets. We’ll open a WebSocket connection when the client connects. The client gets the initial data from the page, then we can push fractional updates whenever an update comes through. Because you’re not polling, the browser doesn’t have to do nearly as much. We don’t rewrite the DOM as often. The result is it feels a lot better. A user can scroll up and down, and that doesn’t feel laggy. The updates are really small, so even over a slow connection they work pretty well.
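The “fractional updates” pushed over the WebSocket amount to a diff between successive result snapshots: send only what changed, not the whole payload. A sketch of that diff, under the assumption of a flat {race: {candidate: votes}} shape (the real wire format is not described in the episode):

```python
# Sketch of the fractional-update idea: compare the previous and
# current snapshots and emit only the entries that changed.
# The {race: {candidate: votes}} shape is an assumption.

def fractional_update(prev, curr):
    """Return only the entries of `curr` that differ from `prev`."""
    delta = {}
    for race, tallies in curr.items():
        changed = {c: v for c, v in tallies.items()
                   if prev.get(race, {}).get(c) != v}
        if changed:
            delta[race] = changed
    return delta

prev = {"ia": {"cruz": 500, "trump": 450}}
curr = {"ia": {"cruz": 500, "trump": 475},   # one total ticked up
        "nh": {"trump": 100}}                # a new race reporting
delta = fractional_update(prev, curr)
```

Shipping `delta` instead of `curr` is what keeps each push small enough to work well over slow mobile connections, as described above.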

Stephen: Do you have issues with a browser in terms of their relative support levels for WebSockets?

Jeremy: Oh my, yes, although, truthfully, most…We did not use WebSockets in 2014 for this reason. We tried. Asterisk: we wrote a library that will fall back to long polling in the event that your browser doesn’t support WebSockets. As a result, it’ll just be a little slower, a little laggier. In 2014, I think we defaulted to long polling in a lot of cases. This year, almost an overwhelming number of the clients that have visited us have been using the Sockets. It’s better for everybody that way. It’s better for us, because we’re moving less data over the wire. It’s better for the client. That’s actually been a really good thing this year.

Stephen: Okay. Let’s fast forward to election night. This could be primaries or general. Without getting into sort of raw numbers, which I’m sure The Times would prefer that you keep to yourself, what kinds of spikes in traffic do you expect to see? In other words, versus a normal day. Is it 2X, is it 10X? What are you looking at in terms of the relative demand?

Jeremy: I can tell you that on an election night our results page is like having a second home page. We’re getting as much traffic to that results page on average as our home page does even on that night, which is already a multiple of a normal night.

I can tell you a pair of anecdotes that I think you’ll find amusing. The general election in 2012 set traffic records that we didn’t break until 2014. One of the things that happened in 2012 is that we, as a result of a cache misconfiguration, pointed the fire hose of the home page, plus all the people looking at our results pages, at an unwarmed Amazon Elastic Load Balancer, which immediately crumpled under the load. It was the first time I’d ever heard of that happening. I didn’t even know that that was something that could happen.

Stephen: That’s funny.

Jeremy: This year, we got a phone call and an email from Amazon, because we had done a very similar thing. We’d pointed about 14,000 requests a second at an Amazon S3 bucket that had not previously taken any traffic. As a result, about 1 in 5 or 1 in 6 pages we returned were a 500, something I’d never seen before. So we got a nice phone call from them about that as well.

Stephen: There you go.

Jeremy: So we’ve gotten to know our Amazon service reps, so it’s been nice.

Stephen: I was going to say. Amazon must be crucial just in terms of being able to spin up and spin down, depending on the load.

Jeremy: Yeah. There’s actually a handful of confounding factors about a general election that make programming against it a little difficult. We have geographic issues going on, right? We can’t not serve election results just because there’s an outage in U.S. East 1. So we have to maintain duplicate versions of our whole infrastructure, our whole stack, in an East and a West availability zone. We have some issues with scale, which I kind of alluded to. It’s just thousands and thousands and thousands of requests per second on just a normal primary night. For the general, we’re pretty much expecting to set traffic records that we may not break for four more years.

There are a large number of contributors to our software. I’m working on sort of the core election stuff. But we also have three or four developers on our graphics desk who are working on a Node app that handles all the HTML and maps and stuff like that. We have two or three other coworkers of mine who are working on sort of back end pieces, and then a handful of site engineers. It’s like when you’ve got 10 or 12 people all contributing code to the same thing. That’s a confounding factor that you almost never run into on a normal project, especially a small one like this.

One thing that’s particularly hard about an election like this is that we have teams like The Upshot, or we have other smart people inside the company who would like to do something new this year, or they want to do something they haven’t done before. A great example of this would be those Upshot live models. In previous years, we would have had to write special software to get them a route that would produce the data that they need. Then we would have had to help them scale their app up. It really would have been very difficult to do what we’re doing this year in a previous year.

Because of the way we set this up, very modularly, The Upshot has access to all the same election data that everybody else has. So they can test. They have access to data to do tests on. As a result, they can build these live models, they can deploy them, and it just runs. No one has any questions. It makes it a lot easier to do things that you might say are “innovative” or things that are just new, things that are different to us and that normally we would have had to put a lot of development time into.

Stephen: Yeah, indeed. In terms of preparing for a major event, whether it’s Super Tuesday or a general or what have you, what are the kinds of things that you preemptively do, either on the people side or the hardware side? Obviously, you pre-warm hardware. Do you work in shifts? What can you tell me about the process rolling up to an election?

Jeremy: The one really great thing that I enjoy is that, because it’s a newspaper, we’re already people who are used to the idea that we’re all hands on deck for breaking news. We put together a little war room of the core folks that need to be around.

A great example of this would have been on Super Tuesday. Big election night, so we have two or three of the core committers from the graphics desk who work on the maps and who built the charts sitting in the room. We’ve got me, who handles the election data in the room. We’ve got two editors in there. We have a politics editor available to make race calls as necessary. A Masthead editor occasionally drops by. It’s nice to have everybody in one place. That actually solves most of the problems that we have.

Stephen: Interesting. Okay.

Jeremy: I’m sure it won’t shock you that most of the problems that we have are not technological in nature, but human, right?

Stephen: Yeah, of course.

Jeremy: We’ll often have this case where something weird will happen, or it’ll look weird, but it’s actually completely legitimate, like the AP may call a race, and then we may not have any votes that come in for 30 minutes. It looks weird, but it’s totally legitimate. The AP can call. They have models for race calls that involve things like exit polls, votes that they have access to that haven’t been assigned to precincts. They can know in advance with some regularity that this is going to be a race that X candidate wins.

So they’ll call it, but we won’t have any votes in the system for a good 20 or 30 minutes while the secretaries of state prepare to deliver the first batch. It’s nice to have everybody in the same room so we can calm everyone down and let folks know this is totally legit, nothing’s broken, and this is correct. So it’s good. Other than that, we get a lot of sleep the day before.

Stephen: That’s right, yeah.

Jeremy: Try not to deploy anything new. We test like hell beforehand. That’s one thing I can say I really enjoy. My editor’s given me months in advance of these primaries to write software and then write tests and then write an entire rig for simulating an election, so that we can run our software as if we were in the middle of an election night. We went so far as to build in something that will simulate 500s or 403s from the AP, like that they’re down or that we’ve run out of API requests.

Stephen: Okay, so kind of like a chaos monkey for The Times?

Jeremy: You got it, exactly. Because we need to know that if something really horrible happens, we’ll be able to stay up and continue providing results.
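The failure-injection rig Jeremy describes might look something like the sketch below: wrap the upstream fetch in a layer that fails on demand, then verify the retry logic survives it. The error type, retry policy, and probability knob are all invented for illustration; the Times’ actual simulator is not described beyond “simulate 500s or 403s.”

```python
# Sketch of chaos-style failure injection for an elections fetcher.
# APDownError and the retry policy are illustrative assumptions.
import random

class APDownError(Exception):
    """Stands in for a 500/403 from the upstream API."""

def flaky(fetch, fail_rate, rng):
    """Return a version of `fetch` that fails with probability fail_rate."""
    def wrapped():
        if rng.random() < fail_rate:
            raise APDownError("simulated 500 from the AP")
        return fetch()
    return wrapped

def fetch_with_retry(fetch, attempts=3):
    """Try `fetch` up to `attempts` times before giving up."""
    for i in range(attempts):
        try:
            return fetch()
        except APDownError:
            if i == attempts - 1:
                raise  # out of attempts; surface the failure

always_up = flaky(lambda: {"races": []}, fail_rate=0.0, rng=random.Random(0))
always_down = flaky(lambda: {"races": []}, fail_rate=1.0, rng=random.Random(0))
```

Driving the fail rate to 0.0 or 1.0 makes the tests deterministic, which is the point of a rig like this: you want to rehearse the worst case on demand, not wait for election night to find it.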

Stephen: Right. What is a typical shift then? In other words, when do you come in that day? What time do people leave?

Jeremy: On a normal election night, I’ll usually take a train up to New York the day before and work like a half day on a Monday. Those elections are usually Tuesdays, the big ones. Tuesday morning, I’ll get in around 10:00 or so. The AP will push what they call live zeros around 11:00. This is to say they’ll have all the reporting units. This is the geographical areas that they expect to have results from in and with zeros as their vote totals. This lets us warm up the database with all the candidate information, all the names of all the counties that we expect to see, and it gives us an opportunity to go and edit names. Times style would be Donald J. Trump instead of Donald Trump, for example. So we have a handful of overrides like that that we need to do.

Between 11:00 and noon, we’re loading that initialization data and baking out our first result pages that are all empty. Then we basically go eat lunch, and then all get back together about 5:00. First results usually start coming in somewhere between 6:00 p.m. and 9:00 p.m. Then it’s basically just all hands on deck until about 1:00 a.m. when the last handful of results come in, sometimes even later, if it’s like Hawaii or Alaska. But yeah, that’s what the night looks like.
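The “live zeros” initialization Jeremy walks through – seed every expected reporting unit at zero votes, then apply newsroom style overrides like “Donald J. Trump” – might be sketched as follows. The data shapes are illustrative, not the AP’s actual live-zeros format.

```python
# Sketch of the live-zeros warm-up step: zeroed vote totals for every
# expected reporting unit, with Times-style name overrides applied.
# The input/output shapes are assumptions for illustration.

STYLE_OVERRIDES = {"Donald Trump": "Donald J. Trump"}

def init_from_live_zeros(reporting_units, overrides=STYLE_OVERRIDES):
    """Warm the results store with zeroed tallies and styled names."""
    store = {}
    for unit in reporting_units:
        store[unit["id"]] = {
            overrides.get(name, name): 0 for name in unit["candidates"]
        }
    return store

live_zeros = [
    {"id": "NY-Kings", "candidates": ["Donald Trump", "John Kasich"]},
    {"id": "NY-Queens", "candidates": ["Donald Trump", "John Kasich"]},
]
store = init_from_live_zeros(live_zeros)
```

Doing the overrides once at initialization, rather than at render time, means every downstream page and feed inherits the corrected names automatically.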

Really, the big push is for the next morning when we have to write the day-after story. We’ll have a politics editor asking us which county in Massachusetts Donald Trump performed best in, or where Hillary outperformed her 2008 numbers in South Carolina. We go pull data for that. It’s nice. Actually, I really enjoy the day after almost more than I enjoy actual election nights.

Stephen: That’s funny. Well, actually, that kind of does make sense. What does The Times do? Do they cater for you? Do they bring food in?

Jeremy: Oh, yeah, absolutely. As a matter of fact, one of The Wall Street Journal reporters, Byron Tau, makes a note of tweeting what everybody is having for dinner that night. He’ll have CNN, The Journal, the National Journal. The Times is pretty good. We normally have some combination of Szechuan or Thai. On Tuesday I was in New York for the New York primary. Of course, we had deli food. It was wonderful.

Stephen: Nice, nice, there you go. In terms of just wrapping things up then, for folks in the audience who are technologists and might want to sort of get into the news side or the politics side or the election side, what are the suggestions that you would have? How did you…I mean, obviously, we talked a little bit about how you got into it. What would your recommendations be for somebody who’s just breaking in?

Jeremy: I would say if you’d like to follow directly in my footsteps, you should be a failure at the first two things that you try, and then fall back to programming as your third shot at it.

Stephen: There you go.

Jeremy: I was a political science student at first and was going to teach, but my grades were terrible. I took the LSAT, because I thought I wanted to be a lawyer, and then did poorly on the LSAT. Then, in a fit of displeasure, I took a job working late night tech support at the St. Petersburg Times and just got started.

I’d say really the best thing is to listen for people’s problems. Almost all of the best software I’ve written has come from talking to a reporter and hearing a problem that a reporter has on their beat or somewhere in the news gathering process. We’ll have a researcher who will say, “Man, I just know that there are these people who die in Afghanistan and Iraq.” This happened at The Washington Post. There’s folks that die in Afghanistan and Iraq, and we get a fax every week with the names of every service member who dies.

But it’s really hard for us to answer any questions like, “How many people died in Afghanistan and Iraq this week?” Because we’re not sitting down and writing those together. You can do something as simple as set up a spreadsheet for someone or a little simple CRUD admin. It’s little problems like that that often turn into really cool stories, eventually.

I’d also say that your first project doesn’t have to be a big, award-winning, amazing data thing. There is lots of really easy low-hanging fruit. I think Chicago Crime is a great example of that, because it wasn’t necessarily, on its face, supposed to be a journalistic enterprise. It was just a good civic data project. It was just: as a citizen, you need to know about this.

I feel like some of our best recruits have come from the civic data world, people who are just personally interested in the workings of the government or in our society and worked on a data project around that. Those people almost always have got the same sort of built-in incentives and ethics that we’re looking for here in the Fourth Estate.

Stephen: Yeah. In other words, what you’re saying, to some degree then, is that you’re not just looking for the technical skill set. You’re looking for somebody who is really able to work, whether it’s a reporter, whether it’s somebody sort of in a civic situation, but is able to listen and translate that into an actual requirement, as opposed to, “Hey, here’s some data. Let me show you something fancy I can do with it.”

Jeremy: Yeah, absolutely. Truth be told, so many of the projects that we work on, you would think of them as boring technologically. It’s not as much fun. We’re not moving around millions of rows of data, although some of our projects, we get lucky and get to do things like that. A lot of it is just solving what I would consider to be fairly simple technical problems but that take a lot of empathy to figure out that they even exist.
Yeah, there’s a world of easy low-hanging civic fruit to get started on if you’re really interested in this sort of thing.

Anybody can be a journalist, man. You can be a journalist if you want to keep track of what’s happening to airplane tail numbers, and you want to see what that plane is that keeps flying over your house. This is like a great story. One of these civic data groups was watching tail numbers and figured out that there were lots of fixed-wing aircraft flying over the city that were all rented out by the same company in Virginia. It was really weird. It turns out that company is owned by the FBI.

Stephen: There you go, yeah.

Jeremy: This is where good stories come from, right, is observation and tracking.

Stephen: There really is so much data. It’s just a matter of having, in many cases, the desire, I guess, or intent to put it to work. Take, for example, something stupid that I end up doing every year. There’s no tide charts, right?

Jeremy: Oh, yeah.

Stephen: We live on a river that opens up into the ocean about a mile and a half down. There’s no tide chart. It turns out that if you hit…what is it? I think it’s NOAA. NOAA has all that information, but it’s in some terrible format. [Ed note: credit for this process belongs to Jeff Inglis, who tipped me to it.] You pull it down. It’s not that hard to translate it into an ICal, and all of a sudden, it’s in your Google Calendar. Great, I know when the tide’s coming in. I know when the tide’s going out. Does that require any special technical skills? Not at all, but it is the kind of thing that you can take data in one format and make it useful to people. Yeah, I can definitely see that.
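[Ed note: the tide-chart trick is simple enough to sketch. Assuming you’ve already downloaded and parsed NOAA’s predictions into (time, high/low) pairs – the parsing is the fiddly part, and is omitted here – generating a calendar file that Google Calendar can import is a few lines of Python. The function name and the pair format are this note’s invention:]

```python
from datetime import datetime

def tides_to_ical(predictions):
    """Turn (datetime, 'H'|'L') tide predictions - as you'd have after
    parsing NOAA's downloadable data - into an iCalendar string."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//tides//EN"]
    for when, kind in predictions:
        stamp = when.strftime("%Y%m%dT%H%M%S")
        lines += [
            "BEGIN:VEVENT",
            "UID:tide-" + stamp + "@example.local",  # hypothetical UID scheme
            "DTSTART:" + stamp,
            "SUMMARY:" + ("High tide" if kind == "H" else "Low tide"),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)  # iCalendar mandates CRLF line endings
```

Write the result to a `.ics` file, import it once, and the tides live in your calendar – exactly the kind of small format translation being discussed.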

In terms of working on elections, all the work that you’ve done that has gone into all the elections that we talked about, what, for you, is the most rewarding aspect? What do you enjoy most about the work you do?

Jeremy: Oh, man, I think far and away it’s getting data to the smart people who work on our graphics desk or at The Upshot so that they can put together some killer analysis that nobody else has got. I loved looking at the big precinct maps that we had for New York yesterday or when The Upshot does their live model. Those are things that would be really hard to pull off unless you have got your election software down to not having to think about it at all. Because there’s so much plumbing and so much manning the bellows that has to be done in order to keep your election rig up and going.

The thing I feel like is that if you can apply just a little bit of technical rigor, maybe not too much, but just a little bit, or maybe some engineering principles to this, then it’ll give you the free time to work on the cool and awesome projects that you’d like to be doing in the first place. I’d have to say, that’s by far the most rewarding thing. It’s like getting all the low-end plumbing stuff out of the way so we can do really cool stuff in the time that we saved.

Stephen: Yeah, no, I’m sure. All right, last and most important question on Hark. What animal are you most frightened of?

Jeremy: Amoebas.

Stephen: Amoebas?

Jeremy: Yep, absolutely. They’re small. They can give you brain disease. You may never even know you have it. It’s really hard to kill them.

Stephen: That’s fair.

Jeremy: Yeah.

Stephen: All right, with that, I just want to say thanks, Jeremy. This has been fantastic. We’ll hope to get you back on soon.

Jeremy: Thanks, Stephen.

Stephen: All right, cheers.

Jeremy: Cheers.

Categories: Cloud, Interviews, Journalism, Podcasts.

Divide et Impera

[Caesar’s] purpose always was to keep his barbarian forces well scattered. During all of his campaigns in Gaul, he had a comparatively small army. His only means of success, therefore, against the vast hordes of the Gauls was to ‘divide and conquer.’
– Harry F. Towle, “Caesar’s Gallic War”

Throughout Roman history, and indeed the history of every large empire the world has ever known, divide and conquer has been one of the most reliable strategies for expansion and conquest. A tactic that exploits human nature’s preference for independence and autonomy, it is always more practical to engage with part of a whole rather than the whole itself. Conversely, populations that cannot unite will remain vulnerable to those which can. As Abraham Lincoln put it in a speech given in Springfield, Illinois on June 16, 1858, nearly two millennia after Caesar’s campaign in Gaul, “a house divided against itself cannot stand.”

The question of who is playing the part of the Roman empire today is an interesting one for the technology industry. Typically, when you speak with those shipping traditional on premise software today, their acknowledged competition is other on premise software companies. Much as the Gauls’ enemies, at least until Vercingetorix, were other Gauls.

At the OpenStack Summit last week, when posed the simple question of who they regarded as their primary competition, an executive representing a technical platform answered succinctly: “AWS. No question.”

That candid answer, while correct, remains surprisingly rare. It may not be for much longer, as the contrast in financial fortunes between Amazon’s cloud business and its on premise competitors attracts more and more notice, even amongst conservative buyers. For most of the past ten years, AWS has been hiding in plain sight, but its ability to sustain that below-the-radar success is being actively compromised by its success within the public market.

The story of Amazon’s ascent has been well chronicled at this point. As then Microsoft CTO Ray Ozzie acknowledged in 2008, Amazon had already by that point spent two years being the only ones taking the cloud seriously. Microsoft, to its credit, eventually followed fast, but for them and every other would-be player in the market, second place was the only realistic goal, at least in the short to medium term.

The less explored question is how those shipping on premise software might compete more effectively with the Amazons of the world. The answer of how not to compete, of course, is clear from both world history and recent financial history.

On a purely technical level, fragmentation is systemic at the moment, the new norm. Pick a technical category, and there are not just multiple technologies to select from, but increasingly multiple technical approaches. Each of these must first be deeply understood, even if they’re entirely new and thus difficult to conceptualize, before a technology choice can be made. And there are many such approaches to be studied and digested, at every level.

All of which is problematic from a vendor perspective. Making matters worse is that the commercial backers of these technologies are equally fragmented, which is to say they’re divided.

Consider the following chart on the microservices ecosystem from Sequoia:

Assuming one can properly understand the categories, and the differences in approaches between the technologies that populate them, and can evaluate the projects that match those approaches, what next? If you select NGINX, Kubernetes, Docker, Chef and Mongo, for example, what assurances do you have that these all work reliably together?

The answer, outside of rigid formal partnerships or broader packaging (e.g. CoreOS Tectonic), is very little. For all intents and purposes, each of the projects above and the commercial entities that back them are independent entities with different agendas and incentives. If you’ve ever tried to resolve a complicated failure involving multiple vendors, you know exactly what this means.

This is how the industry has done business for decades, of course. The majority of the incumbent technology suppliers, in fact, have revenue streams built on managing complexity on behalf of their clients. This complexity is also why strategies like Microsoft’s “integrated innovation” approach proved to be attractive. It’s not that each component of the stack was the indisputable technical leader in its field. It’s that they worked well enough together that long conversations about how to wire everything together were no longer necessary.

What if, in stark contrast to the industry’s history however, a competitive model emerged that abstracted traditional complexity away entirely? What if a growing number of difficult choices between specialized and esoteric software options gave way to a web page and a set of APIs for as-a-service implementations? All of a sudden, managing complexity – think large industry incumbents – becomes far less attractive as a business model. As does accelerating complexity by way of niche or specialized software – think point software products, which are now forced to compete with service-based implementations that are integrated out of the box.

With the advantages of cloud models clear, then, the obvious question is how alternative models compete. One obvious answer is to embrace, rather than fight, the tide. Many on premise software products will find themselves pivoting to become service based businesses.

But for those committed long term to an on premise model, new tactics are required. In a market that is struggling with fragmentation, solutions must become less fragmented. In some cases this will mean traditional formal partnerships, but these can be difficult to sustain as they require time and capital resource investments from companies that are certain to be short on one if not both. History suggests, however, that informal aggregations can be equally successful: the Linux, Apache, MySQL and PHP combination, as one example, achieved massive success – success that continues to this day.

The spectacular success of that particular combination is not likely to be neatly replicated today, of course, because the market has moved away from general purpose infrastructure to specialized, different-tools-for-different-jobs alternatives. There is no reason, however, that ad hoc stacks of complementary software offerings cannot be successful, even if some of the components bleed into one another functionally at the edges. If I were a runtime vendor, for example, I would be actively speaking with orchestration, container and database projects to try and find common ground and opportunity. Instead of pitching my runtime in a vacuum to customers considering public cloud options that offer a growing and far more complete suite of offerings, my offer would be a LAMP-like multi-function stack that customers could drop in and not have to assemble piece-by-piece, by hand. E pluribus unum.

This is not the market reality at present. Currently, vendors are typically heads down, focused on their particular corner of the world, built to viciously battle with other vendors in the same space who are also heads down. They’re divided, in other words, and preoccupied. This approach worked poorly for the Gauls. If history is any guide, it’s not likely to be any more effective for on premise software vendors today facing unified public cloud alternatives.

Join or die.

Disclosure: Amazon, Chef, CoreOS, Docker, MongoDB and NGINX are current RedMonk customers, while Microsoft is not at this time.

Categories: Cloud, Open Source.

OpenStack and the Fragmenting Infrastructure Market

Austin Convention Center

The questions around OpenStack today are different than they were last year, just as they were different the year before that, and the year before that. Go back far enough, and the most common was: has anyone ever successfully installed this? These days, installation and even upgrades are more or less solved problems, particularly if you go the distribution route. Even questions of definition – which of the many individual projects under the OpenStack umbrella are required to actually be considered OpenStack? – have subsided, even if they’re not yet addressed to everyone’s satisfaction.

The real questions around OpenStack today, in fact, have very little to do with OpenStack. Instead, the key considerations for those using the technology or more importantly considering it have to do with the wider market context.

The most significant challenge facing most enterprises today isn’t technology, it’s choice. For every category of infrastructure software that an enterprise can conceive of today, and many that they can’t, there are multiple, credible software offerings that will satisfy their functional requirements. At least one, and likely more than one, of the options will be open source. If you enjoy the creativity of software as an endeavor, this is a golden age.

The problem then is not a lack of technology options, as it might have been ten years ago if a business wanted to, as an example, store information in something other than a relational database. The problem is rather the opposite: there are increasingly too many options to cope with.

Which means that OpenStack, like every other piece of infrastructure technology, is facing more competition. Not in the apples to apples sense, of course. OpenStack outlasted its closest open source functional analogues in CloudStack and Eucalyptus, and the 7500 attendees at this week’s summit would argue that it’s the most visible open source cloud technology today.

But if projects that are exactly equivalent functionally to OpenStack are not emerging at present, technologies with overlapping ambitions are.

Consider containers. By themselves, of course, a container is from an OpenStack perspective little different from a virtual machine – an asset OpenStack was built to manage.

But if you’re making a Venn diagram of technical roles, the vast array of projects that are growing up around containers to orchestrate, schedule and otherwise manage them definitely overlaps with OpenStack today and will overlap more in the future. Like other projects that predate the explosive ascent of containers, OpenStack has been required to incorporate them after the fact via projects like Magnum, and according to numbers cited at the Summit 70% of users still want better support for what is, effectively, the new VM. Which OpenStack will ultimately provide, both directly and by serving as the substrate upon which users can run anything from Kubernetes to Mesos.

But what if projects like a Mesos herald a return to our distant past? From the earliest days of computing, the technology industry has been steadily turning larger machines into larger numbers of smaller machines. The mainframe singular was broken up into mini-computers plural, mini-computers gave way to more machines in the client-server era, and client-server in turn to the scale of virtual machines and eventually clouds. Where we once deployed workloads to a single computer, then, the mainframe, today we deploy them to fleets of machines – so many, in fact, that we require scheduling software to decide which computers a given workload should run on.

What if compute took the next logical step, much as Hadoop has in the data processing space before it, and abstracted the network of machines entirely to present not the fleets of machines we have become accustomed to in this cloudy world, but one big computer – a virtual mainframe? This is, of course, the goal of Mesos and the recently open sourced DC/OS, and while it’s hardly a household name at present, it will be interesting to see whether customers remain fixated on the discrete, familiar asset model that OpenStack represents or whether they are attracted, over time, to the heavy abstraction of something like Mesos. If the web world is any indication, the latter is more probable.

The real problem for OpenStack, however, is that even if users come to believe that OpenStack and container-based infrastructure are not competitive but purely complementary, discovering that fact will take time. Which means that whether container-based infrastructure is or is not technically competition for OpenStack, from a market perspective it will function as such. The reverse is true as well, of course. Container-based infrastructure players continually face questions about whether they require foundational platforms such as an OpenStack (as at Time Warner), or whether users are better off running them on top of bare metal and cutting out the middle man.

All of which again is more of a commentary on the market today than anything OpenStack has or has not done. Like virtually every other infrastructure project today, the primary challenge for OpenStack at present is helping customers make sense of a sea of different technology options. The real danger for OpenStack and its would-be competitors and partners, then, is that customers decide to make these choices someone else’s problem by advantaging public infrastructure. For OpenStack, then, and the vendors within its orbit, my message at the Summit was that the most important investments in the near term are not technology, but rather education and messaging.

Projects and the vendors that support them have a tendency to focus on their capabilities, leaving the wider context for users to work through on their own. In general, this is an unfortunate practice, but with the landscape as fragmented as it is today, it is a potentially fatal approach. If you don’t make things simple for users, they’ll find someone who will.

Categories: Cloud, Containers, Hardware-as-a-Service, Open Source.

Meet the New Monk: Rachel Stephens

Out of the thousand or so words that went into the job posting for our open analyst position, arguably the most important were eleven that were added by James. Under a heading entitled “The Role,” he wrote “We’re looking for someone that will make the role their own.” Some, maybe even most, of the candidates we interviewed asked about this, what it meant.

The short answer is that while the best candidates for us are those that prove to us that they can do this job, the ones that separate themselves are those that can do this job and do jobs that we’re not able to. We wanted our new analyst to bring something new to the table, not just skills we already have.

It would be easier, certainly, to simply filter our applicants to those that have prior analyst experience: we wouldn’t have to spend any time training new hires to be analysts. But as an analyst firm that sees the world through a very different lens than that employed by other firms in our industry, we have to be more creative, which is why we deliberately cast as wide a net as possible. As has become typical when we hire, our inbox was full of resumes from all different educational and experiential backgrounds. DOD analysts. Major League Baseball back-office personnel. Nuclear technicians. Lawyers. Actuaries. Accountants. Professors. Developers.

And then there was this financial professional. Who’d been turned into a DBA. Who’d been trained in BI. Who was in the process of completing her MBA. Who knew how to use GitHub better than we did. Who had organized conferences. Who read through three years of the quarterly earnings transcripts from one of our clients as part of her application for the position, teaching us things we didn’t already know. Who was once offered a job on a flight by another of our clients. And most importantly, who demonstrated that she both understood and believed in what we do at RedMonk.

Hiring decisions are never easy, because serious applicants demand serious consideration, but the right candidates can make the decisions easier. When James and I were talking through the final set of candidates, we found ourselves getting excited talking about the different opportunities and potential projects for this one candidate in particular, and – remarkably diverse and capable set of candidates or no – our decision became clear.

Meet Rachel Stephens, the newest RedMonk analyst.

If you’ve been to the Monktoberfest, you have probably met Rachel already, as she has never missed one. She helped organize last year’s version, in fact. What you may or may not know about Rachel is that she has not just the skills to manipulate data but a true passion for using it to ask questions. Anyone who has opinions – strong opinions – on the differences in keyboard shortcuts between the Mac and Windows versions of Excel was likely to get our attention; the fact that Rachel also works comfortably in other tools from R to Tableau is gravy. As are contributions like her updates to Nathan Yau’s 2011 wunderground-scraping script from Visualize This (which I wish I’d seen before I blew an hour on that myself).

But apart from her hard core research skills and overpowering curiosity – attributes that we obviously prize highly – Rachel will also add an important new element to our work. With a long background in finance and with her MBA weeks away from being complete, one of the things Rachel will bring to RedMonk is a professional’s understanding of the balance sheet. This is an important skill in general, but it’s particularly relevant in emerging markets like cloud or Software-as-a-Service, where multi-billion dollar businesses are buried in filings under categories marked “Other.”

So whether she’s hard at work comparing, say, the relative trajectories of JavaScript frameworks based on Stack Overflow data or taking apart SEC filings for us, Rachel is a candidate that we know will make this role her own. We’re thrilled to have her on board.

While Rachel is based out of Denver, you can expect to see her around at the usual conferences in the Bay Area, Boston, Las Vegas, New York and so on. We’ve told her what a great group of people we at RedMonk know and work with, and she’s excited to meet all of you. Those of you who know Rachel and the quality of person she is know what you have to look forward to. For those of you who haven’t spoken with her yet, you’re going to enjoy working with her, I guarantee it.

Rachel’s first day with RedMonk will be May 23rd, but until then you should feel free to go say hello over on Twitter at @rstephensme.

Categories: People, RedMonk Miscellaneous.

Someone Else’s Problem

The above statement is exactly correct. The idea that serverless literally means no servers is no more accurate than the argument that Salesforce does not sell software. There are servers behind serverless offerings such as AWS Lambda just as there is software behind Salesforce. If you want to get pedantic, you might argue that both of these statements are lies. While the pedant may be technically correct, however, a larger and more important truth is obscured.

Consider the challenge facing those who wish to compare private cloud to public cloud, as but one example. Even if you can counterfactually assume relative feature parity between private and public offerings, it remains a comparison of two entirely distinct product sets. The fact that they happen to attack the same functional area – base cloud infrastructure – should not be taken to mean that they can or should be directly compared. Private cloud solutions are about using combinations of software and hardware to replicate the most attractive features of public cloud infrastructure: dynamic provisioning, elasticity, and so on. Public cloud offerings, like any other as-a-service business, are as much about assuming the burden of someone else’s problem as they are the underlying hardware or software.

Which is why it’s interesting that IaaS, PaaS and SaaS providers don’t emphasize this distinction for the most part.

To some extent, this is logical, because it’s an inherently IT-unfriendly message – if a buyer’s problems are made a seller’s problems, it follows that some people from the buyer side are no longer necessary – and making unnecessary enemies is rarely a profitable strategy. It’s quite evident, however, that, while perhaps not unanimously, buyers are putting an increasing emphasis on making their problem someone else’s problem.

As they should. Because many of the problems that have traditionally been the province of the enterprise, shouldn’t be. In a world in which solving technology problems means upfront material capital expenditures on hardware and software, having high quality resources that can address those problems efficiently is important, if not differentiating. In the post-cloud context, however, this is far less important, because you can select between multiple providers in any given category that are likely to be able to provide a given service with greater uptime and a higher level of security than you can. There’s a reason, for example, that the two fastest growing services in the history of AWS are Redshift and Aurora: database and warehousing infrastructure is as expensive to maintain as it is tedious. Or put differently, what is the business value of having in-house the skills necessary to keep complex and scalable database infrastructure up and running? Is the value greater or less than the premium you’ll pay to third parties such as an Amazon, Google, Heroku, IBM or Microsoft to maintain it for you?

At which point the question becomes not whether this is literally server or software-less, but rather whether or not you want servers and software to be your problem or someone else’s. Increasingly, the market is favoring the latter, which is one reason the commercial value of on premise software is in decline while service based alternatives are seeing rapid growth. It is also why *aaS providers should be explicitly and heavily emphasizing their greatest value to customers: the ability to take on someone else’s problems.

Disclosure: AWS, IBM, Salesforce (Heroku) are RedMonk customers, Google and Microsoft are not currently customers.

Categories: Cloud, Hardware-as-a-Service, Platform-as-a-Service, Services, Software-as-a-Service.

Yes, It is Harder to Monetize Open Source. So?


Four years ago this month, Red Hat became the first pure play commercial open source vendor to cross the billion dollar revenue mark – beating my back of the envelope forecast in the process. This was rightfully greeted with much fanfare at the time, given that if you go back far enough, a great many people in the industry thought that open source could never be commercialized. Enthusiasm amongst open source advocates was necessarily tempered, however, by the realization that Red Hat was, in the financial sense, an outlier. There were no more Red Hats looming, no other pure play commercial open source vendors poised to follow the open source pioneer across the billion dollar finish line.

Four years later, there still aren’t. Looking around the industry, Red Hat remains the sole example of a pure play open source organization matching the revenue generated by even modest-sized proprietary alternatives, and as was the case four years ago, there are no obvious candidates to replicate Red Hat’s particular success.

Which has, understandably, led to assertions that – in the non-literal sense – open source can’t make money and is difficult to build a business around. Assertions for which there are exceptions like Red Hat, but that are generally defensible based on the available facts.

What these discussions typically omit, however, is that – as we’re reminded by Adrian Cockcroft – it’s also getting harder to make money from proprietary software. As has been covered in this space for years (for example), and in book form in The Software Paradox, sales of software generally have been on a downward trajectory over the past decade or more. Notably, this is true across software categories, consumer to enterprise. From large software providers such as IBM, Microsoft or Oracle seeing systemic declines in software margins, revenue or both to consumer companies like Apple taking the price of their operating system from just under $200 to zero, the simple fact is that it’s getting more difficult to monetize software as a standalone asset.

It’s far from impossible, obviously: Microsoft’s revenue stream from software, as but one example, is measured in the tens of billions. But when you look across industries, at company after company, the overall trendline is clear: it’s harder to make money from software than it used to be – regardless of whether the model employed is volume or margin, open or closed. Smart companies realize this, and are already hedging themselves against these declines with alternative revenue models. There is a reason why we’re having a lot more Software Paradox-related conversations with our clients today than we would have even a few years ago: the writing is on the wall.

So yes, we are no more likely to see another Red Hat today than we were four years ago. But that says a lot less about the merits of open source as a model than it does about commercial valuations of software in general.

Categories: Business Models, Open Source.

Ubuntu and ZFS: Possibly Illegal, Definitely Exciting

The project originally known as the Zettabyte File System was born the same year that Windows XP began shipping. Conceived and originally written by Bill Moore, Jeff Bonwick and Matthew Ahrens among others, it was a true next generation project – designed for needs that could not be imagined at the time. It was a filesystem built for the future.

Fifteen years later, it’s the future. Though it’s a teenager now, ZFS’s features remain attractive enough that Canonical – the company behind the Ubuntu distribution – wants to ship ZFS as a default. Which wouldn’t seem terribly controversial as it’s an open source project, except for the issue of its licensing.

Questions about open source licensing, once common, have thankfully subsided in recent years as projects have tended to coalesce around standard, understood models – project (e.g. GPL), file (e.g. MPL) or permissive (e.g. Apache). The steady rise in share of the latter category has further throttled licensing controversy, as permissive licenses impose few if any restrictions on the consumption of open source, so potential complications are minimized.

ZFS, and the original OpenSolaris codebase it was included with, were not permissively licensed, however. When Sun made its Solaris codebase available for the first time in 2005, it was offered under the CDDL (Common Development and Distribution License), an MPL (Mozilla Public License) derivative previously written by Sun and later approved by the OSI. Why this license was selected for Solaris remains a matter of some debate, but one of the plausible explanations centered around questions of compatibility with the GPL – or lack thereof.

At the time of its release, and indeed still to this day as examples like ZFS suggest, Solaris was technically differentiated from the far more popular Linux, offering features that were unavailable on operating system alternatives. For this reason, the theory went, Sun chose the CDDL at least in part to avoid its operating system being strip-mined, with its best features poached and ported to Linux specifically.

Whether this was actually the intent or whether the license was selected entirely on its merits, the perceived incompatibility between the licenses (verbal permission from Sun’s CEO notwithstanding) – along with healthy doses of antagonism and NIH between the communities – kept Solaris’ most distinctive features out of Linux codebases. There were experimental ports in the early days, and these have improved over the years and been made available as on-demand packages, but no major Linux distribution has ever shipped CDDL-licensed features by default.

That may change soon, however. In February, Canonical announced its intent to include ZFS in its next Long Term Support version, 16.04. This prompted a wide range of reactions.

Many Linux users, who have eyed ZFS’ distinctive featureset with envy, were excited by the prospect of having official, theoretically legitimate access to the technology in a mainstream distribution. Even some of the original Solaris authors were enthusiastic about the move. Observers with an interest in licensing issues, however, were left with questions, principally: aren’t these two licenses incompatible? That had, after all, been the prevailing assumption for over a decade.

The answer is, perhaps unsurprisingly, not clear. Canonical, for its part, was unequivocal, saying:

We at Canonical have conducted a legal review, including discussion with the industry’s leading software freedom legal counsel, of the licenses that apply to the Linux kernel and to ZFS.

And in doing so, we have concluded that we are acting within the rights granted and in compliance with their terms of both of those licenses. Others have independently achieved the same conclusion.

The Software Freedom Conservancy, for its part, was equally straightforward:

We are sympathetic to Canonical’s frustration in this desire to easily support more features for their users. However, as set out below, we have concluded that their distribution of zfs.ko violates the GPL.

If those contradictory opinions weren’t confusing enough, the Software Freedom Law Center’s position is dependent on a specific interpretation of the intent of the GPL:

Canonical, in its Ubuntu distribution, has chosen to provide kernel and module binaries bundling ZFS with the kernel, while providing the source tree in full, with the relevant ZFS filesystem code in files licensed as required by CDDL.

If there exists a consensus among the licensing copyright holders to prefer the literal meaning to the equity of the license, the copyright holders can, at their discretion, object to the distribution of such combinations

The one thing that seems certain here, then, is that very little is certain about Canonical’s decision to ship ZFS by default.

The evidence suggests that Canonical either believes its legal position is defensible, that none of the actors would be interested or willing to pursue litigation on the matter, or both. As stated elsewhere, this is if nothing else a testament to the quality of the original ZFS engineering. The fact that, on the evidence, Canonical perceives the benefits of this fifteen-year-old technology to outweigh its potential overhead is remarkable.

But if there are questions for Canonical, there are for their users as well. Not about the technology, for the most part: it has withstood impressive amounts of technical scrutiny, and remains in demand. But as much as it would be nice for questions of its licensing to give way before its attractive features, it will be surprising if conservative enterprises consider Ubuntu ZFS a viable option.

If ZFS were a technology less fundamental than a filesystem, reactions might be less binary. As valuable as DTrace is, for example, it is optional for a system in a way that a filesystem is not. With technology like filesystems or databases, however, enterprises will build the risk of having to migrate into their estimates of support costs, making it problematic economically. Even if we assume the legal risks to end users of the ZFS version distributed with Ubuntu to be negligible, concerns about support will persist.

According to the SFLC, for example, the remedy for an objection from “licensing copyright holders” would be for distributors to “cease distributing such combinations.” End users could certainly roll their own versions of the distribution including ZFS, and Canonical would not be under legal restriction from supporting the software, but it’s difficult to imagine conservative buyers being willing to invest long term in a platform that their support vendor may not legally distribute. Oracle could, as has been pointed out, remove the uncertainty surrounding ZFS by relicensing the asset, but the chances of this occurring are near zero.

The uncertainty around the legality of shipping ZFS notwithstanding, this announcement is likely to be a net win for both Canonical and Ubuntu. If we assume that the SFLC’s analysis is correct, the company’s economic downside is relatively limited as long as it complies promptly with objections from copyright holders. Even in such a scenario, meanwhile, developers are reminded that ZFS is an available option for the distribution, regardless of whether the distribution’s sponsor is able to provide it directly. It’s also worth noting that the majority of Ubuntu in use today is commercially unsupported, and therefore unlikely to be particularly concerned with questions of commercial support. If you browse various developer threads on the ZFS announcement, in fact, you’ll find notable developers from high profile web properties who are already using Ubuntu and ZFS in production.

Providing developers with interesting and innovative tools – which most certainly describes ZFS – is in general an approach we recommend. While this announcement is not without its share of controversy, then, and may not be significant ultimately in the commercial sense, it’s exciting news for a lot of developers. As one developer put it in a Slack message to me, “i’d really like native zfs.”

One way or another, they’ll be getting it soon.

Categories: Open Source, Operating Systems.

What’s in Store for 2016: A Few Predictions


Every so often, it’s worth taking a step back to survey the wider technical landscape. As analysts, we spend the majority of our time a few levels up from practitioners in an attempt to gain a certain level of perspective, but it’s still worth zooming out even further. To look not just at the current technical landscape, but to extrapolate from it to imagine what the present means for the future.

For six years running, then, I’ve conducted this exercise at the start of the new year. Or at least plausibly close to it. From the initial run in 2010, here is how my predictions have scored annually:

  • 2010: 67%
  • 2011: 82%
  • 2012: 70%
  • 2013: 55%
  • 2014: 50%
  • 2015: 50%

You may note the steep downward trajectory in the success rate. While this may rightly be considered a reflection of my abilities as a forecaster, it is worth noting that the aggressiveness of the predictions was increased in 2013. This has led to possibly more interesting but provably less reliable predictions since; you may factor that adjustment in as you will.

Before we continue, a brief introduction to how these predictions are formed, and the weight you may assign to them. The forecast here is based, variously, on both quantitative and qualitative assessments of products, projects and markets, drawing on everything from hard data to offhand conversations. For the sake of handicapping, the predictions are delivered in groups by probability, beginning with the most likely and concluding with the most volatile.

With that explanation out of the way, the predictions for the year ahead:


  • Bots are the New UI:
    There are two dangers to delaying the release of your annual predictions until well into the new year. First, they can be proven correct before you publish, meaning that your prediction is no longer, technically speaking, a prediction. Second, someone else can make a similar prediction, which can – depending on the novelty of what you forecast – steal your thunder.

    Both of these have unfortunately transpired over the past month. First, Google’s Cloud Functions and IBM’s OpenWhisk obviated the need for my doubling-down on a bullish forecast for serverless architectures. And just a few weeks earlier, Tomasz Tunguz – who is always worth reading, incidentally – unknowingly stole major elements from my prediction regarding bots in a piece entitled The New UI For SaaS – The Question.

    One of the most surprising conversations I have today is with enterprise vendors who dismiss Slack as a messaging vendor, or with engineers who view it as little more than an IRC implementation for muggles. Both miss the point, in my view. First, because they miss the platform implications, which I’ll get to, but just as importantly because they obscure the reality that bots are the new UI.

    Consider the universal problem of a user interface. If you’re implementing a GUI, you face increasingly difficult decisions about how to shoehorn a continually expanding featureset into the limited real estate of a front end. Making matters worse, aesthetic expectations have been fundamentally reset by the incursion of consumer-oriented applications. And while you’re trying to deliver a clean elegant user interface with too many features, the reality of mobile is that you’ll probably need to do so with even more limited screen real estate.

    Those whose users’ primary or sole interface is the command line have it easier to some degree, but their lives are also complicated by rampant fragmentation. Gone are the days when you could expect developers to memorize every option or flag on every command, because there are simply too many commands. Too many developers today are reduced to Google or Stack Overflow as an interface because they’re not using a given tool quite enough to have completely internalized its command structure and options.

    Attempts to solve these user interface problems to date have essentially been delaying actions, because the physics of the problem are difficult to address. Complexity can only be simplified in so many ways so many times before it’s complex again.

    Enter the bot, which is essentially a CLI with some artificial intelligence baked in. Deployed at present in relatively narrow, discrete functional areas, their ultimate promise – as Tunguz discusses – is much broader. But for now, text-based AIs such as’s Amy or the Slack-based Howdy or Meekan point the way towards an entirely new brand of user interface. One in which there is no user interface, at least as we are typically acquainted with that term. If I want to schedule a meeting with someone via Amy, I don’t log in to a new UI and look at schedules, I use the same user interface I always have: email. Amy the artificial assistant parses the language, has contextual awareness of my calendar and then coordinates with the third party much as a human would. Or if I’m booking a meeting internally, I no longer have to open Google Calendar: I ask Meekan to pick a time and a date and turn it loose.

    And bots are not just for scheduling meetings – or ordering cars from Uber. Within the coming year we’re going to see tools extensively ported to bots. Why can’t I start and stop machines via a bot as I would the CLI? Or ask questions about my operations performance? Or, elsewhere, my run rate or cashflow? Some of our clients are working on things like this as we speak, and Slack’s December Platform launch, including the botkit Howdy, will speed this along.

    We’ve all had the experience at one point or another – particularly if you’ve ever used Google Analytics – of paging endlessly through a user interface for something we know an application can do, but can’t figure out how. What if you could skip that, and simply ask a bot in plain English (or the language of your choice) to do what you want?

    Folks who have been using things like Hubot for years already know the answer to this. As platforms like Slack expand, more of us will begin to realize the advantages to this in 2016, as bots become the New UI.

  • Slack is (One of) The New Platform(s):
    Based in large part on the absurd success, both in terms of marketshare and revenue, of Microsoft’s twin platforms, Office and Windows, software businesses ever since have attempted to become platforms. Most of these efforts historically have ended in failure. Becoming a platform, as it turns out, is both expensive and entirely dependent on something that is intensely difficult to predict: volume traction. Even for well capitalized would-be players with platform ambitions, the dynamics that lead to anointment as a platform are difficult to navigate.

    Few, particularly those who still regard Slack as a jumped-up instant messaging client, would have anticipated that Slack would become such a platform, but it’s well on its way. We have had persistent group chat clients and capabilities for decades at this point, and for all of their immense user traction, even the most popular IM networks never made the jump to platform. Domestically, at least: China’s networks are materially distinct here.

    Most obviously, Slack’s userbase is growing: it essentially quadrupled over the past calendar year from around 500,000 users to over 2 million. But the important jump was in its app catalog. From 150 apps in the catalog at launch, Slack has almost doubled that number to 280 at the moment. And we’re seeing significant interest and traction from third parties who’d like to add themselves to that number, because Slack is checking an increasing number of the boxes first-class platforms have to check to be taken seriously.

    When we look back on 2016, then, it will be regarded as the year that Slack became a platform.

  • Newsletters are the New Blogs:

    Whether you attribute the decline in RSS and its client applications to the rise of social media like Facebook and Twitter is, to some degree, academic. Whether they were the cause or simply the beneficiary, the fact is that a great many whose consumption of content used to depend on RSS readers now look to the social networks to fill a similar need.

    Similar is not same, however. As Facebook’s algorithmic feed and Twitter’s much excoriated dalliances with something similar have demonstrated, one of the difficulties with social networks is that they don’t scale well for this purpose. With an RSS reader, you don’t miss a post from an author you’ve subscribed to. With Facebook or Twitter, the more friends you have, the harder it becomes to avoid missing one.

    Enter newsletters. Well, technically that’s not accurate, as they’ve been around since well before RSS readers or social networks. But since the demise of the former and the rise of the latter, newsletters are increasingly becoming the de facto alternative, as Paul Ford has suggested. If you want to be sure readers don’t miss your content, and readers are similarly interested, newsletters have been pressed into service as the solution.

    In 2016, we’ll see this trend go mainstream, and authoring tools designed for actual authors rather than, say, marketers, will emerge.

    All of which means I probably need to start a newsletter already.


  • Open Source is the Future, and It Will Become Evenly Distributed:
    The rise of open source at this point has been well chronicled. While the most efficient mechanisms for commercializing open source software remain hotly debated, the sustainability of open source itself is no longer in question. In an increasing number of scenarios, open source is viewed even by staunchly capitalistic businesses as a logical strategic choice.

    Even so, we haven’t yet hit the tipping point where it’s the default software development model. There are still many more scenarios in which open source is an exception, a mere science experiment, rather than the most logical choice for a given piece of software.

    There were signs in 2015 that this was changing, and this will accelerate in 2016. Google, for example, has typically guarded its infrastructure software closely. It published the details that made building Hadoop possible, but kept its actual implementation closed. With Microsoft’s CNTK or Google’s TensorFlow and arguably Kubernetes (it’s not Borg, but a reimplementation of it), this pattern has begun to shift. Apple’s decision to make its Swift runtime open source is another example of an organization which has historically been protective of its software assets recognizing that the benefits to open source outweigh the costs of proprietary protections. Even in industry, enterprises are beginning to see the advantages – whether in developer marketing/recruitment/retention, cost amortization, etc – and make strides towards either releasing their own internal software as open source (see Capital One’s Hygieia) or easing restrictions on contributing back to existing projects.

    Open source will become evenly distributed, then, in 2016.

  • SaaS is the New Proprietary…But Will Lead to More Open Source:
    As I have argued previously, SaaS is on several levels a clear and present danger to open source software. First, questions about access to source are deemphasized in off-premise implementations in ways they are not in on-premise alternatives. Second, many SaaS offerings have incorporated the embrace, extend and extinguish model by building attractive proprietary extensions onto open source foundations. Lastly, just as open source enjoyed massive advantages in convenience and accessibility over proprietary alternatives, so too is SaaS more convenient than OSS.

    Where many OSS advocates still consider traditional proprietary software the threat, then, they would do better to shift their attention to SaaS alternatives.

    All of that being said, SaaS is counterintuitively a potential benefactor to open source in important ways. As described above, important SaaS vendors are both investing heavily in software development to tackle very difficult, unique problems and realizing that the benefits to making some or all of this software available as open source outweigh the costs.

    The Platform-as-a-Service market is perhaps the industry’s best evidence of this. The earliest implementations, Google App Engine among them, massively lagged IaaS alternatives in adoption not because of technical limitations, but because of their proprietary nature. The technical promise of PaaS – focus on the application, not the infrastructure it runs on – was intriguing from a developer standpoint. But no one wanted to write applications that would never run anywhere else.

    Fast forward six years and the PaaS market is a promising, growing category. Why? Because customer concerns about lock-in have been mitigated via the use of open source software. As ever, developers and the enterprises they work for are more likely to walk through a door they know they can walk back out of.

    AWS’ Lambda is a more recent indication of this phenomenon at work. Technically innovative, it underperformed from a visibility and adoption perspective largely because of concerns around lock-in. These may or may not be lessened by the release of similar serverless services from Google and IBM, but if history is any guide, the simplest path towards dramatically accelerating Lambda adoption would be for AWS to release an open source implementation of the product.

    Whether the famously private Amazon will take such a step is unknown, but on an industry-wide basis the growth of SaaS will lead to the release of more open source software in 2016.

  • Winter is Coming:
    We may be less than a week from the end of meteorological winter, but the metaphorical kind is still looming. The obvious signs of a market correction are there: an increasingly challenging funding environment, systemic writedowns of existing investments, a renewed skepticism of the sustainability and funding models of startups, and existential crises for multiple large incumbents. The less obvious signs are the private conversations, subtle pattern shifts in job hunting trends and so on.

    How deep or prolonged the next dip will be is difficult to predict at this time, but what seems inevitable is that it will start this year.


  • Google Releases an iMessage Competitor at I/O:
    Google’s strategy with respect to messaging has been perplexing of late. While products like HipChat are correctly regarded as the primary competition for Slack, it is nevertheless true that a good portion of the latter’s traction has come at the expense of Google Talk – a product which has seriously languished in recent years. Towards the SMS end of the messaging spectrum, meanwhile, Google’s general response to the rapid growth and popularity of Apple’s iMessage has been apathy and indifference. Which makes sense if the only business you care about is search. If, however, the enterprise collaboration and mobile markets are of some importance – as Google’s actions on paper suggest they are – this inaction is baffling.

    More to the point, for every quarter they delay a response, they’re that much further behind from an adoption standpoint. Even if they were able to roll out a viable iMessage competitor for Android tomorrow, for example, they’d be facing a protracted battle to win users back from competing services.

    Perhaps Google has come to regard the messaging market as akin to the old IM networks; superficially useful, but limited in their long term value. Or maybe they’re pessimistic about the opportunity to compete with multiple closed, defensible networks and are planning the strategic equivalent of an island hop. The difficulty with either strategy is that if the first prediction above is true, and bots are the new UI, Google’s lack of a visible, well adopted chat vector to their users is a serious problem.

    Which is why I expect Google to attempt to remedy this in 2016, the logical release for which would be at the I/O conference. Google is undoubtedly behind, but not insurmountably so. Yet. Slack is still in the low single-digit millions from an adoption standpoint, and Apple has artificially created vulnerabilities with its single-platform approach – an iMessage that worked seamlessly across platforms and, importantly, had legitimate (i.e. not Mac’s Messages) desktop clients for a variety of desktop operating systems would generate interest, at least.

  • 2016 Isn’t the Year of VR, the Rift/Vive/etc Notwithstanding:
    A little while back I had the opportunity to demo the latest build of Oculus’ VR software and hardware. It was legitimately mindblowing. I haven’t had too many experiences like it in my time in this industry. The last portion of the demo placed you on a city street in the midst of an alien attack. Action was slowed dramatically, so you could turn your head and watch a bullet float by, or watch the car next to you detonate and lift into the air as if it were underwater, but still on fire. Insane.

    But 2016 isn’t going to be the year of VR.

    Most importantly, the equipment is too expensive. As Wired says, the problem isn’t necessarily with the cost of the unit itself, in spite of the $600 price tag (or $800, if you want an HTC Vive): it’s the total cost of ownership, to borrow the enterprise term. First, the $600 doesn’t include higher end controllers. But more importantly, it doesn’t factor in the cost of the associated PC hardware – specifically the graphics card.

    True, you can bring that cost down by going with a desktop, but how many people will buy a desktop over a laptop these days? Even if cost is addressed, it will take time to populate the kind of software catalogs buyers will need to see to justify the expense and the equipment.

    Based on the few times I’ve used VR, I’m bullish on the technology long term. But my expectations for it in 2016 are modest.


  • “Boot” projects Will Become a Thing:
    Better than ten years removed from the initial release of Rails, it seems strange to be writing about the “new” emphasis on projects intended to simplify the bootstrapping process. But in spite of the more recent successes of projects like Bootstrap and Spring Boot, such projects are not the first priority for most technical communities. Perhaps because of the tendency to admire elegant solutions to problems of great difficulty, frameworks and on-ramps for new community entrants tend to be an afterthought. In spite of this, they frequently prove to be immensely popular, because in any given community the people who know little about it far outnumber the experts. Even in situations, then, when the boot-oriented project and its opinions are outgrown, boot-style projects can have immense value simply because they widen the funnel of incoming developers. Which is, as we tell our customers at RedMonk every day, one of the most important actions a project can take.

    Based in part on the recent successes mentioned, as well as a growing awareness of this type of project’s value, we’re going to see boot-style projects become a focus in the year ahead, because every project should have one.

  • Open Source Hardware Becomes a Measurable Player:
    We’ve known for some time that the largest internet providers have been heavily vertically integrated, more so by the year. From Google’s custom servers to Facebook’s custom networking gear to Amazon’s custom racks and custom chips built with Intel, the web pioneers have little reliance today on external integrated products. For all that traditional incumbents have attempted to portray themselves as arms suppliers to the world’s biggest and fastest web properties, the reality is that they at best have been relegated to niche suppliers and at worst have been cut out of the supply chain entirely. Initiatives like Facebook’s Open Compute project have only helped accelerate this trend, by democratizing access to hard-won insights into high-scale compute, network and storage problems.

    Vendors have sprung up around these and other efforts – Cumulus Networks, for example – and this will inevitably continue, as the same forces that sought to excise the margin on first software and then compute continue towards networking and storage. Call it the fulfillment of the disruption that began as far back as 2014, but in the year ahead we’ll see hard impacts from open source hardware on large existing incumbents.


  • AI Will Be Turned Loose on Crime:
    For anyone who’s listened to the first season of Serial, one of the things that hits you is just how much data there is to process. From verbal statements to timelines to maps to cell tower records to email threads, it’s an immense amount of information to keep track of, even for a single-victim crime. With each additional offense, the complexity goes up commensurately.

    Complexity and synthesis of multiple forms of disparate information – particularly tedious, numerical information – is not something that people in general do particularly well. Computers, on the other hand, are exceptional at it. With the accompanying improvements in natural language processing, additionally, it’s possible to envision Philip K. Dick-like AI detectives that can process thousands of streams of information quickly and dispassionately, rendering judgements on outcomes.

    We’re a little ways off from Blade Runner, of course – Moravec’s Paradox still holds, even if yesterday’s Atlas videos are terrifying. But purely from an analysis perspective, we’re clearly at the point where an AI could assist in at least some investigatory elements.

    What would the interest be from the AI side? Clearly not financial, because even if the system worked perfectly it would likely take a decade or more to address law enforcement and legal concerns. No, the primary benefit would be marketing value. IBM didn’t have Watson play Jeopardy for the prize money; the benefit was instead marketing, introducing the first computer to play and beat humans at a spoken language game.

    With that in mind, it’s difficult to imagine a higher profile potential marketing opportunity than true crime. Consider the transcendent success of Serial and the more recent popularity of Netflix’s Making a Murderer. What if an AI project could be a primary factor behind the discovery of a miscarriage of justice?

    It would be very interesting indeed, which is why we might see it in 2016.


  • Silicon Valley Continues to Follow in Wall Street’s Footsteps:
    My Dad worked on Wall Street for forty years, the entirety of his career. When I was growing up, this fact could be cited with something like pride. If nothing else, Wall Street was a fiercely competitive market that attracted intelligent participants. Whatever else might be said about this flag bearer for capitalism, it meant you knew how to work hard and compete.

    Today, Wall Street is a ruined term, having become synonymous with a spectacular tone-deafness, outrageous excesses of compensation and uncontrolled greed. I’m still proud of my Dad, but in spite of his time on Wall Street rather than because of it. He was, fortunately for us, the antithesis of Wall Street rather than the embodiment of it. He was never corrupted by that business, and that fact did him no favors over the course of his career.

    When I got into technology a few decades ago, I had a lot of pride in my industry, much as I’m sure my Dad did. He probably felt about Wall Street the way I did about Silicon Valley. At least initially.

    Looking around the technology industry today I am regularly dismayed by what I see. From calls for the secession of California to arguments in favor of increasing inequality to literally unbelievable insensitivity to those less fortunate, the term Silicon Valley is – in the circles I travel in, at least – becoming synonymous with…a spectacular tone-deafness, outrageous excesses of compensation and uncontrolled greed. For the first time in my career, I am occasionally embarrassed to tell someone I work in technology.

    The overwhelming majority of the people in this industry, of course, are regular, good people. It is undoubtedly a case of a few bad apples ruining the bunch. But unfortunately much the same is true of Wall Street: most of the people who work there are not members of the 1%, just people trying to get by. That distinction, however, gets lost quickly.

    We in the technology industry are running the same risk, in my opinion. Unless the excesses are widely condemned, and unless we can collectively articulate a vision that isn’t something like “we always know best” or “the homeless should just learn a computer language”, I fear Silicon Valley is headed the way of Wall Street. That most of us aren’t responsible for the appalling lack of empathy won’t matter: we’ll all be tarred with the same brush.

    I don’t expect any progress in this department in 2016, which is why it’s listed here. Alas.

Categories: AI, Business Models, Cloud, Collaboration, Hardware-as-a-Service, Open Source, Platform-as-a-Service, Platforms, Social, Software-as-a-Service, VR.

The RedMonk Programming Language Rankings: January 2016

This iteration of the RedMonk Programming Language Rankings is brought to you by Rogue Wave Software. It’s hard to be a know-it-all. With our purpose-built tools, we know a lot about polyglot. Let us show you how to be a language genius, click here.

It’s been a very busy start to the year at RedMonk, so we’re a few weeks behind in the release of our bi-annual programming language rankings. The data was dutifully collected at the start of the year, but we’re only now getting around to the analysis portion. We have changed the actual process very little since Drew Conway and John Myles White’s original work late in 2010. The basic concept is simple: we periodically compare the performance of programming languages relative to one another on GitHub and Stack Overflow. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion (Stack Overflow) and usage (GitHub) in an effort to extract insights into potential future adoption trends.

With the exception of GitHub’s decision to no longer provide language rankings on its Explore page – they are now calculated from the GitHub Archive – the rankings are performed in the same manner, meaning that we can compare rankings from run to run, and year to year, with confidence.

Historically, the correlation between how a language ranks on GitHub versus its ranking on Stack Overflow has been strong, but this had been weakening in recent years. From its highs of .78, the correlation was down to .73 during our last run – the lowest recorded. For this run, however, the correlation between the properties is once again robust. For this quarter’s ranking, the correlation between the properties was .77, just shy of its all time mark. Given the recent variation, however, it will be interesting to observe whether or not this number continues to bounce.
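For the curious, a rank correlation of the sort cited above can be sketched in a few lines of Python. We make no claim that this is the exact statistic used in the rankings; Spearman’s rho – the Pearson correlation computed over the two rank vectors – is simply one standard choice, and the rank lists below are illustrative rather than actual data.

```python
# Illustrative sketch, not the rankings' actual code: Spearman's rho,
# computed as the Pearson correlation of the two rank vectors.
def spearman(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical ranks for five languages on GitHub vs. Stack Overflow:
# mostly agreeing lists produce a rho near 1.
github_ranks = [1, 2, 3, 4, 5]
stackoverflow_ranks = [2, 1, 3, 5, 4]
print(round(spearman(github_ranks, stackoverflow_ranks), 2))  # → 0.8
```

A rho of 1 would mean the two sites rank every language identically; the .73 to .77 range discussed above indicates strong but imperfect agreement.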

Before we continue, please keep in mind the usual caveats.

  • To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.
  • No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.
  • There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.
  • All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.
  • GitHub language rankings are based on raw lines of code, which means that repositories written in a given language that include a greater amount of code in a second language (e.g. JavaScript) will be read as the latter rather than the former.
  • In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.
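To make the mechanics concrete, here is a hedged reconstruction of how two per-site ranks might be folded into a single list with ties. This assumes the two ranks are simply averaged and that tied languages share a position while subsequent slots are skipped – which is how a Top 20 list can end up holding more than 20 languages. It is an illustrative sketch under those assumptions, not the actual methodology code, and the ranks below are hypothetical.

```python
# Hypothetical sketch: average each language's GitHub and Stack Overflow
# ranks, then assign competition-style positions (tied languages share a
# number; the slots they occupy are skipped for the next language).
def combined_ranking(github_rank, stack_rank):
    scores = {lang: (github_rank[lang] + stack_rank[lang]) / 2
              for lang in github_rank}
    # Sort by combined score, breaking exact ties alphabetically for display.
    ordered = sorted(scores, key=lambda lang: (scores[lang], lang))
    positions, prev_score, prev_pos = {}, None, 0
    for i, lang in enumerate(ordered, start=1):
        if scores[lang] == prev_score:
            positions[lang] = prev_pos        # tie: share the position
        else:
            positions[lang] = prev_pos = i    # ties skip subsequent slots
            prev_score = scores[lang]
    return positions

# Illustrative ranks: a three-way tie at #2 pushes the next language to #5.
gh = {"JavaScript": 1, "C#": 5, "C++": 4, "Ruby": 6, "C": 7}
so = {"JavaScript": 1, "C#": 5, "C++": 6, "Ruby": 4, "C": 7}
print(combined_ranking(gh, so))
```

Note how C#, C++ and Ruby all average to the same score and so share a position, with the following language jumping several spots – the same pattern visible in the numerical rankings below.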

(click to embiggen the chart)

Besides the above plot, which can be difficult to parse even at full size, we offer the following numerical rankings. As will be observed, this run produced several ties which are reflected below (they are listed out here alphabetically rather than consolidated as ties because the latter approach led to misunderstandings). Note that this is actually a list of the Top 21 languages, not Top 20, because of said ties.

1 JavaScript
2 Java
3 PHP
4 Python
5 C#
5 C++
5 Ruby
8 CSS
9 C
10 Objective-C
11 Shell
12 Perl
13 R
14 Scala
15 Go
15 Haskell
17 Swift
18 Matlab
19 Clojure
19 Groovy
19 Visual Basic

JavaScript’s continued strength is impressive, as is Java’s steady, robust performance. The long time presence of these two languages in particular atop our rankings is no coincidence; instead it reflects an increasing willingness to employ a best-tool-for-the-job approach, even within the most conservative of enterprises. In many cases, Java and JavaScript are leveraged side-by-side in the same application, depending on its particular needs.

Just as JavaScript and Java’s positions have remained unchanged, the rest of the Top 10 has remained similarly static. This has become the expectation rather than a surprise. As with businesses, the larger a language becomes, the more difficult it is to outperform from a growth perspective. This suggests that any changes we see in the Top 10 will be slow and long term, and that fragmentation has begun to slow. The two most obvious candidates for a Top 10 ranking at this point appear to be Go and Swift, but they have their work cut out for them before they get there.

Outside of the Top 10, however, here are some of the more notable performers.

  • Elixir: The Erlang-friendly language made a notable jump this time around. The last quarter we surveyed languages, Elixir placed at #60. As of this January, it had jumped to #54. While we caution against reading too much into specific numerical differences, the more so the further down the list one goes, this change is notable as it suggests that the language – a darling amongst some language aficionados – is finally seeing some of the growth we and others have expected from it. Interestingly, Erlang did not benefit from this bounce, as it slid back to #26 after moving up to #25 last quarter.

  • Julia: Julia’s growth has been the tortoise to other languages’ hares historically, and this run was no exception. For this run, Julia moves from #52 to #51. Given the language’s slow ascent, it’s worth questioning the dynamics behind its adoption, and more specifically whether any developments might be anticipated that would materially change its current trajectory. So far, the answer to that question has been no, but certainly its focus on performance and syntactical improvement would seem to offer would-be adopters a carrot.

  • Rust: Another popular choice among language enthusiasts, Rust’s growth has outpaced slower growth languages like Elixir or Julia, but not by much. This time around, Rust moves up two spots from #48 to #46. The interesting question for Rust is when, or perhaps if, it will hit the proverbial tipping point, the critical mass at which usage becomes self-reinforcing and an engine for real growth. Go went through this, where its growth through the mid to low thirties was relatively modest, then picked up substantially until it entered the Top 20. In the meantime, it will have to settle for modest but steady gains quarter after quarter.

  • Swift: Swift’s meteoric rise has predictably slowed as it’s entered the Top 20, but importantly has not stopped. For this ranking, Swift moves up one spot from #18 to #17. As always, growth is more difficult the closer you get to the top, and in passing Matlab, Swift now finds itself a mere two spots behind Go – in spite of being five years younger. It is also three spots behind Scala and only four behind R. Which means that Swift finds itself ranked alongside languages of real popularity and traction, and is within hailing distance of our Tier 1 languages (R is the highest ranking Tier 2). The interesting thing is that Swift still has the potential to move significantly; its current traction was achieved in spite of being a relatively closed offering amongst open source alternatives. Less than four weeks before we took this quarter’s snapshot of data, Swift was finally open sourced by Apple, which means that the full effect of this release won’t be felt until next quarter’s ranking. This release was important for developers, who typically advantage open source runtimes at the expense of proprietary alternatives, but also because it allows third parties to feel comfortable investing in the community in a way they would not for a proprietary stack – see IBM’s enthusiastic embrace of Swift. This means that Swift has, uniquely, multiple potential new engines for growth. So it will be interesting indeed to see what impact the release has on Swift’s overall adoption, and whether it can propel the language near – or actually into – the Top 10.

  • TypeScript: One interesting, although unheralded, language to watch is TypeScript. A relatively new first class citizen in the Microsoft world, this (open source, notably) superset of JavaScript is quietly moving up the ranks. In this ranking, TypeScript jumped two spots from #33 to #31, passing ASP in the process. Obviously it’s a small fraction of JavaScript’s traction, but the list of interesting technologies it outranks now is growing longer: ASP (#32), OCaml/Tcl (#33), ColdFusion/Dart (#37), among others, as well as the aforementioned Elixir/Julia/Rust. It’s not reasonable to expect any explosive growth from TypeScript, but it wouldn’t be surprising to see it get a bounce should it prove capable of moving into the twenties and becoming more widely visible. Regardless, it’s become a language to watch.

The Net

We are regularly asked why we don’t run the language rankings more regularly – every quarter or even on a monthly basis. The answer is that there isn’t enough movement in the data to justify it; programming languages are popular enough and the switching costs sufficiently high that significantly shifting adoption is a slow, gradual process. For every language except Swift, anyway.

It will be interesting to see whether or not we’ll see new entrants into the Tier 1 of languages, with the most likely candidate at this point being Swift, followed by Go. Further down the list, several interesting but currently niche languages are getting close to thresholds at which they have the potential to see substantial, if not guaranteed, growth. Not the kind that will take them into the Top 10, but certainly there are vulnerable languages at the back end of the Top 20. In the meantime, we’ll keep checking in every other quarter to report back on progress – or the lack thereof – in any of the above areas.

One More Thing

(Added 2/22/2016)

Of all the requests we receive around our programming language rankings, the ability to browse the history of their performance is by far the most common. The current rank of a language is of great interest, of course, but for many, previous rankings and the trajectory they imply are at least as interesting, and in some cases more so.

We’ve been thinking about this for a while, and while a number of different visualizations were assessed, there are so many different angles to the data that a one-size-fits-all approach was less than ideal. Which led to the evaluation of a few dynamic alternatives, which were interesting but had some issues. Rather than hold up this quarter’s already delayed release for a visualization that had potential but might not work, then, we went ahead and published the rankings.

Over the weekend, however, the last major obstacles were addressed. It’s not perfect, but the historical depiction of the rankings is in a state where we can at least share a preliminary release. A few notes:

  • This is not a complete ranking of all the languages we survey. It includes only languages that are currently or have been at one time in the Top 20.
  • This graphic is interactive, and allows you to select as many or as few languages as you prefer. Just click on a language in the legend to toggle it on or off. This is helpful because Swift fundamentally breaks any visual depiction of growth: de-select it and the chart becomes much more readable.
  • The visualization here, courtesy of Ramnath Vaidyanathan’s rCharts package, is brand new and hasn’t been extensively tested. Mobile may or may not work, and given the hoops we had to jump through to host a D3-based visualization on a self-hosted WordPress instance, it’s likely that some browsers won’t support the visualization, HTTPS will break it, etc. We’ll work on all of that, and do let us know if you have problems, but we wanted to share at least the preliminary working copy as soon as we were able.

With that, we hope you enjoy this visual depiction of the Historical Programming Language Rankings.

Categories: Programming Languages.