Dalibor Topic was the one to give me this idea, though I’m not sure if he’d remember the tweet. He was, however, the one who pointed me at MarkMail’s archive of open source list traffic, which I’d seen before, using a by-domain constraint, which I hadn’t. The idea is simple: MarkMail maintains a searchable index of the mailing lists for a number of open source projects (these, specifically). As a means of demonstrating the value of its MarkLogic Server, it parses the individual messages into XML and renders them queryable according to specific dimensions.
Given this ability, I thought it would be interesting to see what we might learn by examining – and in some cases comparing – general participation by domain. While this is not intended to be a comprehensive or authoritative statement on actual levels of engagement, the datapoints are interesting if nothing else. You can replicate these queries at MarkMail yourself using the “type:development from:domain.com” syntax. I’ve chosen development rather than commits because I wanted a broader sense of engagement than putbacks, but the latter would be interesting to study as well.
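If you’d rather script these lookups than type them, here is a minimal sketch that builds the query string described above and wraps it in a search URL. The base URL and its `q` parameter are assumptions about how MarkMail’s search page is addressed, the helper names are hypothetical, and the example domains are guesses at what each company uses on-list.

```python
from urllib.parse import urlencode

# Assumption: MarkMail's search page lives here and takes the query in "q".
MARKMAIL_SEARCH = "https://markmail.org/search/"


def build_query(domain, list_type="development"):
    """Return a query in the "type:development from:domain.com" form."""
    return f"type:{list_type} from:{domain}"


def search_url(domain, list_type="development"):
    """Return a URL-encoded search link for one sender domain."""
    return MARKMAIL_SEARCH + "?" + urlencode({"q": build_query(domain, list_type)})


if __name__ == "__main__":
    # Example domains only; swap in whichever ones you want to compare.
    for domain in ["hp.com", "ibm.com"]:
        print(search_url(domain))
```

Passing `list_type="commits"` should give the narrower commit-list view mentioned above, assuming MarkMail accepts that value for `type:`.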
Before proceeding, a few caveats:
- MarkMail is currently indexing 8,146 sources. This is clearly not all of the open source mailing lists, so the picture is incomplete. It’s been a little while since their blog has been updated, as well.
- As has been documented in other discussions, such as measurement of contributions to the Linux kernel, many developers – though employed by a given entity – may prefer to use their own email address for on-list communications. Which obviously breaks the graphs below. Nor is the given domain likely to be inclusive of all of a given company’s employees.
- Not so much a caveat as an FYI, the graphs below aren’t normalized, as you’ll be able to tell quickly. Nor does MarkMail, as far as I can tell, expose the dataset for external processing. So pay close attention to the Y axis in all of the graphs.
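Since the graphs aren’t normalized, one way to compare their shapes is to rescale each series to its own peak. This is a minimal sketch, assuming you’ve transcribed counts by hand from a graph, since MarkMail doesn’t appear to expose the data. The numbers are made-up placeholders, not MarkMail figures, and `normalize_to_peak` is a hypothetical helper.

```python
def normalize_to_peak(counts):
    """Rescale a {period: message_count} series so its largest value is 1.0."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        # Empty or all-zero series: nothing to scale against.
        return {period: 0.0 for period in counts}
    return {period: count / peak for period, count in counts.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only; not real list traffic.
    vendor_a = {"2008": 100, "2009": 200, "2010": 150}
    vendor_b = {"2008": 10, "2009": 40, "2010": 20}
    print(normalize_to_peak(vendor_a))
    print(normalize_to_peak(vendor_b))
```

Scaled this way, two vendors with very different volumes can be put on one axis to compare trend direction, at the cost of hiding the absolute gap between them.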
Anyway, on to the data. Let’s look at how the big boys compare first.
HP vs IBM
And here’s IBM:
As might be predicted, IBM’s measurable list traffic exceeds HP’s. The level of that disparity is a bit of a surprise, but the most notable feature of both graphs is the general downward trend in participation beginning in 2009. The timing raises the question: has large system provider participation in open source been negatively impacted by the recession? We can’t answer that with this dataset, unfortunately.
One question we can attempt to answer with the available data: how has the acquisition of Sun by Oracle affected the participation of both companies in open source?
Oracle & Sun
And here’s Oracle:
The answer, according to this dataset, is that while Oracle’s level of participation in open source communities spiked following the acquisition, it fails to replicate Sun’s performance as an independent, which was, to be fair, among the highest participation observed. Oracle’s list activity, post-Sun, exceeds IBM’s at present, where it fell short up through 2009.
Moving on from the large systems vendors, what does the participation of Linux vendors look like?
The Linux Vendors
And here’s Novell:
Last, but obviously not least, Red Hat:
Red Hat dominates, as expected. With a broader portfolio of open source middleware in addition to the operating system, Red Hat’s observable participation is among the highest in the industry. What I didn’t anticipate, however, was the decline from Novell, or that they would be eclipsed by Canonical. It’s important to keep this in perspective, of course: the number of messages does not equate to the number of contributions to the kernel, for example (the Linux Foundation has detailed that here). Still, the above is worth some thought.
How about some notable proprietary vendors? What does their participation look like?
Microsoft & VMware
AMD vs. Intel
And here’s Intel:
While the disparity between the levels of participation here is worth noting, the most important aspect of the above graphs, to me, is the trendline. Participation is escalating significantly, which is indicative, perhaps, of the growing importance of open source both to customers and to the firms themselves.
What of the internet firms, who have been heavily dependent on open source, historically?
The Internet Firms
With the exception of the slight tail to Google’s participation, I’m not sure there’s much to be extracted here. Neither Facebook nor Twitter have thousands of employees, so the volume is not terribly indicative. It is marginally interesting, however, that Twitter’s Y axis indicates a level up from Facebook.
One last question: it’s often been remarked that open source developers, in spite of being passionate about that software, are heavily dependent on proprietary webmail systems, principally Gmail. Let’s look at that.
The Webmail Providers
And Yahoo Mail:
Several things jump out. First, Hotmail and Yahoo Mail have been flat to declining for a minimum of two years. Second, Gmail’s trajectory up until that time was sustained growth; since, it has also plateaued. Last, Gmail is massively more popular than either; than both combined, actually. Gmail, in point of fact, is the most popular single domain in this study. It seems plausible, therefore, that a Gmail address is in fact the address of choice for the open source population.
Disclosure: of the mentioned companies, RedMonk clients include IBM, Microsoft, and Red Hat.