Dalibor Topic was the one to give me this idea, though I’m not sure if he’d remember the tweet. He was the one who pointed me at MarkMail’s archive of open source list traffic, which I’d seen before, using a by-domain constraint, which I hadn’t. The idea is simple: MarkMail maintains a searchable index of the mailing lists for a number of open source projects (these, specifically). As a means of demonstrating the value of its MarkLogic Server, it parses the individual messages into XML and renders them queryable according to specific dimensions.
Given this ability, I thought it would be interesting to see what we might learn by examining – and in some cases comparing – general participation by domain. While this is not intended to be a comprehensive or authoritative statement on actual levels of engagement, the datapoints are interesting if nothing else. You can replicate these queries at MarkMail yourself using the “type:development from:domain.com” syntax. I’ve chosen development rather than commits because I wanted a broader sense of engagement than putbacks, but the latter would be interesting to study as well.
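For anyone repeating this across several domains, here is a minimal sketch of generating the query strings in the syntax above. The helper name and the domain list are hypothetical, chosen only for illustration; the code formats text to paste into MarkMail's search box and does not contact MarkMail itself.

```python
# Hypothetical helper: formats queries in the "type:... from:..." syntax
# described above. It only builds strings; it makes no network calls.

def participation_query(domain: str, list_type: str = "development") -> str:
    """Return a MarkMail-style query for messages sent from `domain`."""
    return f"type:{list_type} from:{domain}"


# Domains picked purely for illustration.
domains = ["hp.com", "ibm.com", "redhat.com"]
queries = {domain: participation_query(domain) for domain in domains}

for domain, query in queries.items():
    print(f"{domain}: {query}")
```

Passing a different `list_type` (commits, say) would produce the narrower query mentioned above, assuming MarkMail accepts that value for the type field.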
Before proceeding, a few caveats:
- MarkMail is currently indexing 8,146 sources. This is clearly not all of the open source mailing lists, so the picture is incomplete. It’s been a little while since their blog has been updated, as well.
- As has been documented in other discussions, such as measurement of contributions to the Linux kernel, many developers – though employed by a given entity – may prefer to use their own email address for on-list communications. Which obviously breaks the graphs below. Nor is the given domain likely to be inclusive of all of a given company’s employees.
- Not so much a caveat as an FYI, the graphs below aren’t normalized, as you’ll be able to tell quickly. Nor does MarkMail, as far as I can tell, expose the dataset for external processing. So pay close attention to the Y axis in all of the graphs.
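If you transcribe monthly counts from the charts by hand, one possible way to put two domains on a comparable scale is to divide each series by its own peak. This is only a sketch of one reading of "normalized"; the function name is hypothetical and the numbers are invented placeholders, not values read from MarkMail.

```python
# Sketch of peak normalization, assuming counts were transcribed by hand.
# The series below are invented placeholders, NOT MarkMail data.

def normalize(monthly_counts):
    """Scale a series so that its busiest month equals 1.0."""
    peak = max(monthly_counts)
    if peak == 0:
        # An all-zero series has no shape to preserve.
        return [0.0 for _ in monthly_counts]
    return [count / peak for count in monthly_counts]


series = {
    "vendor_a": [100, 200, 400],  # placeholder values
    "vendor_b": [10, 20, 40],     # placeholder values
}
normalized = {name: normalize(counts) for name, counts in series.items()}

for name, values in normalized.items():
    print(name, values)
```

Scaled this way, the two placeholder series have the same shape even though their raw volumes differ tenfold. That makes trends comparable, at the cost of hiding the difference in absolute volume.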
Anyway, on to the data. Let’s look at how the big boys compare first.
HP vs. IBM
Here’s HP:
And here’s IBM:
As might be predicted, IBM’s measurable list traffic exceeds HP’s. The size of that disparity is a bit of a surprise, but the most notable feature of both graphs is the general downward trend in participation beginning in 2009. The timing raises the question: has large system provider participation in open source been negatively impacted by the recession? We can’t answer that with this dataset, unfortunately.
One question we can attempt to answer with the available data: how has the acquisition of Sun by Oracle affected the participation of both companies in open source?
Oracle & Sun
Here’s Sun:
And here’s Oracle:
The answer, according to this dataset, is that while Oracle’s level of participation in open source communities spiked following the acquisition, it fails to replicate Sun’s performance as an independent. Which was, to be fair, among the highest participation observed: Oracle’s list activity, post-Sun, exceeds IBM’s at present, where it fell short up through 2009.
Moving on from the large systems vendors, what does the participation of Linux vendors look like?
The Linux Vendors
Here’s Canonical:
And here’s Novell:
Last, but obviously not least, Red Hat:
Red Hat dominates, as expected. With a broader portfolio of open source middleware in addition to the operating system, Red Hat’s observable participation is among the highest in the industry. What I didn’t anticipate, however, was Novell’s decline, or that it would be eclipsed by Canonical. It’s important to keep this in perspective, of course: the number of messages does not equate to the number of contributions to the kernel, for example (the Linux Foundation has detailed that here). Still, the above is worth some thought.
How about some notable proprietary vendors? What does their participation look like?
Microsoft & VMware
Here’s Microsoft:
And here’s VMware:
The peaks and valleys, particularly for Microsoft, are a bit curious, but otherwise, there’s not much to be seen here. Minimal involvement, as expected.
Hardware players?
AMD vs. Intel
Here’s AMD:
And here’s Intel:
While the disparity between the levels of participation here is worth noting, the most important aspect of the above graphs, to me, is the trendline. Participation is escalating significantly, which is indicative, perhaps, of the growing importance of open source both within customers and within the firms themselves.
What of the internet firms, who have been heavily dependent on open source, historically?
The Internet Firms
Here’s Facebook:
Google:
And Twitter:
With the exception of the slight tail to Google’s participation, I’m not sure there’s much to be extracted here. Neither Facebook nor Twitter has thousands of employees, so the volume is not terribly indicative. It is marginally interesting, however, that Twitter’s Y axis indicates a level up from Facebook.
One last question: it’s often been remarked that open source developers, in spite of being passionate about that software, are heavily dependent on proprietary webmail systems, principally Gmail. So to close, let’s look at that.
The Webmail Providers
Here’s Gmail:
Hotmail:
And Yahoo Mail:
Several things jump out. First, Hotmail and Yahoo Mail have been flat to declining for a minimum of two years. Second, Gmail’s trajectory up until that time was sustained growth; since, it has also plateaued. Last, Gmail is massively more popular than either; than both combined, actually. Gmail, in point of fact, is the most popular single domain in this study. It seems plausible, therefore, that a Gmail address is in fact the address of choice for the open source population.
Disclosure: of the mentioned companies, RedMonk clients include IBM, Microsoft, and Red Hat.
Eduardo Pelegri-Llopart says:
October 5, 2010 at 3:47 pm
Most folks have multiple addresses. Which they use depending on the corporate culture, and other circumstances.
For example, I used to post using my @sun.com at Java.Net but when we got acquired, I switched to using my @dev.java.net account.
As another example, the Apache culture is focused on individuals, and most contributors there will use their @apache.org address.
BTW, I fully agree about the value of MarkMail; we have been using it for several years, see http://blogs.sun.com/theaquarium/tags/adoption for some examples.
Ian Skerrett says:
October 5, 2010 at 4:21 pm
I am really surprised by the HP numbers. HP is the largest IT company, uses a lot of open source but appears to have limited participation. Kind of sad to see.
jrep says:
October 5, 2010 at 4:26 pm
More than culture: some of these companies have, or have had in the past, explicit policies that employees should contribute from a non-company address.
Too bad; the notion was attractive.
Donnie Berkholz says:
October 5, 2010 at 4:31 pm
I think you’re stretching a little with the GMail conclusion. I would believe it more if I could see it as a percentage of total posts. Another thing worth mentioning is that GMail allows posting from arbitrary non-gmail.com addresses, so it might be better to check something besides the From: header.
Kim Moir says:
October 5, 2010 at 4:41 pm
Thanks, interesting numbers! Another metric that would be valuable would be the number of comments/patches by company in bug tracking systems. I know that as an Eclipse committer, I have most of my discussions within Bugzilla, not mailing lists. I would echo the comments regarding gmail addresses. I know several committers who use gmail to manage their open source mail traffic simply because it has good filtering tools. My employer encourages us to use our corporate email addresses to highlight the contribution we make to open source.
Gen Kanai says:
October 5, 2010 at 10:44 pm
Does MarkMail cover the Mozilla newsgroups? The Mozilla newsgroups are all mirrored to Google Groups, fwiw.
Patrick Finch says:
October 7, 2010 at 10:46 am
Fascinating idea.
I have also heard anecdotes of some very large companies using a handful of developers to interface with a larger community and make contributions on behalf of many more developers (I believe I heard this about Webkit specifically).
links for 2010-10-06 « Wild Webmink says:
October 6, 2010 at 8:07 am
[…] Evaluating Open Source Participation by Email Traffic Useful charts from O'Grady. Analysing e-mail like this has been a valuable trend indicator for a long time. I'm especially interested in how Sun's open source involvement grew after I started as COSO (tags: OpenSource FOSS Community Participation Developer EMail) […]
Adam Williamson says:
October 6, 2010 at 11:26 am
“More than culture: some of these companies have, or have had in the past, explicit policies that employees should contribute from a non-company address.”
In a sense, though, that makes the numbers ‘accurate’, because in that situation it seems reasonable to consider the contribution as having been a private volunteer effort by someone who just happens to work for a company, not a contribution from that company. Presumably companies who have such a policy are ones who do not encourage or see value in contributions to open source projects, so it would not be right to ‘credit’ the company with that contribution…
Eduardo Pelegri-Llopart says:
October 7, 2010 at 3:29 pm
That’s not true for the cases I had in mind, Adam. I was thinking of two scenarios.
* At apache the culture is to use @apache.org addresses. That was the case for Sun and IBM guys.
* In some companies, the company encourages (or mandates?) contributions from its employees from either apache.org or eclipse.org addresses to foster a sense of “the community interest is first; the company is second”.
Sun didn’t have a mandate, but it was certainly OK to use either address. Ian was referring to the second case.
Finally, since in many (most?) OSS forges you need to use their ID anyhow, and they have email addresses, it may just be more convenient to configure your email client to route all the accounts into there. That’s what I do.
Links 7/10/2010: Linux 2.6.36 RC7, More Android Tablets | Techrights says:
October 7, 2010 at 3:06 am
[…] Evaluating Open Source Participation by Email Traffic […]
Ed Chi says:
October 11, 2010 at 12:43 am
Surely better analysis can be done. This is just by domain, but no analysis of centrality or content.
pinboard October 11, 2010 — arghh.net says:
October 11, 2010 at 2:25 pm
[…] Evaluating Open Source Participation by Email Traffic – tecosystems [email protected] analyzes #opensource participation by big companies using email as a proxy: […]
Christopher Oezbek says:
October 11, 2010 at 5:18 pm
Shameless pitch for some quantitative analysis of Open Source email traffic done for my Ph.D. thesis:
The Onion has Cancer: Some Network Visualizations of Open Source Email Communication.
Open Trends nieuwsselectie 12 October 2010 » Jan Stedehouder says:
October 12, 2010 at 10:36 am
[…] Evaluating Open Source Participation by Email Traffic – tecosystems […]
Jef Spaleta says:
October 12, 2010 at 1:10 pm
This is potentially misleading because there is a proclivity among vendors to be heavily involved in the projects they themselves manage. Some vendors have more transparent development processes for their pet projects than others, so it skews the activity curve a little.
If you want to look at collaboration and participation outside of single vendor gardens you need to exclude lists from domains that a corporate entity hosts for internal projects.
Example:
Red Hat involvement _not_ on a list hosted at redhat.com nor fedoraproject.org nor jboss.org
type:development from:redhat.com -list:org.jboss -list:com.redhat -list:org.fedoraproject
Canonical involvement _not_ on a list hosted at canonical.com or ubuntu.com
type:development from:canonical.com -list:com.ubuntu -list:com.canonical
In these two cases the trends stand up, though for both vendors you see that the numbers take a hit with 30% to 50% of the communication being internal project communication.
-jef
Which companies contribute the most to free software? « Mbpfernand0's Blog says:
October 18, 2010 at 5:19 pm
[…] or have their own dynamics, …) we can learn a few things. In this case, Evaluating Open Source Participation by Email Traffic tells us about a collection of mailing lists from free software projects and an analysis […]
Open source participation based on email traffic! « GNUpress! says:
October 19, 2010 at 1:25 pm
[…] nonetheless learn and work out very useful information. This is the case with the article on redmonk that deals with the email traffic of companies involved in projects of […]
A particular measure of participation in Free Software projects | NotiGeek says:
October 21, 2010 at 7:00 pm
[…] Via | mbpfernand0 Charts | RedMonk […]