tecosystems

Forking, The Future of Open Source, and Github


“I cannot help you with your question, sir, for I do not understand it. It is a wrong question, sir.” – Susanna Clarke, Jonathan Strange & Mr. Norrell

Being what I should have replied to Mike Milinkovich last week, but didn’t think to. But let me back up.

Last Wednesday, at the kind invitation of the folks from Eclipse, I had the opportunity to sit with more august company – Justin Erenkrantz (Apache), Mårten Mickos (Eucalyptus), and Jason van Zyl (Maven/Sonatype) – on a panel charged with debating the future of open source. Among the questions posed to us was this: is the future of open source going to be based on communities such as Apache and Eclipse or will it be based on companies that sell open source?

My reply? Neither. It’s Github.

Intending no disrespect to either category, of course. Communities such as Apache, Eclipse and Mozilla are and will remain massive centers of gravity for open source projects. And commercial vendors – yes, even those that dare practice “open core” – will continue to inject revenue into the larger open source ecosystem, underwriting substantial portions of the costs of producing free and open source software as Justin said. But when we talk about foundations and vendors jointly and the mechanics of open source, what we’re really talking about is the past and the present. Not the future.

Foundations and vendors, irrespective of whether or not they’ve transitioned from centralized to decentralized version control systems, are essentially command and control development models. Developers, corporations and other contributors build towards a single project in a defined hierarchy, which a given potential contributor may or may not have the right to commit to. This single project is then either procured directly, or more typically consumed downstream via a community distribution (e.g. Debian, Fedora) or a commercial one (e.g. RHEL, Cloudera). In terms of its development paradigm, open source development happens to look a lot like proprietary development. It just happens to occur in the open. The future, meanwhile, is almost certainly going to look different than the present. In part because the tooling encourages it.

We’ve been tracking decentralized version control tools for four or five years now. Three years ago, I was openly perplexed about the continuing lack of interest in the subject. In retrospect, however, the slow ascendancy of distributed version control systems (DVCS) should have been predicted. Not because we’re usually ahead of the curve when it comes to adoption; I’ve been waiting for NoSQL for five years. The lag in DVCS adoption should have been anticipated rather because it takes people – even (especially?) the really smart ones – time to come around to the idea of decentralization. Witness Joel’s conversion:

I studied, and studied, and finally figured something out. Which I want to share with you.

With distributed version control, the distributed part is actually not the most interesting part.

The interesting part is that these systems think in terms of changes, not in terms of versions.

That’s a very zen-like thing to say, I know. Traditional version control thinks: OK, I have version 1. And now I have version 2. And now I have version 3.

And distributed version control thinks, I had nothing. And then I got these changes. And then I got these other changes.

It’s a different Program Model, so the user model has to change.

In Subversion, you might think, “bring my version up to date with the main version” or “go back to the previous version.”

In Mercurial, you think, “get me Jacob’s change set” or “let’s just forget that change set.”

If you come at Mercurial with a Subversion mindset, things will almost work, but when they don’t, you’ll be confused, unhappy, and unsuccessful, and you’ll hate Mercurial.

Whereas if you free your mind and reimagine version control, and grok the zen of the difference between thinking about managing the versions vs. thinking about managing the changes, you’ll become enlightened and happy and realize that this is the way version control was meant to work.
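To make the versions-versus-changes distinction concrete, here is a minimal Python sketch. It is a toy model and an assumption on my part, not how Mercurial or Git actually store history: the `ChangeSet` type, the `apply_change` and `materialize` helpers, and the file names are all hypothetical. The point is only that the working tree is derived by replaying changes from nothing, so “forget that change set” means replaying without it rather than rolling back to a numbered version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    """Hypothetical toy change: files to write and files to delete."""
    name: str
    sets: tuple = ()      # (path, content) pairs to write
    deletes: tuple = ()   # paths to remove


def apply_change(state: dict, change: ChangeSet) -> dict:
    """Return a new tree with one change applied; the input is not mutated."""
    new_state = dict(state)
    for path, content in change.sets:
        new_state[path] = content
    for path in change.deletes:
        new_state.pop(path, None)
    return new_state


def materialize(changes) -> dict:
    """Start from nothing and replay the changes in order."""
    state = {}
    for change in changes:
        state = apply_change(state, change)
    return state


history = [
    ChangeSet("initial", sets=(("README", "hello"),)),
    ChangeSet("jacob-fix", sets=(("parser.py", "fixed"),)),
    ChangeSet("experiment", sets=(("README", "hello, world"),)),
]

# "Get me Jacob's change set": it is simply one of the changes replayed.
with_all = materialize(history)

# "Let's just forget that change set": replay everything except it.
without_experiment = materialize(
    [c for c in history if c.name != "experiment"]
)
```

A real system also has to handle changes that depend on one another or conflict; this sketch deliberately ignores that.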

When we concluded that version control systems such as Git or Mercurial would become popular, if not the default, we had had no such epiphanies, no similar flashes of insight. It was simply brute force observation: whatever the reasoning, more and more developers, projects and firms were transitioning away from centralized to decentralized. And happier for it. The trendline was clear, which is why we weren’t exactly going out on a limb predicting the ascension of Git, Mercurial and their brethren.

What was less obvious was the profound, outsized impact decentralized version control would have on the future of open source, and thus development in general.

Open source development traditionally has been, for better and for worse, a social activity. With the accelerating adoption of tools like hosted Git, it’s even more so. Why? As counterintuitive as it might seem, the ability to fork.

Forking has historically been an option of last resort; how developers proceed when all other remedies are exhausted. First because it’s potentially damaging to the originating project, but more because it’s a significant logistical challenge. As Brian Aker put it:

Forking software over small changes is for the most part unviable because of the cost of keeping a fork of the software up to date, but it is not impossible.

What if the costs of forking became negligible, for the originating project and those who wish to fork a project? What if tools such as decentralized version control made it possible to work on projects not in centralized check-in, check-out fashion, but individually?

As Brian discussed at OSCON back in 2008, all of a sudden forks become trivial; both to execute, and potentially to reintegrate. On Github, forking is quite literally pushbutton. In terms of their ability to permit greater creativity, forks cease being a cancer and become a cure. Sometimes, anyway. Because while it’s simple to fork, it’s not much harder to reintegrate. Losing the shackles of centralized development accelerates development and increases creativity. Add in a centrally hosted network model, and everything from discovery to social features becomes possible. As jwz once said, “these days, almost all software is social software.” Why should version control be any different?
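Why forking and reintegration get cheap once history is a collection of changes can be sketched in a few lines of Python. This is an illustrative assumption, not Github’s or Git’s actual mechanism: the `fork`, `missing_from` and `reintegrate` helpers and the change names are hypothetical, changes are reduced to bare labels, and conflicts are ignored entirely.

```python
def fork(history):
    """Forking is just taking your own copy of the full history."""
    return list(history)


def missing_from(upstream, fork_history):
    """Changes the fork has that upstream has not yet seen."""
    seen = set(upstream)
    return [change for change in fork_history if change not in seen]


def reintegrate(upstream, fork_history):
    """Upstream pulls in only the changes it lacks (conflicts ignored)."""
    return upstream + missing_from(upstream, fork_history)


upstream = ["init", "add-parser"]

mine = fork(upstream)           # pushbutton fork
mine.append("faster-lexer")     # work happens independently on the fork

upstream.append("fix-typo")     # meanwhile upstream moves on

merged = reintegrate(upstream, mine)
```

Because both sides share the same early changes, reintegration only has to consider the difference between the two histories, which is what keeps the cost of maintaining a fork low in this simplified picture.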

The future to me, then, looks a lot more like Github than it does a foundation or vendor. It is becoming the breeding ground for thousands of innovations that may aspire to grow up to be full fledged foundation projects, commercial products, or both. So much so that a number of people, like Phil Wilson, worry about what would happen if Github went away. As they should: look at some of the projects hosted there.

Because while Github will never replace the foundations, let alone the vendors, it will increasingly become the foundations upon which many of their component projects are built. If you haven’t been paying attention to the service, then, I’d suggest giving it a look. It’s the shape of things to come.

Disclosure: Apache and Eclipse are RedMonk customers; Github and Mozilla are not.

18 comments

  1. Brilliant observations. A question: In your experience/observations do projects that live in DVCS environments tend to have better supporting build/test automation/recipes? The point of config mgmt is “do I know what software is in that executable unit” which was a little more complex than just what versions of what collections of modules. Verifying the build would seem to be almost more important in such a malleable world. Or am I completely missing the point?

  2. We build the future together, big or small. Here’s a tale of mine regarding forking: http://someguysblog.com/2010/02/13/leitmotif/ It even includes a comment from my mom!

  3. I agree with this in theory, but I think in practice, you’re still going to find that fairly strong project guidance will continue being important because even for Open Source, just throwing features on top isn’t that helpful and maintaining core architecture of any sophisticated component requires vision and control.

    Real, TRUE forks off of a project and into an honest to god separate project are still going to be divisive. The complementary ones where a couple people buckle down and work on something are going to be much easier, but there’s always going to be the matter of pushing commits back to the master.

  4. Hi!

    I don’t think companies have even started to understand the model that software development is moving toward. Keeping vendor versions should now be much simpler than in the past, but I am hoping that more companies feel that it is easier to contribute back as well. I look at what has happened in the postgres world, and I wonder if the companies around that project will now feel that it is an easier path to get their code back in.

    On a related note there was a recent phone call that O’Reilly put together with a number of open source leads. It was amazing to hear how many folks on the call were terrified of how github has lowered the bar for forking. Their fear being a loss of patches. It was crazy to listen to.

    And thanks for quoting me!

    Cheers,
    -Brian

  5. The “what if Github went away” question can be answered in a different way. Don’t use GitHub. Use Gitorious http://gitorious.org/. The source code is under AGPLv3 and you can host your own private instance if you want that. No reliance on proprietary infrastructure.

  6. I think DVCSs are important, but you are missing a very important aspect of sharing open source software–once I (as an interested outsider) have the source code downloaded, I need to actually build it, run it, examine it, debug it, and deploy my own version of it. DVCSs don’t solve any of these issues. In the Java/JVM world, I’m finding myself extremely grateful when an FOSS project publishes a Maven POM for their project, because I can point my IDE at the POM and immediately build, run, examine and deploy that app or library. This is a big deal and an enormous time-saver. Custom build processes, project directory layouts, build artifact content all make getting involved with an FOSS project a hassle, and I say this after simply losing my enthusiasm for that extra effort after many, many years of just diving in and trying to work with someone else’s code. I think the “future” may include, for those who use IDEs (or even VIM/Emacs) pointing our tools at the project URL, launching a process which downloads and sets up everything I need so that, yes, I can fork and branch, but that I can also build and work on the thing.

    Note I’m not arguing that Maven is the solution, but rather that there is an important problem to be solved in allowing for easy collaboration.

    Patrick

  7. Open Cores are doing a lot of great things. They may not be useful today and you may not see the potential in it, but open hardware is a good thing which you have no reason to badmouth.

  8. […] Forking, The Future of Open Source, and Github Last Wednesday, at the kind invitation of the folks from Eclipse, I had the opportunity to sit with more august company – Justin Erenkrantz (Apache), Mårten Mickos (Eucalyptus), and Jason van Zyl (Maven/Sonatype) – on a panel charged with debating the future of open source. Among the questions posed to us was this: is the future of open source going to be based on communities such as Apache and Eclipse or will it be based on companies that sell open source? […]

  9. Stephen,

    Insightful post, as always. Balancing the needs of a ‘single source’ with a more distributed collaboration environment is something a lot of people will struggle with over the next couple of years. fyi, here is some insight on how the Eclipse community is implementing git: http://dev.eclipse.org/blogs/eclipsewebmaster/2010/04/01/git-vs-ip-provenance-dvcs-with-a-twist/

    Ian

  10. […] (by one group) feature will be minimal, decline to add it, or maintain a fork. With the latter less logistically expensive these days, perhaps that will become more viable an approach – even in commercial […]

  12. […] version control infrastructures such as Launchpad or, yes, Gitorius, actively encourage forking (coverage). They do so primarily by minimizing the logistical implications of creating and maintaining […]

  13. […] about forking when revision control packages like Git and Mercurial exist. Here’s an article that references these types of differences and why forking may not be such a big deal […]

  14. […] be the notable exception (possibly bc it came from FriendFeed). Some assets are hosted with Github (coverage), those that are not are typically housed at […]

  15. How has Github accelerated the tempo of software innovation?…

    “[At] the kind invitation of the folks from Eclipse, I had the opportunity to sit with more august company – Justin Erenkrantz (Apache), Mårten Mickos (Eucalyptus), and Jason van Zyl (Maven/Sonatype) – on a panel charged with debating the future of op…

  16. […] development platform there is (at least one well-respected software analyst has even called it “the future of open source”). Though Mercurial was deeply ingrained in our development process, we were willing to tolerate the […]

  17. […] peace has seemingly broken out in node-land. Perhaps counter-intuitively unless you’ve been reading RedMonk, that peace seems to be based on a fork of core technology. The recently created fork: […]
