tecosystems

Distributed Source Code Management – Niche or Trend?: The Q&A


If there’s one thing that continues to perplex me within the application development space, it’s the fact that there are few audiences for whom distributed source code management (DSCM [1]) systems are a subject of interest. Before you ask: yes, I’ve gone beyond the immediate circle of family and friends. Enterprisey types consider the systems too immature, too different and, yes, too “new” to be in any way useful, while for the open source folks I talk to they’re more or less taken for granted – just another part of the landscape.

The explanation for this otherwise inexplicable lack of attention is, to my way of thinking, pretty simple: source code management tools aren’t sexy. They’re not going to get (most) people excited in the way that, say, a web application framework like Rails is. Many developers, in fact, have probably spent a good deal of their careers cursing such tools, whether they be the longtime standard of CVS or the primitive, bastardized BIM-EDIT based version control system I was compelled to build for a client years ago as part of a CMM certification process (don’t ask).

But I’m a believer that those reactions belie the importance of the tools, and it’s something I’ve been meaning to write up for some time. Ergo, it’s time for another Q&A.

Q: Before proceeding, any relationships to disclose in this space?
A: The only one I can think of is that IBM, whose Rational division is a significant player in the source code management space, is a customer. I can’t think of any other client relationships that bear on the subject matter directly, although it’s perhaps worth mentioning that Sun – a recent adopter of DSCM – is a client. That’s it, off the top of my head.

Q: For the non-developers in the audience, can you explain briefly what a DSCM tool is and how it differs from the alternatives?
A: Sure. Wikipedia’s got a great definition for you here, but in layman’s terms it’s just a system that manages source code in a non-centralized fashion.

Primitive source code management tools demanded that developers work online, checking out each segment of code they wanted to work on, during which time it was typically not available for writes by other developers. After their changes were completed, the code would be checked back in, at which point it would be available for others.

Decentralized tools operate according to different principles. With DSCMs, multiple parties can check out the same code, work on it offline, and submit their changes back. Conflicts between the individual changes are managed on a peer-to-peer basis, by the tool. There is no centralized control, in other words – hence the decentralized description.
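For the developers in the audience, here is a toy Python sketch of that idea. It is an illustration only, not how Bazaar, Git or Mercurial are actually implemented: the `Repo` and `Changeset` names are made up for this example, and it models history alone, ignoring file contents and conflict resolution. The point is that every copy carries the full history, commits happen locally, and any copy can pull from any other.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """One recorded change. Hypothetical structure, for illustration only."""
    parents: tuple  # ids of the changesets this one builds on
    author: str
    message: str

    @property
    def id(self):
        # Identify a changeset by a hash of what it records.
        raw = repr((self.parents, self.author, self.message)).encode()
        return hashlib.sha1(raw).hexdigest()[:8]


class Repo:
    """A full, self-contained copy of a project's history."""

    def __init__(self, owner):
        self.owner = owner
        self.history = {}  # changeset id -> Changeset

    def heads(self):
        # Heads are changesets that no other changeset builds on.
        referenced = {p for c in self.history.values() for p in c.parents}
        return sorted(cid for cid in self.history if cid not in referenced)

    def commit(self, message):
        # Purely local: no server is contacted. If there are two heads,
        # the new changeset has two parents, i.e. it records a merge.
        cs = Changeset(tuple(self.heads()), self.owner, message)
        self.history[cs.id] = cs
        return cs.id

    def pull(self, other):
        # Copy over whatever changesets this repository is missing.
        for cid, cs in other.history.items():
            self.history.setdefault(cid, cs)

    def clone(self, owner):
        copy = Repo(owner)
        copy.pull(self)
        return copy


alice = Repo("alice")
alice.commit("initial import")
bob = alice.clone("bob")             # bob now holds the whole history

alice.commit("fix parser")           # both work offline, independently
bob.commit("add docs")

alice.pull(bob)                      # peer to peer, no central server
diverged = alice.heads()             # two heads: the histories diverged
merge_id = alice.commit("merge bob's work")

bob.pull(alice)                      # bob catches up with the merge
```

After the last line both repositories hold identical histories, and neither of them was ever "the server". A real tool also has to reconcile the file contents at the merge step, which this sketch leaves out.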

Q: What are the advantages to this approach?
A: There are many. Generally speaking, it’s far more flexible for developers – particularly those working in distributed (geographically or otherwise) fashion. See, for example, Sam Ruby’s comments on Bazaar (bzr):

Just when I was just starting to get comfortable with svn, along comes bzr.

svn has a lot of benefits over cvs. For me, the biggest benefit is the increased ability to work on a plane: I can do an svn delete or an svn diff while disconnected. When I land, I can simply do an svn update and an svn commit.

It looks like bzr intends to take this to the next level, I can work on multiple changes, create branches, and query the version history while offline…

This is where it gets mind-blowing. After I am done development on my laptop, I can then simply scp/rsync my entire directory structure up to intertwingly. I can then run from my directory. That’s not the surprising part. That’s entirely to be expected.

What’s cool is that I — or anybody else — can then do a bzr get against that directory to retrieve everything – including the version history. And I had to install nothing to make this work. It is all HTTP GET of statically served files. Sweet.

In lieu of posting patches to the development list, I can simply post a pointer to my repository, perhaps with some prose indicating why anybody might care. Maintainers of other repositories may choose to merge in a portion of my changes, or they may not.

This style of development, with a low barrier to entry, is not for everyone. In fact, I’m not yet sure that it is for me. But it certainly has got me thinking.

Those are the kind of comments you’ll see frequently from developers exposed to DSCMs.

Q: So DSCMs are popular with the development crowd, and have a low barrier to entry – great. It’s still just a source code management tool, correct?
A: Well, with the caveat that they’re not popular with all developers, I’d say no – they have considerably more profound implications than just as an SCM. Much as the success or failure of an open source project can depend as much on the governance structure as it can on the actual code, so too is the choice of a source code management tool far more important than many realize. DSCMs, in fact, can have implications on governance and vice versa, but that’s neither here nor there.

If you’re still skeptical that the SCM selection process plays an important role in the life of a project, Bill de hÓra’s thoughts might be of interest:

The conclusion I draw from this and my own experience having migrated my fair share of source trees is that the version control system is a first order effect on software, along with two others – the build system and the bugtracker. Those choices impact absolutely everything else. Things like IDEs, by comparison, don’t matter at all. Even choice of methodology might matter less.

Q: What is your personal experience with the tools?
A: Project-wise, nil. My coding days ended before these tools became popular; I was stuck using CVS or SourceSafe instead. From an evaluation perspective, I’ve used Bazaar and Mercurial, and Subversion is currently used to maintain all of our WordPress instances. I’m considering using one of the DSCMs to maintain my desktop after I migrate to the new machine, but that’s not decided yet.

Q: What kind of developer doesn’t go for these DSCM tools? What is the rationale for not caring for them?
A: Havoc’s probably as good a candidate to carry the centralized tool standard as any. One of his objections has to do with the rapid proliferation of different tools, each with its own syntax, which while a legitimate objection is non-specific to the subject matter. The more substantive issue is contained here:

I think what I don’t get yet is why you’d want to maintain a bunch of local changesets for very long. The Linux-kernel-style fork-fest seems just nuts for anything I’m working on. Usually there are less than 10 people, often less than 5, that are really doing much work, and any other contributors are sending a one-liner every few months. There’s just no problem that I see here that distributed version control would solve.

As Sam says above, the style of development permitted by DSCMs is not for everyone – nor every project – and Havoc helps explain why. He also links to some other developers’ thoughts pro and con. Here are two bonus links comparing Mercurial to Subversion.

Q: So why are they a big deal? If it’s a niche trend amongst a couple of open source projects, why should mainstream developers care?
A: Mostly because it’s not a niche trend; it’s increasingly mainstream. I have yet to see accurate metrics polling developers for their preferences one way or another, but the increasing adoption of DSCM tools indicates to me that more developers prefer decentralized development than do not. With all due respect to Havoc, of course.

Q: What adoption are you referring to? Who is using these DSCMs today?
A: A couple of projects you may have heard of. Ubuntu, for example, leverages Bazaar. Linux and X.org are on Git. And Mercurial, an emerging center of gravity in the DSCM space, counts among its users the Mozilla project, IcedTea, OpenJDK, and OpenSolaris. The sheer weight and variety of these projects would be enough to convince me to pay attention, even absent some of the nice things developers say about them.

Q: So because these major projects are either using them or running them in tandem with centralized systems, everyone should just cut over, right?
A: Hardly. Migrating from one version control system to another is a decidedly non-trivial task, and the anticipated benefits need to be weighed against the expected costs – as always. For an idea of the complexity involved, listen to what some of the Mozilla folks have to say about the prospect of switching.

Q: If I’m starting a project, or have an existing project in which I think the benefits might potentially outweigh the costs, what tools should I consider?
A: There are a host of tools available to solve these needs, as Wikipedia can tell you. The ones that I hear the most chatter about are the three already mentioned: Bazaar, Git, and Mercurial. darcs and Monotone come up occasionally, but not frequently. If you’re interested, the folks from OpenSolaris kindly made available documentation from their detailed review: the background for the choice, the project’s requirements, and reports (note, however, that they were measured on differing architectures) on Bazaar (it’s worth noting that performance, the number one complaint about bzr, is apparently being addressed), Git and Mercurial.

Q: Anything else to add?
A: Just that I think DSCMs should be on the radar of software projects both internal and external, closed or open source. They may not be appropriate for your project or codebase, but it’s difficult to tell if they aren’t being evaluated.

[1] DSCMs may also be referred to as Distributed Version Control Systems, or DVCS.

7 comments

  1. I’m surprised by how niche these still are. I’m starting some greenfield work, and we’re going to use Subversion; if it were my decision, I’d probably try out Mercurial.

    It’s not that I think we absolutely need the extra features, but they don’t “cost” much, and they have potential.

  2. Trivial point: Sun is not in fact a “recent adopter of DSCM” – it’s been at the heart of Sun’s development methodology for many years. In fact, the creator of BitKeeper was a former Sun employee adapting his earlier work at Sun (Teamware, still in use but subject to the migration to Mercurial of which you are aware).

  3. Hi!

    I am surprised you missed MySQL in your list. We have been using a distributed source control system since 2000. Our development process, which allows us to span multiple countries since all developers work from home, wouldn’t work without it.

    Cheers,
    -Brian

  4. I think give it a few years. Subversion is only really going mainstream as of 2007; it’ll be the last great centralised VCS.

    Conceptually how you work under a DVCS is different in a number of ways that will slow adoption.

    – Tools. There’s no IDE support for things like hg or bzr and they don’t hook up yet with other infrastructure (like bugtrackers) as easily as cvs/p4/svn do. Frankly, I’ve no time for the idea that IDE support is a criterion for choosing something as important as VCS – it’s like picking a car for its door handles – but plenty of people don’t agree. For example, I think you can correlate Subversion’s adoption with Eclipse and IDEA having decent support for it.

    – Patches. Patches become the unit of work and of communication, not “commits” or “modules” or “LOC”. People working in OSS will be familiar, but even then distributed trees. Until teams treat patch/merge as muscle memory, a DVCS will feel wrong. Handling patches is a whole new skill to learn.

    – Accountability. A DVCS imo shines a hard light on /practices/, especially on managing multiple versions of the same code base. I’m not talking about methodology or governance here, but what people do in the field to get software out the door. IOW, if you have broken practices that lean towards forking, cut and paste, not splitting configuration from code, and finagling things on production, a DVCS will make that evident real fast.

    – Control. In the commercial space, I can see classically trained SCM and release managers being out of their comfort zone with a DVCS. Things will need to be done so they can “see” the “real” code and feel like they’re on top of things. Also anyone who grew up with a locking VCS (this includes Perforce), code ownership and fear of branching, might find the idea of anyone being able to check in anything and everyone having a tree unsettling and chaotic (the answer of course is to control which tree is deemed “gold”).

    Tools and patches management are easily solved. Accountability and control are much more contentious.

  5. [trackback gave me a 302 & didn’t seem to “take” so…

    http://blogs.sun.com/webmink/entry/a_niche_the_size_of

    I was intrigued to read what Stephen O’Grady recently published about distributed source code management (DSCM) in one of his signature Q&As. He makes a number of interesting observations, but I was interested by the omission of a whole thread of analysis around the assertion Mark Shuttleworth makes that merging is the key to source code management…]

  6. To follow up to Bill’s points… there is increasing adoption. NetBeans now has a Mercurial plugin, and Mercurial certainly supports the patch/merge model of working.

    I think perhaps one of the most important points that hasn’t been made here in either your article or any of the comments is that any distributed SCM can act/behave/look-like a centralised SCM. Simply take one of the “distributed” repositories and designate it as the “canonical” or central repository. This is not unlike how we operate in OpenSolaris where for the ON consolidation we have a canonical onnv-gate, which all ON putbacks go to.

  7. […] tracking decentralized version control tools for four or five years now. Three years ago, I was openly perplexed about the continuing lack of interest in the subject. In retrospect, however, the slow ascendancy […]
