tecosystems

Distributed Source Code Management – Niche or Trend?: The Q&A


If there’s one thing that continues to perplex me within the application development space, it’s the fact that there are few audiences for whom distributed source code management (DSCM [1]) systems are a subject of interest. Before you ask: yes, I’ve gone beyond the immediate circle of family and friends. Enterprisey types consider the systems too immature, too different and, yes, too “new” to be in any way useful, while the open source folks I talk to more or less take them for granted – just another part of the landscape.

The explanation for this otherwise inexplicable lack of attention is, to my way of thinking, pretty simple: source code management tools aren’t sexy. They’re not going to get (most) people excited in the way that, say, a web application framework like Rails does. Many developers, in fact, have probably spent a good deal of their careers cursing such tools, whether they be the longtime standard of CVS or the primitive, bastardized BIM-EDIT based version control system I was compelled to build for a client years ago as part of a CMM certification process (don’t ask).

But I’m a believer that those reactions belie the importance of the tools, and it’s something I’ve been meaning to write up for some time. Ergo, it’s time for another Q&A.

Q: Before proceeding, any relationships to disclose in this space?
A: The only one I can think of is that IBM, whose Rational division is a significant player in the source code management space, is a customer. I can’t think of any other client relationships that bear on the subject matter directly, although it’s perhaps worth mentioning that Sun – a recent adopter of DSCM – is a client. That’s it, off the top of my head.

Q: For the non-developers in the audience, can you explain briefly what a DSCM tool is and how it differs from the alternatives?
A: Sure. Wikipedia’s got a great definition for you here, but in layman’s terms it’s just a system that manages source code in a non-centralized fashion.

Primitive source code management tools demanded that developers work online, checking out each segment of code they wanted to work on; while checked out, that code was typically not available for writes by other developers. Once the changes were completed, the code would be checked back in, at which point it became available to others.

Decentralized tools operate according to different principles. With DSCMs, multiple parties can check out the same code, work on it offline, and submit their changes back. Conflicts between the individual changes are managed on a peer-to-peer basis, by the tool. There is no centralized control, in other words – hence the decentralized description.
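For the developers in the audience, the contrast can be made concrete with a toy sketch in Python. Every name in it (LockingRepo, Clone, merge) is hypothetical and invented purely for illustration; this is not how Bazaar, Git, or Mercurial are implemented, and real tools track history and detect conflicts far more carefully. The point is only the shape of the two workflows: one store guarded by locks versus many independent copies that reconcile later.

```python
class LockingRepo:
    """Hypothetical centralized store: one writer per file at a time."""

    def __init__(self, files):
        self.files = dict(files)
        self.locks = {}  # file name -> developer currently holding the lock

    def checkout(self, name, developer):
        holder = self.locks.get(name)
        if holder is not None and holder != developer:
            raise RuntimeError(f"{name} is locked by {holder}")
        self.locks[name] = developer
        return self.files[name]

    def checkin(self, name, developer, content):
        if self.locks.get(name) != developer:
            raise RuntimeError(f"{developer} does not hold the lock on {name}")
        self.files[name] = content
        del self.locks[name]


class Clone:
    """Hypothetical decentralized copy: files plus its own local history."""

    def __init__(self, files, history=()):
        self.files = dict(files)
        self.history = list(history)  # (file name, content, message) tuples

    def clone(self):
        # A full, independent copy, history included.
        return Clone(self.files, self.history)

    def commit(self, name, content, message):
        # Purely local: no server and no lock involved.
        self.files[name] = content
        self.history.append((name, content, message))

    def merge(self, other):
        """Apply other's commits made since the fork; return conflicting names."""
        fork = 0  # length of the history prefix both copies share
        for mine, theirs in zip(self.history, other.history):
            if mine != theirs:
                break
            fork += 1
        changed_here = {name for name, _, _ in self.history[fork:]}
        conflicts = set()
        for commit in other.history[fork:]:
            name = commit[0]
            if name in changed_here and self.files[name] != other.files[name]:
                conflicts.add(name)  # both sides edited it differently
            else:
                self.files[name] = other.files[name]
                self.history.append(commit)
        return sorted(conflicts)


if __name__ == "__main__":
    central = LockingRepo({"main.c": "v1"})
    central.checkout("main.c", "alice")
    try:
        central.checkout("main.c", "bob")  # blocked while alice holds the lock
    except RuntimeError as err:
        print("centralized:", err)

    base = Clone({"main.c": "v1", "util.c": "v1"})
    alice, bob = base.clone(), base.clone()
    alice.commit("main.c", "v2-alice", "rework main")
    bob.commit("util.c", "v2-bob", "tidy util")
    print("decentralized conflicts:", alice.merge(bob))
```

In the locking model the second developer simply waits. In the decentralized model both commit freely, and the question of overlap is deferred to merge time, where the tool either combines the work or flags the files that need a human decision.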

Q: What are the advantages to this approach?
A: There are many. Generally speaking, it’s far more flexible for developers – particularly those working in distributed (geographically or otherwise) fashion. See, for example, Sam Ruby’s comments on Bazaar (bzr):

Just when I was just starting to get comfortable with svn, along comes bzr.

svn has a lot of benefits over cvs. For me, the biggest benefit is the increased ability to work on a plane I can do an svn delete or an svn diff while disconnected. When I land, I can simply do an svn update and an svn commit.

It looks like bzr intends to take this to the next level, I can work on multiple changes, create branches, and query the version history while offline…

This is where it gets mind-blowing. After I am done development on my laptop, I can then simply scp/rsync my entire directory structure up to intertwingly. I can then run from my directory. That’s not the surprising part. That’s entirely to be expected.

What’s cool is that I — or anybody else — can then do a bzr get against that directory to retrieve everything – including the version history. And I had to install nothing to make this work. It is all HTTP GET of statically served files. Sweet.

In lieu of posting patches to the development list, I can simply post a pointer to my repository, perhaps with some prose indicating why anybody might care. Maintainers of other repositories may chose to merge in a portion of my changes, or they may not.

This style of development, with a low barrier to entry, is not for everyone. In fact, I’m not yet sure that it is for me. But it certainly has got me thinking.

Those are the kind of comments you’ll see frequently from developers exposed to DSCMs.

Q: So DSCMs are popular with the development crowd, and have a low barrier to entry – great. It’s still just a source code management tool, correct?
A: Well, with the caveat that they’re not popular with all developers, I’d say no – they have considerably more profound implications than just as an SCM. Much as the success or failure of an open source project can depend as much on the governance structure as it can on the actual code, so too is the choice of a source code management tool far more important than many realize. DSCMs, in fact, can have implications for governance and vice versa, but that’s neither here nor there.

If you’re still skeptical that the SCM selection process plays an important role in the life of a project, Bill de hÓra’s thoughts might be of interest:

The conclusion I draw from this and my own experience having migrating my fair share of source trees is that the version control system is a first order effect on software, along with two others – the build system and the bugtracker. Those choices impact absolutely everything else. Things like IDEs, by comparison, don’t matter at all. Even choice of methodology might matter less.

Q: What is your personal experience with the tools?
A: Project-wise, nil. My coding days ended before these tools became popular; I was stuck using CVS or SourceSafe instead. From an evaluation perspective, I’ve used Bazaar and Mercurial, and Subversion is currently used to maintain all of our WordPress instances. I’m considering using one of the DSCMs to maintain my desktop after I migrate to the new machine, but that’s not decided yet.

Q: What kind of developer doesn’t go for these DSCM tools? What is the rationale for not caring for them?
A: Havoc’s probably as good a candidate to carry the centralized tool standard as any. One of his objections has to do with the rapid proliferation of different tools, each with its own syntax, which while a legitimate objection is non-specific to the subject matter. The more substantive issue is contained here:

I think what I don’t get yet is why you’d want to maintain a bunch of local changesets for very long. The Linux-kernel-style fork-fest seems just nuts for anything I’m working on. Usually there are less than 10 people, often less than 5, that are really doing much work, and any other contributors are sending a one-liner every few months. There’s just no problem that I see here that distributed version control would solve.

As Sam says above, the style of development permitted by DSCMs is not for everyone – nor every project – and Havoc helps explain why. He also links to some other developers’ thoughts, pro and con. Here are two bonus links comparing Mercurial to Subversion.

Q: So why are they a big deal? If it’s a niche trend amongst a couple of open source projects, why should mainstream developers care?
A: Mostly because it’s not a niche trend; it’s increasingly mainstream. I have yet to see accurate metrics polling developers for their preferences one way or another, but the increasing adoption of DSCM tools suggests to me that more developers prefer decentralized development than do not. With all due respect to Havoc, of course.

Q: What adoption are you referring to? Who is using these DSCMs today?
A: A couple of projects you may have heard of. Ubuntu, for example, leverages Bazaar. Linux and X.org are on Git. And Mercurial, an emerging center of gravity in the DSCM space, counts among its users the Mozilla project, IcedTea, OpenJDK, and OpenSolaris. The sheer weight and variety of these projects would be enough to convince me to pay attention, even absent some of the nice things developers say about them.

Q: So because these major projects are either using them or running them in tandem with centralized systems, everyone should just cut over, right?
A: Hardly. Migrating from one version control system to another is a decidedly non-trivial task, and the anticipated benefits need to be weighed against the expected costs – as always. For an idea of the complexity involved, listen to what some of the Mozilla folks have to say about the prospect of switching.

Q: If I’m starting a project, or have an existing project in which I think the benefits might potentially outweigh the costs, what tools should I consider?
A: There are a host of tools available to solve these needs, as Wikipedia can tell you. The ones that I hear the most chatter about are the three already mentioned: Bazaar, Git, and Mercurial. darcs and Monotone come up occasionally, but not frequently. If you’re interested, the folks from OpenSolaris kindly made available documentation from their detailed review: the background for the choice, the project’s requirements, and reports on Bazaar, Git and Mercurial. Note, however, that the reports were measured on differing architectures. It’s also worth noting that performance, the number one complaint about bzr, is apparently being addressed.

Q: Anything else to add?
A: Just that I think DSCMs should be on the radar of software projects both internal and external, closed or open source. They may not be appropriate for your project or codebase, but it’s difficult to tell if they aren’t being evaluated.

[1] DSCMs may also be referred to as Distributed Version Control Systems, or DVCS.