Back in August of last year, the folks over at IBM had occasion to release a whitepaper comparing Linux on POWER to Sun’s recently (at the time) open-sourced Solaris 10 operating system. While there were many arguable points in the piece – as is common with just about any piece of vendor-produced literature, I might add – it was the favorable comparison of a newly minted Linux facility called SystemTap to Solaris’ DTrace that really riled up the blogosphere.
First to take IBM to task on the subject was Robert Milkowski, who wrote a relatively detailed indictment of some of the claims made on behalf of SystemTap. This was then followed up by The Unix System Admin (AKA James Dickens), who wrote a series of entries on the topic (1, 2, 3). And ultimately, a Friend of RedMonk dug up this email from a SuSE engineer which acknowledged that, as of a month earlier, SystemTap simply didn’t work very well.
In short, it appeared that Bruce Perens’ admission at LWE last year that Solaris provided “more value” than Linux applied, at least in part, to the technology that was DTrace. It appeared to have no equal.
And frankly, within the current kernel and commercial Linux releases, that’s still true. But as I wrote here, I wonder for how long. Sooner or later, it seemed to me, the Linux folks would respond to DTrace and introduce similar, if not equivalent, functionality. That is, in part, one of the reasons that I think Solaris is good for Linux and vice versa; it inspires competition, which benefits end users.
On Thursday, a few IBMers – Tom Curran, Michael Dolan, and Mike I-Didn’t-Catch-His-Last-Name – were kind enough to show me the latest iteration of SystemTap, and I have to say, it’s interesting. While I haven’t gotten the scripts running on my machine yet, I got a preview, and it seems that many of the criticisms Sun folks made of it have been addressed. Among them:
- “It looks like SystemTap can trace ONLY kernel functions:”
 SystemTap, as I understand it, now has the ability to probe both kernel and user space.
- “I wonder how would SystemTap protect from null pointer dereferences and so on:”
 This is more complicated, but the SystemTap folks seem to have implemented a fairly granular permissions system which does prevent some of the security issues and simple user errors that DTrace filters out.
- “Well, SystemTap is lacking many of the features, is not well tested:”
 The IBMers yesterday freely admitted that some of the original criticism from the July/August timeframe was very justified; SystemTap was not ready for prime time. But the version I saw yesterday (on yet another KDE desktop, incidentally), and anticipate running myself shortly (more on that in a second), seemed usable and steady. Nothing broke, nothing timed out, and there were no huge issues.
Does this prove that SystemTap is the equivalent of DTrace? Not at all. As I told the IBMers yesterday, what will convince me – as it did with DTrace – is real, live customers using it to their benefit, and doing so on production machines. What I can run locally is interesting, but not relevant to the really difficult questions.
Anyhow, still undetermined for SystemTap is when/if a.) SystemTap makes its way into mainstream commercial and non-commercial distributions and b.) the required kernel modules, KProbes and RELAYFS (both of which I had to add to my kernel, necessitating two separate compiles) become standard. Until that happens, DTrace has an advantage over its Linux counterpart. If the popularity of Solaris’ dynamic tracing facility is any indication, however, SystemTap will be a package to watch.
And speaking of which, let me compliment the IBM guys on their approach with respect to the demo. From reading this space, they knew that I was a Gentoo user, and used a Gentoo dev connection to get SystemTap into the Gentoo library so that I could install it with very little difficulty. If you want me to try your technology, make it easy for me: get it into the Gentoo/Debian/etc libraries.
Would love to hear more from DTrace or SystemTap advocates on this subject, if you’ve used either or both. I’ll share my experiences as soon as I get a chance to play with it.
In the meantime, a suggestion for both sides: the ability to probe and view in-kernel telemetry is obviously high value, but it’s valuable only to those who can use it. Both DTrace and SystemTap need to begin constructing tools on top of the tracing technologies to automate/simplify script generation and make reporting cleaner – Eclipse & NetBeans should be proactive here. Perhaps more important is the documentation question; both packages should have wikis where sysadmins and other users can share and exchange scripts.
DTrace and SystemTap will ultimately prove to be only as successful as they are widely used, so anything that reduces the barriers to entry for all the sysadmins and – longer term – developers is a good thing.
