Linux Responds to DTrace? SystemTap on Tap

Back in August of last year, the folks over at IBM had occasion to release a whitepaper comparing Linux on POWER to Sun’s recently (at the time) open sourced Solaris 10 operating system. While there were many arguable points in the piece – as is common with just about any piece of vendor produced literature, I might add – it was the favorable comparison of a newly minted Linux facility called SystemTap to Solaris’ DTrace that really riled up the blogosphere.

First to take IBM to task on the subject was Robert Milkowski, who wrote a relatively detailed indictment of some of the claims made on behalf of SystemTap. This was then followed up by The Unix System Admin (AKA James Dickens), who wrote a series of entries on the topic (1, 2, 3). And ultimately, a Friend of RedMonk dug up this email from a SuSE engineer which acknowledged that, as of a month earlier, SystemTap simply didn’t work very well.

In short, it appeared that Bruce Perens’ admission at LWE last year that Solaris provided “more value” than Linux applied, at least in part, to the technology that was DTrace. It appeared to have no equal.

And frankly, within the current kernel and commercial Linux releases, that’s still true. But as I wrote here, I wonder for how long. Sooner or later, it seemed to me, the Linux folks would respond to DTrace and introduce similar, if not equivalent, functionality. That is, in part, one of the reasons that I think Solaris is good for Linux and vice versa; it inspires competition, which benefits end users.

On Thursday, a few IBMers – Tom Curran, Michael Dolan, and Mike I-Didn’t-Catch-His-Last-Name – were kind enough to show me the latest iteration of SystemTap, and I have to say, it’s interesting. While I haven’t yet gotten the scripts running on my machine, I got a preview, and it seems that many of the criticisms Sun folks made of it have been addressed. Among them:

  • “It looks like SystemTap can trace ONLY kernel functions:”
    SystemTap, as I understand it, now has the ability to probe both kernel and user space.

  • “I wonder how would SystemTap protect from null pointer dereferences and so on:”
    This is more complicated, but the SystemTap folks seem to have implemented a fairly granular permissions system which does prevent some of the security issues or simple user error behaviors that DTrace filters.

  • “Well, SystemTap is lacking many of the features, is not well tested:”
    The IBMers yesterday freely admitted that some of the original criticism from the July/August timeframe was very justified; SystemTap was not ready for prime time. But the version I saw yesterday (on yet another KDE desktop, incidentally), and anticipate running myself shortly (more on that in a second), seemed usable and steady. Nothing broke, nothing timed out, and there were no huge issues.

Does this prove that SystemTap is the equivalent of DTrace? Not at all. As I told the IBMers yesterday, what will convince me – as it did with DTrace – is real, live customers using it to their benefit – and on production machines. What I can run locally is interesting, but not relevant to the really difficult questions.

Anyhow, still undetermined for SystemTap is when/if a.) SystemTap makes its way into mainstream commercial and non-commercial distributions and b.) the required kernel modules, KProbes and RELAYFS (both of which I had to add to my kernel, necessitating two separate compiles) become standard. Until that happens, DTrace has an advantage over its Linux counterpart. If the popularity of Solaris’ dynamic tracing facility is any indication, however, SystemTap will be a package to watch.

And speaking of, let me compliment the IBM guys on their approach with respect to the demo. From reading this space, they knew that I was a Gentoo user, and used a Gentoo dev connection to get SystemTap into the Gentoo library so that I could install it with very little difficulty. If you want me to try your technology, make it easy for me: get it into the Gentoo/Debian/etc libraries.

Would love to hear more from either DTrace / SystemTap advocates on this subject, if you’ve used one and/or both. I’ll share my experiences as soon as I get a chance to play with it.

In the meantime, a suggestion for both sides: the ability to probe and view in-kernel telemetry is obviously high value, but it’s valuable only to those who can use it. Both DTrace and SystemTap need to begin constructing tools on top of the tracing technologies to automate/simplify script generation and make reporting cleaner – Eclipse & NetBeans should be proactive here. Perhaps more important is the documentation question; both packages should have wikis where sysadmins and other users can share and exchange scripts.

DTrace and SystemTap will ultimately be only as successful as they are widely used, so anything that reduces the barriers to entry for all the sysadmins and – longer term – developers is a good thing.


  1. There are several ways to compare DTrace and SystemTap. One clear way is to compare features between the two. If one does this, one finds many critical features in DTrace that have no analogue in SystemTap — features like thread-local variables, scalable aggregations, speculative tracing, anonymous tracing, the ability to instrument dynamic environments like PHP, Perl, Python, Java, etc. But that kind of comparison tells a pretty superficial story — it implies that DTrace is simply “ahead” of SystemTap. In reality, the two differ at a more profound, architectural level. This difference is most stark around the most critical constraint in DTrace: safety. As I described at length, DTrace is safe by its architecture, not just its implementation. By contrast, SystemTap has not considered safety to be an architectural constraint. As a result, SystemTap has bugs like this one filed just yesterday — bugs that are actually more design defects than they are implementation flaws. Indeed, to this point: as recently as last week, attempts to instrument every function in the kernel with SystemTap induce fatal system failure.

    But safety is not the only architectural difference: there is also a fundamental difference around user-level instrumentation. Adam has described the DTrace methodology at length, but the key points are that it is safe, that it is unified across user-level and the kernel, and that it is loss-less, even on multithreaded applications running on multiprocessors. While the user-level instrumentation for SystemTap is nascent, it does not appear to abide by these constraints, as revealed in a recent post to the SystemTap mailing list. Upshot: SystemTap is a long way from being able to do something like this with PHP or like this with Java.

    More generally, to understand the genesis and constraints of the DTrace architecture, one should read my recent article in ACM Queue; a point-by-point comparison with SystemTap will reveal the degree to which the approaches differ. But as you said in your entry, the proof is in the pudding. The most gratifying thing about DTrace is that it’s used every day to solve honest-to-God problems that couldn’t be solved any other way — or nowhere near as quickly, anyway. If SystemTap enjoys the same success, good for them and good for Linux users — but based on some of the architectural differences between the two, I would not view the success of DTrace as in any way guaranteeing the success of SystemTap.

  2. I am humbled by Bryan’s attention to our little project, even if it is focused just on the dirty laundry that unavoidably appears in young, wide-open projects like ours.

    I would dispute his assessment of “profound architectural level” differences in terms of safety, but surely a weblog is not the right place to have a detailed and thoughtful debate. Why don’t y’all join us on the project mailing list where we can hash it out.

  3. G’Day. I can share a DTrace point-of-view, having written DTrace documentation, hundreds of scripts, the DTraceToolkit, etc. I’m also an OpenSolaris volunteer, not a Sun-badged employee.

    To start with, there has been much marketing hype about DTrace, and (surprisingly) it is quite well justified. DTrace has changed the way I think about OSes, and raised the bar for what I expect to be able to measure and achieve. If a DTrace-like framework is written for other OSes, it will also be immensely valuable, so long as safety is carefully respected.

    In previous lives as a system administrator, there were times when I really needed to measure something, and was surprised that the system couldn’t do so – or do so easily. DTrace solved all those long-term limitations in one blow. Not only that, but it has provided the means for me to measure just about any behaviour imaginable – so long as I really want to (and will sit down and figure out the code). Other OSes now feel absurdly restricted to their own suite of status tools.

    I agree that documentation is an important question – and that people need to be able to use this framework for the value to be realized. For some people that involves writing DTrace scripts – especially people with a heavy programming background.

    I wrote the DTraceToolkit for those who don’t have the time or the background to sit down and hammer out DTrace scripts themselves. It’s giving people instant value from DTrace, providing useful information that was previously not available. I’m also working on the documentation to go with the toolkit – which I believe is integral to its success (the toolkit is really 3 parts: 1. the scripts; 2. the man pages; 3. the example documentation).

    The DTraceToolkit can be found at: http://www.opensolaris.org/os/community/dtrace/dtracetoolkit.

    Other documentation sources for DTrace include the DTrace Guide – which is online and an excellent reference; and the upcoming release of “Solaris Internals” 2nd edition, which contains many useful DTrace scripts, demonstrated and explained. (I also helped write the book: for info see Richard’s blog).

    It’s difficult to comprehend the effect DTrace will have on operating system analysis; although I’ve written hundreds of scripts, I feel I’ve only scratched the surface. I’m going to be quite busy in the coming years.

  4. Hey,

    I will not start a technical debate between DTrace and Systemtap. Not at all. I would add two, three points about what we have right now in DTrace and how our community works.

    We have a nice collection of D scripts, the DTraceToolkit, which Brendan mentioned previously. He has created a nice toolkit which contains over 100 scripts based on DTrace, all tested and documented. We are working to enhance the toolkit and better document the scripts for our user base. The scripts can then be used very easily at any company, telcos or banks for example, to simplify the use of DTrace.

    The DTrace community is a big team, I would say: we have engineers working on DTrace internals, we have folks who are writing tools based on D scripts to ease administration, and we have people who document and test all these things.

    I think SystemTap is not there yet… maybe later. And when they do get to this point, should I consider that progress?

  5. I am a Sun badged employee, and I use DTrace every day. As Brendan stated above, DTrace has changed the way I approach operating systems. I use it virtually every day in the course of my job, for instance, as I documented here.
    However, the approaches used by DTrace and SystemTap are different, and it remains to be seen which will win out in the marketplace of ideas. We at Sun view the safety of DTrace to be an important engineering constraint, but it is one that gets in the way of some things that people wish to do with DTrace.
    For instance, recently it was asked if DTrace could be used to implement a syscall interposer to create a security product. The answer is no, it cannot, primarily due to the constraints of the safety requirement. Most of us thought that this was a good thing, since allowing this kind of feature could break the OS in new and exciting ways, producing an unstable environment.
    But from the point of view of the person asking, this is merely preventing them from doing what they wish to do. If SystemTap allows it, then this type of feature will be available in Linux and not in Solaris.
    Think of it like building a bridge without railing. It sounds like a terrible idea, but if your bridge is going to be used by more BASE jumpers than people trying to get across, it might be a good idea.
    We’ll just have to wait and see. Personally, I think railing is a good idea on a bridge.

  6. I have a small amount of practical experience with DTrace, but I’ve only read about SystemTap, so I can’t compare the two. I will, however, give an example where I’ve used DTrace to gain some impressive performance improvements, and then I’ll state the major problem with using SystemTap in our environment.

    We recently load-tested a Sun T2000 server in our environment. (This is the Niagara-based server, 8 cores with 4 threads per core.) Our first test was on an in-house multi-threaded SMTP server. Where we expected to see 6-8 times the performance of our current hardware, we barely saw 3x, even though the server was still mostly idle. The obvious thought was that it was a lock contention problem, and the most likely candidate was the mutex around the shared log file. The standard procedure would have been to increase the debugging level, log a lot of information, and analyze it afterwards. Given that the log file was a candidate for lock contention, the results of that analysis might not have been conclusive, because the extra logging itself could have masked the problem.

    I performed a very simple analysis with two DTrace scripts. The first told me what function the application was spending most of its time in (__lwp_mutex_timedlock(), for the curious), and the second gave me the most common stack traces leading to that particular function. The lock contention ended up being in libc’s implementation of malloc() and free(). It hadn’t occurred to anyone to even think about this as a possibility, and the standard method of adding extra logging probably wouldn’t have caught this. (For the curious, Solaris 10’s libumem provided the silver-bullet solution to the performance problem, letting the Niagara perform as expected but also getting us a 70% improvement on the existing hardware.)

    There’s a second case with a similar analysis that pointed to lock contention in readdir_r(), but the details would essentially be redundant. What is important to point out, though, is that we were performing this analysis on live production systems under real conditions. We were comfortable doing so given the safety features of DTrace, and doing so allowed us to spend a few hours analyzing the problem instead of spending much longer setting up a lab environment in which we could generate an equivalent amount of traffic.

    As for SystemTap, there’s a major sticking point with respect to using it in our environment (which is a mixture of Solaris and Linux). As far as I understand, SystemTap involves generating C code and compiling that C code into a kernel module. Given the requirement for a C compiler, we’ll never be able to use SystemTap on production systems, as we do not install a C compiler on our production systems. It’s feasible that we would change that policy if SystemTap could provide some truly outstanding benefit, but it’s far more likely that we would run Solaris x86 in order to use DTrace. While there are architectural arguments to be made with respect to this issue, this is an operations argument that I would expect to be an issue in many environments.
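
The two-step analysis described in comment 6 (first find the function where time is going, then group the call stacks that lead to it) can be sketched in Python. This is only an illustration of the aggregation idea, not DTrace and not the commenter's actual scripts: the sample stacks, every function name other than __lwp_mutex_timedlock, malloc and free, and the helper names hottest_functions and stacks_leading_to are hypothetical.

```python
from collections import Counter

# Hypothetical samples: each tuple is a call stack, innermost frame first.
# Illustrative data only, not output captured from a real system.
SAMPLES = [
    ("__lwp_mutex_timedlock", "malloc", "parse_message"),
    ("__lwp_mutex_timedlock", "free", "release_buffer"),
    ("__lwp_mutex_timedlock", "malloc", "parse_message"),
    ("write", "log_line", "handle_client"),
    ("__lwp_mutex_timedlock", "malloc", "copy_header"),
]


def hottest_functions(samples):
    """Step 1: count how often each function is the innermost frame."""
    return Counter(stack[0] for stack in samples)


def stacks_leading_to(samples, function):
    """Step 2: count the distinct call stacks whose innermost frame is `function`."""
    return Counter(stack for stack in samples if stack[0] == function)


if __name__ == "__main__":
    hot, hits = hottest_functions(SAMPLES).most_common(1)[0]
    print(f"hottest function: {hot} ({hits} samples)")
    for stack, count in stacks_leading_to(SAMPLES, hot).most_common():
        print(count, " <- ".join(stack))
```

In this made-up data the second step shows most of the hot stacks passing through malloc, which is the kind of pointer toward allocator contention the comment describes; a real tracing framework would collect the samples from the live process rather than from a hard-coded list.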

  7. SystemTap made a conscious design decision to provide a flexible and safe architecture that allows probes almost anywhere in the system. It is easier to provide a safe set of probes if the probe set is limited. The limitation of that approach is that if a problem requires a probe point which is not in the limited set, the solution becomes limited too. SystemTap’s flexible architecture allows packaging a well-tested set of probes, using tapsets, that are safe to use on any production system, but it doesn’t limit one’s ability to add new sets of probes when needed. Safety is an integral part of SystemTap’s design, as you can read in my OLS paper.

    In a short time span SystemTap has been able to provide significant functionality, not by accident but due to its flexible and extensible architecture. We are doing extensive testing so that probes are disallowed only in the very few places where they are not safe, as per our design goal. The bugs Bryan mentioned in his comment are not architectural flaws but part of our effort to narrow the areas where probes are not allowed to the smallest possible set.

    I couldn’t agree more with Stephen that the value to the customer is in providing tools on top of the infrastructure. I think the flexible architecture of SystemTap, which makes it possible to develop higher-level tools by including the lower-level tapsets, is an advantage. I also agree with Stephen that adoption of these tools and technologies in the customer base can be accelerated with automated script generation using a GUI front end.

  8. Show off your bugs

    I have seen lots of people using DTrace to fix problems. SystemTap has come a long way in the last year; I’m wondering whether anybody has solved a problem, tuned an application, or found bottlenecks with SystemTap. So I’m putting out a request for people using SystemTap to show how SystemTap made their lives easier: blog it or post it somewhere, and send a link to the SystemTap mailing list. These could be very motivational to the community; perhaps they could even be made part of the official webpage.

  9. > but surely a weblog is not the right place to have a detailed and
    > thoughtful debate. Why don’t y’all join us on the project mailing
    > list where we can hash it out[?]

    Strongly disagree. Debate about the relative differences of DTrace and SystemTap should take place not on one list or the other, but a separate (newly created if need be) higher-level list somewhere — e.g. observability tools.


  10. Stephen, great seeing you in Boston. I see the post sparked some healthy debates. Not sure why people seem to be “picking sides” on whether their Ford is better than the other guy’s Chevy, but at the end of the day it’s about empowering users of all platforms to be able to work more effectively with the system. The reality is admins and developers work on a platform for a variety of reasons – not one tool in the toolbox. Once they’re on the platform let’s get them tools they can use. None of these tools are “new” – this stuff’s been around for a while. The “new” part is that they’re packaged better, more granular, easier to use, and available on more platforms than just z/OS, for instance. With SystemTap, the Linux admins and developers will have new power tools to help them in their work – and that’s a good thing.

  11. As much as I might like to ignore it, I take issue with this statement:

    None of these tools are [sic] new — this stuff’s been around for a while.

    While some of the ideas have been kicking around for a while, plenty of DTrace is new, as outlined at length in the Related Work section of our paper at USENIX ’04. Yes, we’re familiar with the offerings on z/OS like GTF and OmegaMon and yes, DTrace advances the state of the art beyond them, significantly beyond them in some dimensions. So contrary to popular belief, everything wasn’t actually done by z/OS or MULTICS in the 1960s…

  12. Higher-level DTrace GUI tools are starting to appear; one such example is Chime. Granted, it is a little basic just now but, just like DTrace itself, extensible. Yes, the real value will be in combining DTrace and NetBeans – particularly for Java programmers.
