After sitting through one too many bad benchmarks in briefings, I tweeted a little while ago that they should be done in an open-source way. Given our great community around RedMonk, it was no surprise to get a bit of debate on exactly what that meant. Justin Sheehy, CTO of Basho, responded that benchmarks should not only be open source but also reproducible, which made me think about doing them not just open source but scientifically. Here’s how it all started:
RT @dberkholz: All benchmarks used by a company to market its products as superior should be open source. Discuss.
— Justin Sheehy (@justinsheehy) May 7, 2012
Marketing folks love to show benchmarks that support their software being really fast, or at least faster than the competition. The only problem is, most marketers don’t recall anything about their last science class beyond how gross it was to dissect a frog.
Funny thing is, the “scientific method” as most people learn it is a joke in real science. The only reason most scientists need a hypothesis is to justify getting money. In reality, we don’t hypothesize much about specific results, because that tends to bias research toward desired results. Instead we ask open-ended questions: “What happens if this changes?” “What happens if that disappears?”
Science is truly about curiosity and asking questions, then looking at the responses, rigorously. Finally, the results, once published, need to be reproducible by anyone. That’s why open source and open access have gained such popularity in the scientific community, because when combined, they mean everyone can get at the results of a study as well as the code and data necessary to reproduce it.
So what does this mean in the context of an open benchmark, and why should anyone care? First of all, benchmarks are often perceived as marketing fluff precisely because so little information is given about how they were chosen and run, and why, on top of the difficulty of reproducing the result. The basic idea is that people like data and graphs, which is true. But anyone reading deeper (as developers tend to do, vs. high-level folks) starts to call BS on this stuff as soon as they ask, “How can I run this on my own system?”
An open benchmark looks like this:
- The code is freely available and open-sourced, so people can understand and improve upon the benchmark suite
- The rationale for specific benchmarks is provided
- The conditions under which a benchmark was run are fully specified, so people can reproduce it
- The licenses of the software and the benchmark do not require permission to publish results
- The data produced by a benchmark are available for anyone to analyze, so they can come up with the same numbers or perform different analyses upon the same data
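
To make that list a bit more concrete, here is a minimal sketch in Python of what publishing a run might look like. Everything in it is an assumption for illustration: the function names, the report fields, and the toy sorting workload are hypothetical and not taken from any real benchmark suite. The point is only that the rationale, the run conditions, and every raw timing travel together in one file that anyone can re-analyze.

```python
import json
import platform
import sys
import time


def run_benchmark(func, repeats=5):
    """Call func() `repeats` times and return every raw timing in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def describe_environment():
    """Record the conditions someone would need to reproduce the run."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }


def build_report(name, rationale, func, repeats=5):
    """Bundle rationale, conditions, and raw data into one publishable record."""
    return {
        "benchmark": name,
        "rationale": rationale,
        "repeats": repeats,
        "environment": describe_environment(),
        # Raw samples, not just an average, so others can run their own analysis.
        "raw_seconds": run_benchmark(func, repeats),
    }


def sort_workload():
    # Hypothetical workload used only for this illustration.
    sorted(range(100_000, 0, -1))


if __name__ == "__main__":
    report = build_report(
        name="sort_reversed_100k",
        rationale="Illustrative only: sorting a reversed list of integers.",
        func=sort_workload,
    )
    json.dump(report, sys.stdout, indent=2)
```

A real suite would need to capture far more (hardware, software versions, configuration, data sets), but even this much lets a reader rerun the script on their own system and compare the numbers.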
If you aren’t doing this, why not? If you believe your software is really the best, stand behind it. As an added bonus, you can actually build a community around your benchmark suite and turn it into the industry standard. Developers believe solidly backed numbers and data, and they’re the ones driving most decision-making in IT today, so you’d better cater to them or pay the price.