tecosystems

Google and the Case for Telemetry: What Vendors Could Learn


Tim O’Reilly has called his assertion that “Data is the Next Intel Inside” the “least-understood principle from my original Web 2.0 manifesto.” And while I don’t love the metaphor – given that Intel’s profits accrued primarily externally – I’m inclined to agree.

When smart people are conflating my arguments in favor of telemetry with Red Hat’s current business model, the only supportable conclusion is that the thread has been lost. For which I accept the blame. So let me try once more to explain what I believe today’s software vendors can learn from Google.

To do so, let’s examine the recent debate over the so-called open source business model. While open source is certainly not uniquely capable of generating, collecting and leveraging telemetry, it does enjoy advantages in the scale of its distribution, and those advantages become important. The basic argument – which is not new, please note – distills to this assertion: open source is not, in and of itself, a business model. Support, service and break/fix can take you only so far as a business.

That might seem controversial, but only if you’re not in the business of open source. I doubt that anyone I know from that world would take exception to the claim; many would violently agree with it. Open source is a method of developing and distributing software; one with certain advantages and disadvantages. That’s about it. It’s not the golden goose, nor will it turn dross into gold.

As Sara points out, then, if you’re a commercial organization, it is imperative that the open source development method be complemented by a commercial business model. At least if you plan on continuing as a commercial organization.

There are dozens, maybe hundreds, of mechanisms that businesses use to profit off of open source software today. As thoroughly as open source permeates the world of IT today, and given what we know of where most open source contributions come from (at least in the context of enterprise class projects), it would be foolish to argue that no one has figured out a mechanism for profiting off of open source. Google and its market cap – yes, even at its present valuation – would put the lie to your claims, built as it is upon a foundation of open source software.

The better question, I’ve long believed, is not how open source companies can make money, then, but how they can make even more money. The kind of profits their proprietary forebears generated. And the answer to that question, in my view, has always been telemetry.

Consider the case of Google. While their valuation – whether it’s rising or falling – depends almost entirely on the performance of their advertising business, such an analysis omits what may ultimately be their single most valuable asset: the largest store of user generated telemetry this side of ECHELON. This dataset allows Google to offer features such as “Did you mean,” which is merely a pattern recognition algorithm applied to immense query volumes. If a certain percentage of millions of queries are immediately followed by a second query, it might be reasonable to conclude that a user might have mistyped the first; a simple but efficient use of the incoming telemetry of user search queries.
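To make the follow-up-query idea concrete, here is a minimal sketch in Python. It is my own toy illustration of the pattern described above, not Google's actual method: the function name `build_suggestions`, the session format, and both thresholds are assumptions made up for the example.

```python
from collections import Counter, defaultdict


def build_suggestions(sessions, min_count=3, min_share=0.5):
    """Map a query to the query that most often immediately follows it.

    sessions: iterable of lists, each an ordered list of one user's queries.
    min_count and min_share are hypothetical thresholds: a correction is
    suggested only if the follow-up was seen at least min_count times and
    accounts for at least min_share of all occurrences of the first query.
    """
    seen = Counter()                    # how often each query was issued
    followed_by = defaultdict(Counter)  # first query -> Counter of next queries

    for queries in sessions:
        seen.update(queries)
        for first, second in zip(queries, queries[1:]):
            if first != second:
                followed_by[first][second] += 1

    suggestions = {}
    for first, followers in followed_by.items():
        second, count = followers.most_common(1)[0]
        if count >= min_count and count / seen[first] >= min_share:
            suggestions[first] = second
    return suggestions


# Toy log, invented for illustration: most users who type "telemtry"
# immediately retype it as "telemetry".
example_sessions = (
    [["telemtry", "telemetry"]] * 4
    + [["telemtry"]]
    + [["telemetry", "open source"]]
)
example_suggestions = build_suggestions(example_sessions)
```

With so little data the thresholds are arbitrary; the point is only that the signal comes from volume, which is exactly what a market leader has.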

More sophisticated is the analysis that Google does on their dataset in an effort to predict flu outbreaks. Here’s how they describe it:

Each week, millions of users around the world search for online health information. As you might expect, there are more flu-related searches during flu season, more allergy-related searches during allergy season, and more sunburn-related searches during the summer. You can explore all of these phenomena using Google Trends. But can search query trends provide an accurate, reliable model of real-world phenomena?

We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms. Of course, not every person who searches for “flu” is actually sick, but a pattern emerges when all the flu-related search queries from each state and region are added together. We compared our query counts with data from a surveillance system managed by the U.S. Centers for Disease Control and Prevention (CDC) and found that some search queries tend to be popular exactly when flu season is happening. By counting how often we see these search queries, we can estimate how much flu is circulating in various regions of the United States.

In other words, they watch the incoming telemetry for patterns, compare it to other data, and extract useful intelligence, which is presented in an interesting application.
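The "compare it to other data" step can be sketched as a correlation check followed by a simple linear fit. This is only an illustration of the general technique, assuming weekly query counts and a weekly reference series of equal length; it is not Google's published model, and the function names `pearson` and `fit_line` are my own.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return the means plus the covariance and variance sums for two series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, cov, var_x, var_y


def pearson(xs, ys):
    """Pearson correlation between query counts (xs) and reference data (ys)."""
    _, _, cov, var_x, var_y = _centered_sums(xs, ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept, so new counts can be turned
    into an estimate of the reference quantity."""
    mean_x, mean_y, cov, var_x, _ = _centered_sums(xs, ys)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant input series")
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(query_count, slope, intercept):
    """Estimate the reference quantity for a week with no reference data yet."""
    return slope * query_count + intercept
```

If the correlation over past weeks is strong, the fitted line lets you estimate the current week from query counts alone, before the slower official figures arrive.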

How is this possible? In a word, volume. They generate – one would assume, as the market leader in search – the largest potential dataset to operate from, which gives their pattern recognition algorithms a statistically significant dataset to work on. Which brings us back to open source. Open source infrastructure may have its issues, particularly around the monetization of same, as discussed above, but volume typically isn’t one of them. When even Gartner, as conservative a technology predictor as there is, allows that open source software is pervasive, it’s safe to conclude that adoption is substantial.

Thus, the question: can open source software leverage its ubiquity (read: volume) using telemetry for financial return? I don’t see why not.

Now that last statement is actually false: there are several obvious reasons why not. I’m sure I caught some of you furiously objecting in the comments space just now, on the grounds that enterprises will not risk disclosing any of their telemetry for reasons that involve competitive advantage, compliance, performance, privacy, security and so on. But none of these, in my view, are insurmountable, provided that enterprise software vendors can build trust and – most importantly – return value. Serious, differentiating value.

This idea, like the open source model discussions above, is not new: Akismet, WordPress’ anti-spam facility, works along these lines. As do a variety of anti-virus solutions. Splunk’s Splunkbase is an open, collaborative effort built on the same principles. And desktop applications have had the ability for years to phone home bug and crash reports.

But the idea has yet to be actively embraced and extended within the context of core infrastructure software. It’s as if the software vendors have learned nothing from Google, and prefer eking out increasingly marginal existences on increasingly pressured margins.

What if MySQL customers, for example, could opt in to phone home certain anonymized data about their implementations, in return for the ability to answer questions like:

  • How is my uptime compared to the average MySQL customer?
  • How is my uptime compared to other MySQL customers in my vertical?
  • How does my performance compare to similarly sized deployments?
  • What are my worst performing queries, and what do they have in common with other poorly performing queries?

And so on. As a MySQL user, I know I’d sign up for that, and if the price was right, pay for it. Which would be 100% more revenue than MySQL generates from us at the present time.
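For the uptime questions, the vendor-side computation could be as simple as the sketch below. Everything here is hypothetical: the `Report` record, its fields, and the `benchmark` function are my own assumptions about what an anonymized, opt-in report might contain, not anything MySQL offers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Report:
    """One anonymized, opt-in report (hypothetical fields, no customer identity)."""
    vertical: str      # e.g. "retail" or "finance"
    uptime_pct: float  # uptime over the reporting period, in percent


def benchmark(my_uptime: float,
              reports: List[Report],
              vertical: Optional[str] = None) -> Dict[str, float]:
    """Compare one customer's uptime with a cohort of reports.

    With vertical=None the cohort is every reporting customer; otherwise
    only customers in that vertical are included.
    """
    cohort = [r.uptime_pct for r in reports
              if vertical is None or r.vertical == vertical]
    if not cohort:
        raise ValueError("no reports available for this cohort")
    below = sum(1 for uptime in cohort if uptime < my_uptime)
    return {
        "cohort_size": len(cohort),
        "cohort_mean": mean(cohort),
        # Share of the cohort with strictly lower uptime than mine.
        "percentile": 100.0 * below / len(cohort),
    }


# Invented numbers, for illustration only.
example_reports = [
    Report("retail", 99.0),
    Report("retail", 99.9),
    Report("finance", 99.99),
    Report("finance", 99.5),
]
overall = benchmark(99.95, example_reports)
in_finance = benchmark(99.95, example_reports, vertical="finance")
```

The arithmetic is trivial. The value comes from the cohort, which only the vendor with the installed base is positioned to assemble.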

Now some of you are doubtless saying: “that would never happen in an enterprise.” To which I’d reply: maybe.

But maybe not. I can remember all too well the days when enterprises considered the SaaS model anathema; something to not only be avoided, but hated and feared. Fast forward a few years, and SaaS is all the rage.

So is it that implausible that we’ll see, gradually, a recognition that telemetry collected, aggregated and analyzed by a trusted supplier can be returned to the customer as a feature, just like Google’s “Did you mean?” I don’t think so.

And frankly, given the potential revenue opportunity, were I a vendor I’d be falling all over myself trying to find out. Data might not be the Intel Inside, as Tim asserts, but it’s immensely valuable yet massively underutilized.

Disclosure: MySQL (Sun) and Splunk are RedMonk customers, while Google is not.