James Governor's Monkchips

Measure what you manage – data, storage and admin



It’s a business book cliché (sometimes wrongly attributed to Drucker, sometimes wrongly to Deming) that you can’t manage what you can’t measure, but in IT it’s important to look at whether and how to measure what you do manage – increasingly data.

We use terms such as Total Cost of Ownership, estimating the total cost of acquiring, deploying and operating a product. Servers per admin is a common metric, which varies a great deal between organisations and environments. With the rise of managed services the number of admins needed is falling precipitously.

Last week I talked to Snehal Antani, SVP Business Analytics & IoT at Splunk. He does a great job of presenting himself as the voice of the customer – having worked at IBM, but latterly at GE Capital, where he was indeed one of Splunk’s biggest customers. Antani is good on both practice and theory. As a management framework he talks about the physics of data in terms of heat energy rather than gravity: new data is hot and kinetic, it cools over time, and it should be managed in terms of entropy.
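To make the cooling metaphor concrete, here is a minimal sketch of my own (not Antani’s or Splunk’s model) that treats data “temperature” as an exponential decay over age and maps it to a storage tier. The half-life and thresholds are invented for illustration.

```python
import math

def temperature(age_hours, half_life_hours=24.0):
    """Data 'heat' decays exponentially with age; brand-new data is hottest (1.0)."""
    return 0.5 ** (age_hours / half_life_hours)

def tier_for(age_hours):
    """Map temperature to a storage tier; thresholds are illustrative only."""
    t = temperature(age_hours)
    if t > 0.5:      # roughly the last day
        return "hot (memory / SSD)"
    if t > 0.001:    # roughly the last few weeks
        return "warm (local disk)"
    return "cold (archive / object storage)"

for age in (1, 12, 24 * 7, 24 * 365):
    print(f"{age:>6} hours old -> {tier_for(age)}")
```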

Splunk is of course all about time series data. Time is Splunk’s primary key for the machine data it collects.

And as a customer, he said:

95% of my searches were on data over the last 24 hours, 4% over the last 4 weeks, with only 1% over 7 year timeframes.

This is the hierarchy of log data under management, according to cost/value, because storage immediacy has a cost. Traditionally we had “hierarchical storage management”, policy-based automation to store data on high or low cost media according to how quickly we needed to be able to access it. Today at first glance memory is so cheap, storage ditto, networks so fast – behold the in-memory database – that we almost don’t need HSM-style thinking. But of course, we’re generating and using far more data than ever. Jevons’ Paradox states that as efficiency in using a resource improves, consumption of that resource tends to rise rather than fall. The cloud doesn’t do away with the need for decision-making about data storage – note Amazon Web Services offers cold storage in the shape of Glacier.
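That HSM-style decision shows up in the cloud as lifecycle policy. Here is a rough boto3 sketch – the bucket name, prefix and day counts are placeholders, not a recommendation – that transitions older objects to Glacier and expires them after a two-year retention window.

```python
import boto3

# Assumes AWS credentials are configured; bucket and prefix are hypothetical.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-event-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cool-down-event-logs",
                "Filter": {"Prefix": "events/"},
                "Status": "Enabled",
                # After 30 days the data has "cooled": push it to Glacier.
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # Delete once an (illustrative) two-year retention window passes.
                "Expiration": {"Days": 730},
            }
        ]
    },
)
```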

A major digital business is going to generate billions if not trillions of events per day. If it’s a traditional enterprise in a regulated environment it’s going to need to store these events for at least 2 years. Organisations need to decide which bucket to drop their data in according to type.
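A toy sketch of that routing decision – the event types, bucket names and retention periods below are entirely made up:

```python
# Hypothetical retention policy: event type -> (destination bucket, retention days).
RETENTION_POLICY = {
    "payment_transaction": ("regulated-archive", 365 * 7),  # regulators may want years
    "application_log":     ("warm-store",        90),
    "debug_trace":         ("hot-store",          7),
}

def route(event_type):
    """Pick a destination bucket and retention period for an incoming event."""
    return RETENTION_POLICY.get(event_type, ("warm-store", 30))

print(route("payment_transaction"))  # ('regulated-archive', 2555)
```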

One metric Antani used for data under management is gigabytes per admin, analogous to servers per admin. This felt kind of funny to be honest. I remember back in the old days when an average SAP application server instance was about 50GB, but who even thinks in gigabytes any more? Shades of Dr Evil. But Antani’s point stands. He sees three key metrics as a framework for decision-making (a rough sketch follows the list):

  • Efficiency – GB per admin
  • Collection to insight – how quickly data is put to use once collected. This is a DoD and intelligence community metric.
  • Cost vs value – expenditure running the platform versus the value extracted from it.
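As a rough illustration of how those three metrics might be computed together – every name and number below is invented:

```python
from dataclasses import dataclass

@dataclass
class PlatformSnapshot:
    data_under_management_gb: float
    admins: int
    collected_at_hours: float      # when the data landed
    first_insight_at_hours: float  # when it first informed a decision
    platform_cost: float           # annual spend running the platform
    value_extracted: float         # estimated value of decisions it enabled

    def gb_per_admin(self) -> float:
        """Efficiency: how much data each admin looks after."""
        return self.data_under_management_gb / self.admins

    def collection_to_insight(self) -> float:
        """Hours between collecting data and first using it."""
        return self.first_insight_at_hours - self.collected_at_hours

    def cost_vs_value(self) -> float:
        """Value extracted per unit of platform spend."""
        return self.value_extracted / self.platform_cost

# Invented figures, purely to show the shape of the framework.
snap = PlatformSnapshot(500_000, 4, 0.0, 1.5, 2_000_000, 5_000_000)
print(snap.gb_per_admin(), snap.collection_to_insight(), snap.cost_vs_value())
```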

I am impressed by the last one. Assessing value extracted feels like a pipe dream for many organisations.

Splunk is about machine data and has established a competence in assessing value. One of the areas it focuses on in customer engagements is maximising value through correlation: what problem is the customer trying to solve, what data sources are missing to solve the problem, and how might the organisation onboard them? Data source assessments feel like a fruitful model for engaging with companies that understand data transformation is the new digital transformation. Also – how long is a particular data set valuable for, before it can be offloaded to colder storage? I might need to rethink this old post on data value increasing over time.

The conversation with Antani left me with plenty to think about – I have a follow-up to write about customer journeys as log data. But I would love to know what you think – what are the right metrics for data under management? Could management costs be a proxy for data value, or is that absurd in the age of database as a service? How do we measure how we manage this supposedly valuable resource?
