James Governor's Monkchips

Data aggregation, Docker and Datadog



In recent posts I have been looking at data transformation and culture, tooling, and finding the right indicators to track rather than vanity metrics. At RedMonk we spend a lot of time trying to understand developer choices and practitioner-led adoption, which isn’t always easy. Surveying developers is sub-optimal because good developers spend their time writing software rather than filling in surveys. Meanwhile, tracking purchases never worked as a proxy for open source software adoption. Telemetry, on the other hand, is a great way to track what’s actually going on in a networked environment.

This week, in his word of sog newsletter (which is excellent, by the way, and well worth subscribing to), Stephen pointed to some great data aggregation work from Datadog.

“Exhibit A is what Datadog is doing. If you’ve talked to me in my capacity as an analyst, we’ve almost certainly discussed data and its usage. The basic idea is simple: every business inevitably generates a large amount of telemetry data about what they’re using, how it’s used and so on. Typically, this data is treated as exhaust and ignored. In spite of this, it has massive untapped value, and more importantly doesn’t need to violate a company’s privacy. Consider Datadog’s list of the Top 10 most common technologies running in Docker: it gives absolutely nothing away about individual customers, but aggregates their information, analyzes it and turns it into something valuable for Datadog, its customers and the wider world. As someone who’s been advocating for this for better than a decade, it’s been a long time coming, but expect to see more of this as we move forward.”

Datadog is an application performance management vendor, here using its telemetry to shine a light on the Docker ecosystem. Many of the insights are fascinating, and the sample is large: Datadog compiled usage data from 10,000 companies and 185 million containers.

As the chart above shows, the most widely used images are NGINX, Redis, and Elasticsearch.
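To make the aggregation idea concrete, here is a minimal sketch of how per-company telemetry can be rolled up into a "most widely used images" ranking without exposing any individual customer. It assumes hypothetical records mapping each company to the set of images it runs; the company names, image lists, and the `top_images` helper are invented for illustration and are not Datadog's schema or methodology.

```python
from collections import Counter

# Hypothetical telemetry: each company reports the set of container images it runs.
# Company names and image lists are invented purely for illustration.
telemetry = {
    "company-a": {"nginx", "redis", "postgres"},
    "company-b": {"nginx", "elasticsearch"},
    "company-c": {"redis", "nginx", "elasticsearch"},
    "company-d": {"mongo"},
}


def top_images(per_company, n=3):
    """Rank images by the fraction of companies running them.

    Only image names and aggregate shares are returned, so no company
    identifier appears in the result.
    """
    counts = Counter()
    for images in per_company.values():
        # Sets ensure each company counts at most once per image.
        counts.update(images)
    total = len(per_company)
    # Sort by descending count, then by name so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(image, count / total) for image, count in ranked[:n]]


for image, share in top_images(telemetry):
    print(f"{image}: {share:.0%} of companies")
```

Ranking by share of companies rather than raw container counts keeps one very large customer from dominating the list, which is one plausible design choice for this kind of report, not necessarily the one Datadog made.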

Another finding that caught my eye: containers churn 9x faster than VMs. That’s not a surprise given how containers are used (we often describe them as ephemeral, particularly in continuous deployment), but it’s great to see actual data back it up.

In conclusion, Datadog did an exceedingly good job of storytelling with this latest data drop, and we can all learn from that. RedMonk has worked with a number of companies taking similar approaches and would like to do more. If you have solid networked telemetry assets about infrastructure usage, we’d love to hear from you.
