With all the buzz around Observability over the last few years it’s easy to imagine that when it comes to logs, metrics and traces, it’s game over. Just stick all the data you need in a database, or these days a data lakehouse, and start building queries and dashboards. Easy.
Glibness aside, Observability tool vendors generally claim they can manage all your metrics and telemetry data in a single coherent store, with consistent access mechanisms. But as telemetry data has exploded so have costs. This is especially true when we want to correlate the data in terms of business needs – the problem with things like customer number, user or product ID, or IP address is that they are inherently high cardinality. Columns with many unique values drive up memory and compute costs, and those costs get passed on to customers.
Where it used to be that folks complained Splunk was expensive, these days we hear the same about Datadog. Long seen as the darling of the APM space rather than a “legacy player”, Datadog is now seen as expensive. In 2025 this is a genuine weakness – the issue comes up repeatedly in customer conversations. It’s not that people don’t value Datadog – they rave about the user experience. But costs are a concern.
There is a huge opportunity here around cost management, notably in the emerging log data management (LDM) space. Organisations are concerned with the costs of storage and cardinality.
So what is Log Data Management and why is it useful?
The bottom line is that log management is indeed a data management problem. Data sources continue to fragment with every new platform the organisation adopts. Modern Observability is not about instrumentation but data, especially in the open standards world of OpenTelemetry. But we’re not yet living in a world where every piece of your infrastructure uses OTel. There is plenty of telemetry in different systems that needs to be integrated, collated and transformed before it’s truly useful – similar to ETL in the data warehousing space.
So we’re faced with at least two factors that need to be addressed – cost of storage, and complexity of the data landscape.
You might justifiably claim that if your tool is primarily used by developers for troubleshooting you don’t actually need to store every unique event in your log stream, but a lot of organisations with a strong focus on security and compliance, such as those in regulated industries, do indeed want to store all the telemetry.
With log data management you’re concentrating on integrating data from a range of sources, often leaving it in place, but with pipeline routing and data refinery capabilities that allow you to manage all of your log data as cost effectively as possible.
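To make “data refinery” concrete, here is a purely illustrative sketch of the kind of in-flight rule such a pipeline might apply – drop debug chatter, keep every error, sample routine events. The event shape and sample rate are our assumptions, not any vendor’s defaults.

```python
# Purely illustrative: one "refinery" rule of the sort a log data management
# pipeline might apply in flight, so less data reaches metered downstream
# platforms. Field names and the sample rate are assumptions.
import random

def reduce_volume(event: dict, info_sample_rate: float = 0.1) -> dict | None:
    """Return the event to forward downstream, or None to drop it."""
    level = event.get("level", "INFO")
    if level == "DEBUG":
        return None                      # rarely worth paying ingest for
    if level in ("ERROR", "FATAL"):
        return event                     # always keep failures at full fidelity
    # Routine events: keep a representative sample only.
    return event if random.random() < info_sample_rate else None
```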
The company that arguably best represents this view of the market right now is Cribl. It doesn’t position itself as a replacement for Observability or even log management vendors, but rather as an adjunct to them.
Cribl Stream, formerly called LogStream, is about sending data to multiple places, rather than collecting it into one. It can enrich or redact data, and is used by customers to reduce volume for existing ingest platforms. Storing every AWS CloudTrail or Windows XML event gets very expensive quickly.
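Cribl’s pipelines are configured in its own product rather than hand-written, but the shape of the idea looks something like the following sketch – redact sensitive fields, then fan each event out to more than one destination. The destinations and field names here are hypothetical stand-ins, not Cribl’s API.

```python
# Illustrative sketch, not Cribl's actual API: redact sensitive fields, then
# send a trimmed copy to the (expensive, metered) analytics tool and the full
# record to cheap object storage. Function and field names are hypothetical.
import copy
import re

SENSITIVE = ("password", "api_key", "ssn")

def redact(event: dict) -> dict:
    clean = copy.deepcopy(event)
    for field in SENSITIVE:
        if field in clean:
            clean[field] = "[REDACTED]"
    # Mask anything that looks like an email address in the message body.
    clean["message"] = re.sub(r"\S+@\S+", "[EMAIL]", clean.get("message", ""))
    return clean

def route(event: dict, send_to_analytics, send_to_archive) -> None:
    send_to_archive(event)                      # full fidelity, cheap storage
    trimmed = {k: v for k, v in redact(event).items()
               if k in ("timestamp", "level", "service", "message")}
    send_to_analytics(trimmed)                  # only what the ingest tool needs
```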
Cribl is taking a similar approach with Cribl Search. Rather than consolidating data in one place before searching, the platform searches at the edge, so that you can search across multiple contexts, with data left in place, using a unified query language. Federated search is really hard, but the philosophy of “search-in-place” is a great play for customers that don’t want to buy another centralisation promise.
Chronosphere moved from using CrowdStrike as an OEM log provider to launching their own log control product in June 2025. Chronosphere’s platform is all about letting users control the volume of data they are storing, and because they don’t price on ingest, users have more visibility into what they actually need to keep. When Chronosphere launched their log product, CEO Martin Mao told RedMonk:
One of the big gaps is you don’t know which logs and which sections of your logs to reduce and how to reduce them. And the way we solve that, and this is one of the reasons why we built our own back end, is we actually have to analyze all of how you use the logs in terms of dashboards, alerts and things like that. Then we turn the analytics into suggestions to feed a telemetry pipeline, and you can reduce your data volume.
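The detail of how Chronosphere does this is its own, but the general idea – derive drop suggestions from what dashboards and alerts actually reference – can be sketched in a few lines. Everything below is a toy illustration, not Chronosphere’s implementation, and the field and query names are made up.

```python
# Toy illustration of usage-driven reduction (not Chronosphere's code):
# suggest dropping any log field that no dashboard or alert query mentions.
from collections import Counter

def suggest_drops(log_fields: set[str], dashboard_queries: list[str],
                  alert_queries: list[str]) -> list[str]:
    """Return fields that no dashboard or alert query references."""
    usage = Counter()
    for query in dashboard_queries + alert_queries:
        for field in log_fields:
            if field in query:
                usage[field] += 1
    return sorted(f for f in log_fields if usage[f] == 0)

# Hypothetical example:
# suggest_drops({"pod_name", "trace_id", "debug_blob"},
#               ["count by pod_name"], ['error_rate{pod_name="api"}'])
# -> ["debug_blob", "trace_id"]
```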
Startup Control Theory was created to give companies more operational and cost control over their logs. Co-founder Bob Quillen argues that OTel helped democratize instrumentation and collection of telemetry, but it still led to “fat dumb pipes that dump into a data lake, and then you pay for ingest of that data, indexing it, and retaining it. And we thought, ‘there’s got to be a better way to do this.’” And so he and his co-founders set out to create a control layer to sit on top of telemetry data to better manage it.
Hydrolix, on the other hand, is tackling the cost problem of logs by delivering an extremely high-compression data lake that can stay always hot. Hydrolix’s approach is somewhat tangential to the other log data management approaches mentioned here. While other competitors focus on reducing the total volume of logs saved and stored, Hydrolix has instead focused on reining in costs by building its own proprietary compression methodology. A big reason why it can compress so efficiently is that the solution focuses exclusively on logs, not any other type of telemetry.
Honeycomb, which positions itself as the best solution for querying and analysing high cardinality data at scale, has responded to the need for pipeline-based log data management approaches with its Honeycomb Telemetry Pipeline product. Telemetry Pipeline Manager uses the OpenTelemetry Collector to scrape and collect system logs, and the collector supports multiple log formats. Logs can also be refined to eliminate redundant data, and the Refiner identifies potentially important events – errors or slow requests, for example. The rest of the data is archived, but you can rehydrate full-fidelity logs and traces directly from S3.
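None of the following is Honeycomb’s actual Refiner code, but the underlying split – keep what looks important on the hot, queryable path, archive the rest – is simple to express. The threshold and field names below are assumptions for illustration.

```python
# Sketch of the refine-and-archive split (not Honeycomb's Refiner): keep
# errors and slow requests hot, send everything else to cheap archive storage
# from which it can later be rehydrated. Thresholds and fields are made up.
SLOW_REQUEST_MS = 2000  # hypothetical latency threshold

def is_interesting(event: dict) -> bool:
    return (event.get("status", 200) >= 500
            or event.get("duration_ms", 0) >= SLOW_REQUEST_MS)

def refine(events, send_hot, send_archive) -> None:
    for event in events:
        (send_hot if is_interesting(event) else send_archive)(event)
```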
Datadog also offers an Observability Pipelines product – customers save on egress costs by sending only valuable logs to a chosen observability vendor, and routing other logs to long-term storage such as AWS S3, Azure Blob Storage or Google Cloud Storage. As ever Datadog offers plenty of out-of-the-box functionality, in this case more than 150 predefined parsing rules to transform logs into structured formats for querying, using its Grok parser.
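Grok patterns are, under the covers, named regular expressions. A toy Python equivalent of a single parsing rule (our own, not one of Datadog’s predefined rules) shows what “transforming logs into structured formats” means in practice.

```python
# Toy equivalent of one Grok-style parsing rule: turn an unstructured
# access-log line into named, queryable fields. The pattern is our own
# example, not a Datadog rule.
import re

ACCESS_LOG = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse(line: str) -> dict | None:
    match = ACCESS_LOG.match(line)
    return {**match.groupdict(), "status": int(match["status"])} if match else None

# parse('203.0.113.9 - - [10/Oct/2025:13:55:36 +0000] "GET /health HTTP/1.1" 200 512')
# -> {'client_ip': '203.0.113.9', ..., 'method': 'GET', 'status': 200, 'bytes': '512'}
```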
Other products to look at include Mezmo (telemetry pipelines and log analysis) – it actively markets itself around, for example, reducing Datadog spend. Edge Delta also plays in the [telemetry pipeline space](https://edgedelta.com/comparison/edge-delta-vs-cribl/).
So telemetry pipelines are definitely part of the solution, but we feel what’s needed is an active and intentional approach to log data management, with a specific focus on, in effect, hierarchical storage, where data can be kept in cheaper blob storage but also quickly rehydrated and made available for querying. The key point here is that log storage is expensive. Every enterprise or SaaS company we talk to feels that pain. Which is where log data management comes in.
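To make that hierarchical storage point concrete, here is a minimal sketch of the rehydration step, assuming logs were archived to S3 as gzipped JSON-lines files under date-based prefixes. The bucket name and key layout are our assumptions; the boto3 calls themselves are standard.

```python
# Minimal rehydration sketch: pull a day's archived events back out of S3 so
# they can be queried again. Assumes gzipped JSON-lines under "logs/<day>/";
# that layout is an assumption, not any vendor's format.
import gzip
import json
import boto3

def rehydrate(bucket: str, day: str) -> list[dict]:
    """Load all archived events for one day (e.g. day='2025-06-01') into memory."""
    s3 = boto3.client("s3")
    events = []
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix=f"logs/{day}/")
    for page in pages:
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            events.extend(json.loads(line)
                          for line in gzip.decompress(body).splitlines() if line)
    return events
```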
Disclosure: Splunk, Cribl, Chronosphere, Control Theory, Honeycomb, AWS, Microsoft (Azure), and Google Cloud are RedMonk clients.
