For roughly the last ten years, a meaningful percentage of my working hours have been spent thinking about observability. If you're not familiar with the term, "observability" is what we call it now that "monitoring" doesn't sound expensive enough. The actual work is unglamorous in that you collect a lot of logs, some metrics, a few traces, and then you give them to people.

I generally like my job. I like that we're always trying new ideas and approaches. I like the fact that when things go wrong, the answer is almost always sitting there in the data, waiting to be found by whoever is patient enough to look. But I want to be honest with you: in ten years of doing this work, across a half-dozen companies and every observability platform you've heard of and a few you probably haven't, logs have never stopped being the worst part of the job. They were the worst part when I started. They are the worst part today. I fully expect them to be the worst part of this job forever until the robots rise up and rip my head off in one clean sweep.

I've written about why logs are terrible before, so I'll spare you the full lecture and give you the short version.

Every developer's expectations for logs are set by a single formative experience: the syslog box. Or a container running locally. Or tail -f on a production server they probably shouldn't have SSH'd into. The point is that at some early, tender moment in their career, they had an experience with logs that was flawless. They ran grep and something useful came back. They piped it into jq and got exactly what they needed.

This experience is the observability equivalent of a first kiss. It ruins them for everything that comes after.

Because here is the thing about that flawless experience: it works because the system is small, the volume is trivial, and the person querying is the same person who wrote the log line. There is no schema drift, no cardinality explosion, no cross-team consumer with dashboard expectations, no VP asking why the "revenue events" graph has a gap in it.

Then there are forty services. Now there are four hundred. Now the logs are being consumed not just by developers but by customer service, who need to look up a specific user's failed checkout from Tuesday. And by the data team, who are quietly building a business-critical dashboard on top of a log line that a backend engineer is about to refactor without telling anyone. And by the on-call, who at 3 AM does not want to learn a new query language, does not want to think about index patterns, and would like the search bar to just work.

So you have a technical problem — the volume is enormous, the shape is inconsistent, the queries are unpredictable — sitting on top of an expectations problem, which is worse. Developers want logs instantly, they want to run arbitrary operations on them, and they will not commit to a schema. Meanwhile the less-technical consumers of that same data want the dashboards to be stable forever, the UI to be forgiving, and the whole thing to feel like a normal product. These two audiences are, in most practical respects, at war with each other, and you are the diplomat.

Clickhouse

ClickHouse came out of Yandex, where it was built to chew through analytical queries against absurd volumes of clickstream data. It was not designed for observability. It just happens to be shockingly good at it, because clickstream data and observability data have a lot in common: high volume, append-heavy, time-ordered, mostly read in aggregate, and every so often you need to reach in and find one specific needle.

You can run it yourself with Helm charts. You can point Grafana at it via the ClickHouse plugin, or use their own web UI, or bring your own frontend. Their docs are actually good, which I mention because it's rare enough to be worth flagging. I've never used their ClickStack setup though, so YMMV.

For observability specifically, the OpenTelemetry Collector has a ClickHouse exporter, which means you can pipe OTLP data straight in and let it manage the initial schema for you. ClickHouse is designed to scan billions of rows and ingest an amount of data that, when you first see the numbers, makes you assume they're lying. They're not lying. You query it with SQL, which is a language that already exists and was not created by a startup two weeks ago.

But why Clickhouse specifically for logs?

I'm ranting about logs and then I'm explaining why I like to administer Clickhouse more. Let me take a second and explain why Clickhouse is really good at logs at scale.

Logs, as a data shape, have some peculiar properties. They're append-only. You never update a log line, and you almost never delete a single one, though you delete a lot of them at once when retention kicks in. They arrive roughly in time order, though never actually in order. They're read in bursts where nobody looks at logs for days, and then during an incident somebody wants to scan a billion of them in seconds. They're highly compressible, because most of the bytes in your logs are repeated: the same service names, the same hostnames, the same error strings, the same JSON keys, over and over and over again. And critically, when you query them, you almost always want either a narrow time range across all fields or an aggregation across a wide time range with a few filters. You very rarely want "give me one specific row by ID" the way you would from a transactional database. (There are exceptions when its something like GDPR or compliance logging which is its own subgenre of nightmares).

In a row-oriented database — Elasticsearch, Postgres, MySQL — the data for a single log line is stored together on disk. If your log has 40 fields and your query only cares about 3 of them, tough luck, you're reading all 40 from disk anyway. The database will filter it in memory, but the disk I/O has already happened.

ClickHouse stores each column separately. If your query says SELECT service, status_code, count() FROM logs WHERE timestamp > now() - INTERVAL 1 HOUR GROUP BY service, status_code, ClickHouse reads exactly three columns off disk: timestamp, service, and status_code. The other 37 columns in your schema might as well not exist. On observability data, where you often have dozens of attributes but any given query touches three or four, this is the difference between scanning 800GB and scanning 40GB.

This is also why the compression numbers look absurd. Columnar data compresses far better than row-oriented data because the values within a single column are, by nature, similar to each other. A column of service_name values might have a hundred distinct strings across a billion rows. ZSTD eats that for breakfast. You'll routinely see 10–14x compression ratios on real observability data, compared to 2–3x for Elasticsearch.

That's not the amazing part though

The amazing thing is that ClickHouse scales without changing shape.

I don't know how else to say this. Every other observability backend I've worked with mutates as it grows. The architecture at 1 TB a day and the architecture at 10 TB a day are recognizably different systems, with different failure modes, different ops burdens, and different mental models. ClickHouse at 10 TB a day looks like ClickHouse at 1 TB a day with more shards. That's it. That's the pitch. That's the whole reason I'm writing this.

Let me show you what I mean.

1 TB a Day

At 1 TB a day, every modern observability stack is roughly okay. If you're at this scale, you can pick almost anything and be productive. The differences below are real but they're not yet painful.

Elasticsearch

A relatively vanilla Elasticsearch cluster with Logstash providing some buffer between ingest and the Lucene indexes. Users get full-text search, which is genuinely good — this is the thing Elasticsearch is actually best at, and at this scale it delivers. Mapping explosions are already a background risk with mixed data, so dynamic mapping needs to be disabled or carefully templated from day one. ILM policies (hot → warm → delete) are non-optional even at this size, because forgetting to set them is how you get paged on a Saturday about disk pressure. Roughly $6–9K/month.

LGTM

Nothing too crazy. Alloy (formerly Grafana Agent, RIP) unifies the collection story into a single daemon, which is nice. Loki works well as long as you spend some time educating developers on how to attach useful labels — a conversation you will have many times, with many people, for the rest of your career. Mimir and Tempo largely do what it says on the tin. Roughly $3.5–5K/month.