I’ve been working with ClickHouse since the EVM indexer days, and the thing I keep coming back to is how stubbornly fast it is when you let it do what it wants.
Materialized views are the unsung heroes — paired with sensible partitioning, they turn what would be ETL into a column-store reflex. You stop thinking in pipelines and start thinking in projections.
This post isn’t an introduction. It’s a list of the places ClickHouse has surprised me, and the places it still bites.
Where it still bites
Joins are a trap. ClickHouse is an analytical store, not a transactional one: by default it materializes the right-hand side of a join in memory, so joins that PostgreSQL would plan cheaply can balloon. Queries that should take milliseconds suddenly take minutes. The fix is almost always to pre-aggregate, or to denormalize with a materialized view.
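A minimal sketch of the denormalization pattern, with hypothetical table and column names (none of these come from our actual schema): instead of joining logs to blocks at query time, a materialized view performs the join once, at insert time.

```sql
-- Hypothetical schemas; names are illustrative only.
CREATE TABLE blocks
(
    number     UInt64,
    block_time DateTime
)
ENGINE = MergeTree
ORDER BY number;

CREATE TABLE events
(
    block_number UInt64,
    topic0       FixedString(32),
    data         String
)
ENGINE = MergeTree
ORDER BY (topic0, block_number);

-- Denormalized target: carries block_time so reads never join.
CREATE TABLE events_enriched
(
    block_number UInt64,
    topic0       FixedString(32),
    data         String,
    block_time   DateTime
)
ENGINE = MergeTree
ORDER BY (topic0, block_time);

-- The join runs once per insert into events, not on every read.
CREATE MATERIALIZED VIEW events_enriched_mv TO events_enriched AS
SELECT e.block_number, e.topic0, e.data, b.block_time
FROM events AS e
INNER JOIN blocks AS b ON b.number = e.block_number;
```

The usual caveat applies: the view fires only on inserts into events, so the matching blocks row must already exist when the event arrives.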
Mutations are slow by design. UPDATE and DELETE aren’t statements in ClickHouse; they’re ALTER TABLE … UPDATE/DELETE mutations — asynchronous, heavy operations that rewrite entire data parts on disk. We learned this the hard way when a schema migration on a 2TB table locked the team out of the warehouse for 45 minutes.
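To make the asynchrony concrete, here is roughly what issuing and watching a mutation looks like; transfers and decoded are hypothetical names.

```sql
-- The statement returns almost immediately; the rewrite happens
-- in the background, part by part.
ALTER TABLE transfers UPDATE decoded = 1 WHERE decoded = 0;

-- Watch progress: parts_to_do shrinks as parts are rewritten.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'transfers' AND NOT is_done;

-- To make the session block until mutations finish (painful on big tables):
SET mutations_sync = 2;
```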
ReplacingMergeTree is not eventually consistent in the way you might expect. Deduplication happens at merge time, not at insert time, and you don’t control when merges run: queries against a table with unmerged parts can return duplicates. Always query with FINAL or deduplicate yourself with argMax.
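A sketch of both workarounds, using a hypothetical versioned balances table (names are illustrative):

```sql
-- Latest balance per address, versioned by block height.
CREATE TABLE balances
(
    address FixedString(20),
    block   UInt64,
    value   UInt256
)
ENGINE = ReplacingMergeTree(block)
ORDER BY address;

-- Naive read: may return several rows per address until
-- background merges have collapsed the parts.
SELECT address, value FROM balances;

-- Correct, but forces merge work at read time:
SELECT address, value FROM balances FINAL;

-- The argMax trick: pick the row with the highest version yourself.
SELECT address, argMax(value, block) AS value
FROM balances
GROUP BY address;
```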
What still impresses me
The query performance on cold data is absurd. We ran a full historical scan of 20TB of EVM data — events, transactions, traces — in under 8 seconds with the right sort key. PostgreSQL would have been warming up.
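The “right sort key” is most of the trick: lead with the columns your queries filter on, so the primary index lets the scan skip whole granules. A hypothetical layout for EVM logs, not our production DDL:

```sql
-- Illustrative only; the point is that the sort key leads with the
-- column most queries filter on.
CREATE TABLE evm_logs
(
    block_number UInt64,
    tx_hash      FixedString(32),
    address      FixedString(20),
    topic0       FixedString(32),
    data         String
)
ENGINE = MergeTree
PARTITION BY intDiv(block_number, 1000000)
ORDER BY (topic0, address, block_number);

-- Filters on a prefix of the sort key, so the primary index skips
-- every granule that cannot contain this topic.
SELECT count()
FROM evm_logs
WHERE topic0 = unhex('ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef');  -- ERC-20 Transfer
```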
The UDF ecosystem has gotten genuinely good. We run Python UDFs for ABI decoding and GPU-accelerated NLP models directly inside ClickHouse queries now. It feels like cheating.
Three years in, I’m still not bored.