We started the ClickHouse to Trino migration in January with a clear hypothesis: our analytics workloads had outgrown what a single ClickHouse cluster could do, and Trino’s decoupled compute-and-storage model would give us the elasticity we needed.

Three months later, the honest answer is: it’s complicated.

Why we started

Our main ClickHouse cluster was running at roughly 70% of capacity during peak analytical load. The team wanted to scale compute independently of storage, something ClickHouse’s architecture doesn’t support cleanly without significant operational overhead.

Trino promised exactly that: a separate compute pool that could scale to zero when idle, reading the same underlying storage we already had (S3-compatible object storage + Iceberg tables).
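
For context, the Trino side of that promise is mostly a catalog definition pointing at the object store. Below is a rough sketch of what such a catalog file can look like, not our literal config: property names vary by Trino version and metastore type, and this assumes a Hive metastore, Trino’s native S3 filesystem support, and placeholder endpoints.

    # etc/catalog/iceberg.properties -- illustrative only, endpoints are placeholders
    connector.name=iceberg
    iceberg.catalog.type=hive_metastore
    hive.metastore.uri=thrift://metastore.internal:9083
    fs.native-s3.enabled=true
    s3.endpoint=https://objectstore.internal
    s3.region=us-east-1
    s3.path-style-access=true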

Benchmark one: cold scan on historical data

ClickHouse won, by a significant margin. On a full scan of our largest EVM dataset (about 4.2TB in compressed columnar format), ClickHouse consistently returned results in 6-8 seconds. Trino, even with well-optimized Iceberg tables and a moderately large cluster, took 18-25 seconds.

This was expected. ClickHouse is purpose-built for this workload.
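
For concreteness, here is roughly the shape of such a measurement, as a minimal sketch using the clickhouse-connect and trino Python clients. The hostnames, table, and column names are placeholders rather than our real schema, and the query is a stand-in for the actual full-scan aggregation.

    # Cold-scan benchmark sketch: one full-table aggregation per engine, timed
    # wall-clock. Everything below is illustrative, not production code.
    import time

    import clickhouse_connect          # pip install clickhouse-connect
    from trino.dbapi import connect    # pip install trino

    # Hypothetical table and column names standing in for the real dataset.
    SCAN_SQL = "SELECT count(*), sum(gas_used) FROM evm_transactions"

    def time_clickhouse(sql: str) -> float:
        client = clickhouse_connect.get_client(host="clickhouse.internal", port=8123)
        start = time.monotonic()
        client.query(sql)              # result fully materialized before the clock stops
        return time.monotonic() - start

    def time_trino(sql: str) -> float:
        conn = connect(host="trino.internal", port=8080, user="benchmark",
                       catalog="iceberg", schema="analytics")
        cur = conn.cursor()
        start = time.monotonic()
        cur.execute(sql)
        cur.fetchall()                 # Trino streams results; fetch everything to be fair
        return time.monotonic() - start

    if __name__ == "__main__":
        print(f"clickhouse: {time_clickhouse(SCAN_SQL):.1f}s")
        print(f"trino:      {time_trino(SCAN_SQL):.1f}s")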

Benchmark two: concurrent analyst queries

Trino won. With 12 simultaneous analytical queries from different team members, ClickHouse started degrading around query 8: memory pressure, contention with background merges, slower responses. Trino handled all 12 with predictable latency because the scheduler gave each query its own share of worker resources.
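
A minimal sketch of how a test like this can be driven, assuming the trino Python client; the hostname, catalog, and query mix are stand-ins for the real analyst workload.

    # Concurrency sketch: fire 12 analyst-style queries at once and record
    # per-query latency. Connection details and queries are illustrative.
    import time
    from concurrent.futures import ThreadPoolExecutor

    from trino.dbapi import connect  # pip install trino

    # Twelve distinct (hypothetical) queries standing in for twelve analysts.
    QUERIES = [
        f"SELECT count(*) FROM evm_transactions WHERE block_number % {n + 2} = 0"
        for n in range(12)
    ]

    def run_one(sql: str) -> float:
        conn = connect(host="trino.internal", port=8080, user="analyst",
                       catalog="iceberg", schema="analytics")
        cur = conn.cursor()
        start = time.monotonic()
        cur.execute(sql)
        cur.fetchall()
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
        for latency in pool.map(run_one, QUERIES):
            print(f"{latency:.1f}s")

Plain threads are enough on the client side because each query is I/O-bound from the driver’s point of view; the interesting contention happens on the servers.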

Where we landed

We’re not replacing ClickHouse. We’re adding Trino as a second query layer for specific use cases:

  • Interactive analyst queries that need to run concurrently
  • Joins across heterogeneous data sources (ClickHouse + S3 Iceberg + PostgreSQL); see the sketch after this list
  • Long-running analytical workloads that we don’t want competing with production ingestion
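
To make the federated case concrete, here is a hedged sketch of a single Trino statement joining the Iceberg lake against a PostgreSQL catalog; the catalog, schema, table, and column names are placeholders, not our production layout.

    # Federated-join sketch: one Trino query spanning an Iceberg catalog and a
    # PostgreSQL catalog. All identifiers below are illustrative.
    from trino.dbapi import connect  # pip install trino

    FEDERATED_SQL = """
    SELECT l.label, count(*) AS tx_count
    FROM iceberg.analytics.evm_transactions AS t
    JOIN postgresql.public.address_labels AS l
      ON t.to_address = l.address
    GROUP BY l.label
    ORDER BY tx_count DESC
    LIMIT 20
    """

    conn = connect(host="trino.internal", port=8080, user="analyst",
                   catalog="iceberg", schema="analytics")
    cur = conn.cursor()
    cur.execute(FEDERATED_SQL)
    for label, tx_count in cur.fetchall():
        print(label, tx_count)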

ClickHouse stays as the primary real-time engine for ingestion and sub-second queries.

It’s not the clean migration story I hoped to write. It’s messier and more expensive and more correct.