Why We Replaced Our Entire Cache Layer in a Weekend

A story about latency budgets, distributed clusters, and the surprising cost of over-engineering your hot path.

Maya Chen · Jan 15, 2025 · 8 min read

I spent two weeks last January staring at monitoring dashboards, watching our P99 latency climb from 12ms to 340ms every afternoon between 2 and 5 PM. The culprit was not our application code — it was the caching layer we had built eighteen months earlier, a baroque tangle of TTL policies and invalidation hooks that had grown beyond anyone’s ability to reason about.

The Architecture We Inherited

The original design made sense when we had three services and a single Postgres instance. But by the time we had grown to twenty-two microservices, each with its own invalidation strategy, the cache had become the system’s nervous system — and it was having seizures. Write-through, write-behind, and read-through patterns were mixed freely, sometimes within the same service boundary.

A cache that nobody fully understands is worse than no cache at all. At least without one, your bugs are honest.

Simplifying the Hot Path

We decided over a Friday lunch to audit every caching pattern in the codebase. By Saturday evening we had a migration plan. By Sunday night we had deployed a simpler architecture with fewer patterns, clearer ownership, and one source of truth for each service's TTL configuration: a cache-policy.yaml file committed alongside that service.
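The post does not show what a cache-policy.yaml contains, so the following is only a guess at the idea, sketched in Python. A plain dict stands in for the parsed file, and every TTL lookup goes through one function instead of being hard-coded at call sites. The schema (`default_ttl_seconds`, `queries`, `ttl_seconds`) and the query names are hypothetical.

```python
# Stand-in for the parsed contents of one service's cache-policy.yaml.
# The field names and values are assumptions made for illustration;
# the article does not specify a schema.
POLICY = {
    "default_ttl_seconds": 60,
    "queries": {
        "user_profile": {"ttl_seconds": 300},
        "order_summary": {"ttl_seconds": 30},
    },
}


def ttl_for(query_name: str, policy: dict = POLICY) -> int:
    """Return the TTL for a query, falling back to the service default."""
    entry = policy["queries"].get(query_name, {})
    return entry.get("ttl_seconds", policy["default_ttl_seconds"])
```

With this shape, a handler would ask `ttl_for("user_profile")` rather than carrying its own constant, so changing a TTL becomes a one-line edit to the policy file.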

The key insight was treating cache invalidation not as a distributed systems problem but as a configuration problem. Instead of ad-hoc DEL calls scattered across handler code, we declared intent: “this query is invalidated when this entity mutates.” A shared library translated those declarations into the appropriate key patterns and expiry logic.
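To make the "declare intent" idea concrete, here is a minimal Python sketch of what such a declaration layer might look like. It is not the shared library described above: `InvalidationRegistry`, `DictCache`, `on_mutation`, and the key templates are all hypothetical names, and an in-memory dict stands in for the real cache client.

```python
class InvalidationRegistry:
    """Maps an entity type to the cache key templates it invalidates."""

    def __init__(self) -> None:
        self._templates: dict[str, list[str]] = {}

    def invalidated_by(self, entity: str, key_template: str) -> None:
        # Declaration: "this query's key is invalidated when `entity` mutates."
        self._templates.setdefault(entity, []).append(key_template)

    def keys_for(self, entity: str, **fields) -> list[str]:
        # Expand each declared template with the mutated entity's fields.
        return [t.format(**fields) for t in self._templates.get(entity, [])]


class DictCache:
    """In-memory stand-in for a real cache client (an assumption)."""

    def __init__(self) -> None:
        self.store: dict[str, object] = {}

    def delete(self, *keys: str) -> None:
        for key in keys:
            self.store.pop(key, None)


def on_mutation(registry: InvalidationRegistry, cache: DictCache,
                entity: str, **fields) -> list[str]:
    """Single choke point called after a write; replaces ad-hoc deletes."""
    keys = registry.keys_for(entity, **fields)
    if keys:
        cache.delete(*keys)
    return keys


# Declarations live in one place instead of being scattered across handlers.
registry = InvalidationRegistry()
registry.invalidated_by("user", "user_profile:{id}")
registry.invalidated_by("user", "user_orders:{id}")
```

A write path would then call `on_mutation(registry, cache, "user", id=42)` once after updating the user, and the registry decides which keys to drop. The handler no longer needs to know which queries depend on the entity.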