The Latency Tax Nobody Talks About

Why your users in São Paulo are subsidizing your San Francisco infrastructure — and what it costs you in retention.

Marcus Chen · Mar 14, 2025

Last November we traced a p99 latency spike across our API gateway back to a single region: us-east-1. The cause was not a code regression or a database bottleneck. It was geography. By then, roughly a third of our daily active sessions originated from São Paulo, Jakarta, and Warsaw, and every one of those requests crossed an ocean before reaching our single-region deployment in Northern Virginia.

The Hidden Cost of Centralization

We were running two tiers of service without knowing it. Median latency for São Paulo users hit 340ms, nearly six times the 58ms Bay Area users experienced. Session duration in high-latency regions dropped 40%. Churn in those cohorts ran at 2.3x the domestic rate over the trailing quarter. We had built a product that only worked well if you lived near it.
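To make the comparison concrete, here is a minimal sketch of how a per-region median and its ratio to a baseline region could be computed from raw request timings. The function names, region labels, and sample values are hypothetical, chosen only so the medians match the 340ms and 58ms figures above. This is not our actual telemetry pipeline.

```python
from collections import defaultdict
from statistics import median


def median_latency_by_region(samples):
    """Group (region, latency_ms) pairs and return the median per region."""
    by_region = defaultdict(list)
    for region, latency_ms in samples:
        by_region[region].append(latency_ms)
    return {region: median(values) for region, values in by_region.items()}


def latency_ratio(medians, region, baseline):
    """How many times slower `region` is than `baseline` at the median."""
    return medians[region] / medians[baseline]


# Hypothetical samples: illustrative values, not real measurements.
samples = [
    ("sao-paulo", 310), ("sao-paulo", 340), ("sao-paulo", 395),
    ("bay-area", 51), ("bay-area", 58), ("bay-area", 73),
]

medians = median_latency_by_region(samples)
ratio = latency_ratio(medians, "sao-paulo", "bay-area")
print(f"São Paulo is {ratio:.1f}x slower at p50")  # prints 5.9x
```

Breaking the numbers out by region is the point. A single global median would have hidden the gap, because the fast domestic majority dominates the aggregate.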

At 200ms p50, you start losing conversions. At 400ms, you have built a feature that only works for people who live near your data center.