Vendée
03 Inference economics
07  /  24
The takeaway

Latency, not parameter count, is what enterprises actually buy.

Median first-token latency
184ms
47% faster YoY

Median (p50) across 2.4 billion production requests to the Vendée-Large endpoint, January to December 2024. 512-token prompts, European POPs only.
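As a rough illustration of how a figure like this could be computed, the sketch below filters request records to the stated conditions (512-token prompts, European POPs) and takes the median first-token latency. The `Request` fields, the `"eu"` region label, and the sample values are all hypothetical toy data, not Vendée telemetry or its actual pipeline.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    # All field names are hypothetical, chosen only for this illustration.
    first_token_ms: float  # time from request receipt to first output token
    prompt_tokens: int     # prompt length in tokens
    region: str            # region of the serving POP, e.g. "eu" or "us"


def p50_first_token_ms(requests, prompt_tokens=512, region="eu"):
    """Median first-token latency over requests matching the filters."""
    samples = [
        r.first_token_ms
        for r in requests
        if r.prompt_tokens == prompt_tokens and r.region == region
    ]
    if not samples:
        raise ValueError("no requests match the filters")
    return median(samples)


# Toy records, invented for the example.
requests = [
    Request(150.0, 512, "eu"),
    Request(200.0, 512, "eu"),
    Request(250.0, 512, "eu"),
    Request(900.0, 512, "us"),  # excluded: not a European POP
    Request(120.0, 128, "eu"),  # excluded: not a 512-token prompt
]

print(p50_first_token_ms(requests))  # 200.0
```

At billions of requests an exact in-memory median is unlikely to be practical. A streaming quantile estimate would be the more plausible choice, though the slide does not say which method was used.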

[Chart: First-token latency by model, in ms (lower is better). Series: Vendée-Large, Reference set. n = 2,438M requests, p50, 512-token prompt, measured Jan–Dec 2024.]
Source: Vendée Platform Telemetry, internal benchmark v4.2, December 2024.
Annual platform review · 2024