Latency, not parameter count, is what enterprises actually buy.

Median first-token latency

184ms

47% faster YoY

Measured at p50 across 2.4 billion production requests on the Vendée-Large endpoint, January to December 2024, with 512-token prompts, European POPs only.

First-token latency by model · ms, lower is better

Vendée-Large · Reference set

n = 2,438M requests · p50 · 512-tok prompt · Measured Jan – Dec 2024

Source: Vendée Platform Telemetry, internal benchmark v4.2, December 2024.

Annual platform review · 2024 · 07 / 24
