Compute overview

GPU utilization: 87.4% (+4.2 pts · 24h)
Tokens / second: 1.84M (+12.4% WoW)
P99 latency: 214 ms (−18 ms · 24h)
Active fleet: 3,206 GPUs (42 draining)

Inference endpoints

Live · updated 7s ago
Endpoint                 Method  P99 latency  Throughput    Status
/v1/llm/generate         POST    186 ms       412k tok/s    Healthy
/v1/vision/embed         POST    94 ms        238k req/s    Healthy
/v1/diffusion/render     POST    612 ms       18.4k img/s   Degraded
/v1/speech/transcribe    POST    228 ms       96.2k req/s   Healthy
/v1/agents/orchestrate   GET     142 ms       54.8k req/s   Draining
/v1/index/retrieve       GET     38 ms        1.21M req/s   Healthy