I spent two weeks last January rewriting our language server protocol. Not because it was broken — it worked fine — but because I realized the latency between my intent and the editor's response was the bottleneck nobody talks about. Three hundred and forty milliseconds from thought to first suggestion. Longer than a blink.
The latency problem nobody measures
Most teams benchmark against completion accuracy — how often does the AI guess right? That metric matters, but it misses the point. When you're deep in a flow state, your brain generates code faster than your fingers can move. The editor should be predicting three steps ahead, not reacting to the last keystroke.
async function predict(ctx) {
const ast = await parse(ctx.doc);
return model.infer(ast, ctx.cursor);
}