Every model undergoes six months of adversarial evaluation before any public release.
We fund interpretability research that may not return results this decade — and that is the point.
Reasoning traces are visible to every user, not gated behind an enterprise API tier.
We have declined launch requests when evaluations show unresolved risk — and we would do it again.