Writing
Essays on agent evaluation and the AI reliability stack.
- 2026-05-04
Runtime, not evals
The PocketOS frame, and what it tells you about where agent monitoring is headed.
A 9-second production-database deletion is not an alignment problem. It is a permissions problem. The market is moving from post-hoc evals to runtime enforcement.
- 2026-04-30
The conflicted incumbent
Why your model vendor cannot be your eval vendor.
Auditor independence is standard practice in finance, pharma, and security. Agent evaluation is the last category to relearn the lesson.