Writing

#evaluation

1 article tagged evaluation.


May 21, 2026 · 7 min read

Evaluation-Driven Development: Shipping an AI Booking Agent You Can Trust

I was fixing Holocomm's booking agent one conversation at a time, with no metric to tell me whether a change helped or quietly introduced a regression. The fix: evaluate the whole agentic flow with openevals (golden fixtures, multi-turn simulated users, and an adversarial safety floor) and promote a change only when the numbers say it's better.

#ai #agents #evaluation #testing #llm