Writing

#evaluation

1 article tagged evaluation.

May 21, 2026 · 7 min read

Evaluation-Driven Development: Shipping an AI Booking Agent You Can Trust

I was fixing Holocomm's booking agent one conversation at a time, with no metric to tell me whether a change helped or quietly caused a regression. The fix: evaluate the whole agentic flow with openevals (golden fixtures, multi-turn simulated users, an adversarial safety floor) and promote a change only when the numbers say it's better.

ai · agents · evaluation · testing · llm
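
As a rough illustration of the "promote only when the numbers say it's better" rule from the summary above, here is a minimal Python sketch of a promotion gate. It deliberately does not call openevals; it assumes each suite (golden fixtures, simulated multi-turn users, adversarial cases) has already been reduced to a pass rate between 0 and 1. The names `SuiteScores`, `should_promote`, and the `SAFETY_FLOOR` value are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteScores:
    """Pass rates in [0, 1] for one agent version (hypothetical shape)."""
    golden: float       # golden-fixture conversations
    simulated: float    # multi-turn simulated-user runs
    adversarial: float  # adversarial safety cases


# Assumed threshold: every adversarial case must pass.
SAFETY_FLOOR = 1.0


def should_promote(baseline: SuiteScores, candidate: SuiteScores,
                   safety_floor: float = SAFETY_FLOOR) -> bool:
    """Return True only if the candidate clears the safety floor,
    regresses on no quality suite, and improves on at least one."""
    # The safety floor is a hard gate, not something to trade off.
    if candidate.adversarial < safety_floor:
        return False
    no_regression = (candidate.golden >= baseline.golden
                     and candidate.simulated >= baseline.simulated)
    improved = (candidate.golden > baseline.golden
                or candidate.simulated > baseline.simulated)
    return no_regression and improved
```

Treating the adversarial suite as a floor rather than averaging it into a single score means a gain on the golden fixtures can never pay for a safety failure.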