Writing
#ai
2 articles tagged ai.
TDD by Construction: How the Vimbus Loop Validates Its Own Code at Scale
A validation layer wrapped around an autonomous code-generation loop: static task validation before any agent runs, split browser verification and re-grounded hexagonal rules while it runs, and a codebase audit that marks nothing done without a traced, tested path.
testing tdd automation ai architecture
Evaluation-Driven Development: Shipping an AI Booking Agent You Can Trust
I was fixing Holocomm's booking agent one conversation at a time, with no metric to tell me whether a change helped or quietly regressed. The fix: evaluate the whole agentic flow with openevals — golden fixtures, multi-turn simulated users, an adversarial safety floor — and promote a change only when the numbers say it's better.
ai agents evaluation testing llm