Forge — LLM Batch & Realtime Orchestration
Two cooperating Node daemons coordinate LLM batch and realtime jobs through Postgres only. Most AI apps stop at calling an API; Forge handles the operational layer — batching, provider routing, leases, retries, crash recovery, quota recovery, validation, notifications, and observability.
What we were solving
Context & problem
LLM workloads need both batch and realtime execution, provider flexibility, retry safety, cost awareness, validation, and operational visibility. The conventional answer, a dedicated queue service, adds infrastructure and coordination complexity.
Forge explores a Postgres-centered architecture where workers coordinate through SQL state transitions, leases, and transactional outboxes — no queue, no IPC, no shared worker processes.
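To make "SQL state transitions plus a transactional outbox" concrete, here is a minimal in-memory sketch of a guarded transition. The status names, the `Db` shape, and the `transition` helper are assumptions for illustration; Forge's actual schema is not shown here. In Postgres the same idea would typically be an `UPDATE ... WHERE status = $from` and an outbox `INSERT` inside one transaction.

```typescript
// Hypothetical statuses and row shapes, chosen only for this sketch.
type Status = "pending" | "claimed" | "submitted" | "done";

interface JobRow {
  id: number;
  status: Status;
}

interface OutboxEvent {
  jobId: number;
  from: Status;
  to: Status;
}

interface Db {
  jobs: JobRow[];
  outbox: OutboxEvent[];
}

// Compare-and-set transition. A rough SQL analogue (assumed, not Forge's real query):
//   BEGIN;
//   UPDATE jobs SET status = $to WHERE id = $id AND status = $from;
//   INSERT INTO outbox (job_id, from_status, to_status) VALUES (...); -- only if a row was updated
//   COMMIT;
// A new Db is returned so the status change and the outbox event appear together or not at all.
function transition(db: Db, id: number, from: Status, to: Status): { db: Db; applied: boolean } {
  const row = db.jobs.find((j) => j.id === id);
  if (row === undefined || row.status !== from) {
    // Guard failed: the row is not in the expected state, so this call is a no-op.
    return { db, applied: false };
  }
  const jobs = db.jobs.map((j) => (j.id === id ? { ...j, status: to } : j));
  const outbox = [...db.outbox, { jobId: id, from, to }];
  return { db: { jobs, outbox }, applied: true };
}

const start: Db = { jobs: [{ id: 1, status: "pending" }], outbox: [] };
const first = transition(start, 1, "pending", "claimed");
// A retry of the same transition finds the guard no longer holds and changes nothing.
const retry = transition(first.db, 1, "pending", "claimed");
console.log(first.applied, retry.applied, retry.db.outbox.length);
```

Because the guard checks the expected current status, replaying a transition after a crash or a duplicate delivery leaves both the row and the outbox unchanged.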
How we approached it
Solution
Anvil (the builder) claims eligible pending rows and shapes them into provider-compatible batches. Hammer (the executor) advances those batches through worker loops: submitting, polling, and collecting batches; executing realtime jobs; recovering stale leases; archiving cooled batches; stamping updates; recovering quota; and sending notifications.
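As one way to picture the "shaping" step, the sketch below groups pending rows by provider and model and splits each group at a size cap. The row fields, the `shapeBatches` name, and the cap are assumptions for illustration; the rules Anvil actually applies are not described in this document.

```typescript
// Hypothetical row shape; field names are assumptions for this sketch.
interface PendingJob {
  id: number;
  provider: string;
  model: string;
}

interface Batch {
  provider: string;
  model: string;
  jobIds: number[];
}

// Group jobs by (provider, model), then split each group into chunks of at most maxSize.
function shapeBatches(jobs: PendingJob[], maxSize: number): Batch[] {
  if (!Number.isInteger(maxSize) || maxSize < 1) {
    throw new Error("maxSize must be a positive integer");
  }
  const groups = new Map<string, Batch>();
  for (const job of jobs) {
    // JSON-encode the pair so distinct (provider, model) pairs never share a key.
    const key = JSON.stringify([job.provider, job.model]);
    let group = groups.get(key);
    if (group === undefined) {
      group = { provider: job.provider, model: job.model, jobIds: [] };
      groups.set(key, group);
    }
    group.jobIds.push(job.id);
  }
  const batches: Batch[] = [];
  // Map preserves insertion order, so groups come out in first-seen order.
  groups.forEach((group) => {
    for (let i = 0; i < group.jobIds.length; i += maxSize) {
      batches.push({
        provider: group.provider,
        model: group.model,
        jobIds: group.jobIds.slice(i, i + maxSize),
      });
    }
  });
  return batches;
}

// Placeholder model names; three same-model jobs with a cap of 2 yield two batches.
const shaped = shapeBatches(
  [
    { id: 1, provider: "openai", model: "model-a" },
    { id: 2, provider: "anthropic", model: "model-b" },
    { id: 3, provider: "openai", model: "model-a" },
    { id: 4, provider: "openai", model: "model-a" },
  ],
  2,
);
console.log(JSON.stringify(shaped.map((b) => b.jobIds))); // [[1,3],[4],[2]]
```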
OpenAI and Anthropic are integrated through provider ports, so the core execution model stays provider-agnostic. The two daemons share only Postgres: coordination lives entirely in SQL transitions, leases, and transactional outboxes.
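A provider port in this sense might look like the following sketch: a small interface the core depends on, with one adapter per provider behind it. The interface shape, method names, `FakeProvider`, and `PortRegistry` are all hypothetical; the fake makes no network calls and stands in for real OpenAI or Anthropic adapters.

```typescript
// Hypothetical port shape; not Forge's actual interface.
interface BatchRequest {
  customId: string;
  prompt: string;
}

type BatchState = "in_progress" | "completed" | "failed";

interface ProviderPort {
  readonly name: string;
  // Resolves to a provider-side batch id.
  submitBatch(requests: BatchRequest[]): Promise<string>;
  pollBatch(batchId: string): Promise<BatchState>;
}

// In-memory fake standing in for a real adapter.
class FakeProvider implements ProviderPort {
  private submitted = 0;

  constructor(public readonly name: string) {}

  async submitBatch(requests: BatchRequest[]): Promise<string> {
    if (requests.length === 0) throw new Error("empty batch");
    this.submitted += 1;
    return `${this.name}-batch-${this.submitted}`;
  }

  async pollBatch(_batchId: string): Promise<BatchState> {
    // The fake reports every batch as finished.
    return "completed";
  }
}

// Core code resolves a port by provider name and never imports a provider SDK directly.
class PortRegistry {
  private ports = new Map<string, ProviderPort>();

  register(port: ProviderPort): void {
    this.ports.set(port.name, port);
  }

  resolve(name: string): ProviderPort {
    const port = this.ports.get(name);
    if (port === undefined) {
      throw new Error(`no provider port registered for "${name}"`);
    }
    return port;
  }
}

const registry = new PortRegistry();
registry.register(new FakeProvider("openai"));
registry.register(new FakeProvider("anthropic"));
```

Under this arrangement, adding a provider means writing one more adapter and registering it; the submit and poll loops stay unchanged.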
Impact
Outcomes
- Batch and realtime LLM execution paths in one architecture.
- Worker coordination through Postgres without a separate queue service.
- Lease expiry for crash recovery and stale worker reclamation.
- OpenAI and Anthropic via provider ports.
- Provider outputs revalidated against processor response schemas before success.
- Operator-facing health, metrics, archive, credentials, costs, and architecture surfaces.
- Horizontal scaling through SKIP LOCKED claims and idempotent transitions.
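The lease-expiry and SKIP LOCKED outcomes above can be sketched as follows. The table and column names in `CLAIM_SQL`, the `claimNext` helper, and the millisecond timestamps are assumptions for illustration, not Forge's real schema. The in-memory function models only the lease rule; in Postgres, `FOR UPDATE SKIP LOCKED` is what lets concurrent claimers skip rows another transaction has already locked instead of waiting on them.

```typescript
// Assumed query shape: claim one row that is pending or whose lease has lapsed.
const CLAIM_SQL = `
  UPDATE jobs
     SET status = 'claimed',
         lease_owner = $1,
         lease_expires_at = now() + $2::int * interval '1 second'
   WHERE id = (
     SELECT id FROM jobs
      WHERE status = 'pending'
         OR (status = 'claimed' AND lease_expires_at < now())
      ORDER BY id
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id`;

interface LeasedJob {
  id: number;
  status: "pending" | "claimed" | "done";
  leaseOwner: string | null;
  leaseExpiresAt: number; // epoch milliseconds; 0 when never leased
}

// In-memory model of the same rule: take the first pending row, or the first
// claimed row whose lease expired (for example because its worker crashed).
function claimNext(jobs: LeasedJob[], workerId: string, nowMs: number, leaseMs: number): number | null {
  for (const job of jobs) {
    const free = job.status === "pending";
    const stale = job.status === "claimed" && job.leaseExpiresAt < nowMs;
    if (free || stale) {
      job.status = "claimed";
      job.leaseOwner = workerId;
      job.leaseExpiresAt = nowMs + leaseMs;
      return job.id;
    }
  }
  return null;
}

const leased: LeasedJob[] = [
  { id: 1, status: "pending", leaseOwner: null, leaseExpiresAt: 0 },
  { id: 2, status: "pending", leaseOwner: null, leaseExpiresAt: 0 },
];
const claimA = claimNext(leased, "hammer-a", 0, 1000); // job 1, lease until t=1000
const claimB = claimNext(leased, "hammer-b", 10, 1000); // job 2, lease until t=1010
const nothing = claimNext(leased, "hammer-b", 500, 1000); // both leases still live
const reclaimed = claimNext(leased, "hammer-b", 1001, 1000); // job 1's lease has lapsed
console.log(claimA, claimB, nothing, reclaimed); // 1 2 null 1
```

If `hammer-a` crashes while holding job 1, nothing has to detect the crash explicitly: once the lease timestamp passes, the row becomes claimable again by any worker.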