AI Infrastructure · LLM Infrastructure · Worker Orchestration / Provider Integration

Forge — LLM Batch & Realtime Orchestration

Two cooperating Node daemons coordinate LLM batch and realtime jobs through Postgres only. Most AI apps stop at calling an API; Forge handles the operational layer — batching, provider routing, leases, retries, crash recovery, quota recovery, validation, notifications, and observability.

Role: System Architect / Full-Stack AI Infrastructure Developer

Team: Solo build for LGMar

Timeline: Ongoing project

Industry: LLM Infrastructure

At a glance

By the numbers

- ~40/s jobs per replica
- ~50% lower token cost
- 2 LLM providers
- 9 worker loops
- 0 queue services

What we were solving

Context & problem

LLM workloads need both batch and realtime execution, provider flexibility, retry safety, cost awareness, validation, and operational visibility. A queue-based architecture adds infrastructure and coordination complexity.

Forge explores a Postgres-centered architecture where workers coordinate through the database alone — no queue, no IPC, no shared worker processes.

How we approached it

Solution

Anvil (the builder) claims eligible pending rows and shapes them into provider-compatible batches. Hammer (the executor) advances those batches through worker loops that submit, poll, collect, execute realtime jobs, recover stale leases, archive cooled batches, stamp updates, recover quota, and send notifications.
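As a rough illustration of the batch-shaping step, the sketch below groups already-claimed jobs by provider and model, then splits each group into size-capped batches. The `PendingJob` and `ShapedBatch` types, the grouping key, and the `maxPerBatch` cap are assumptions made for this example; the actual compatibility rules Anvil applies are not described here.

```typescript
// Hypothetical shapes; field names are illustrative, not Forge's real schema.
interface PendingJob {
  id: string;
  provider: "openai" | "anthropic";
  model: string;
}

interface ShapedBatch {
  provider: "openai" | "anthropic";
  model: string;
  jobIds: string[];
}

// Group jobs that can share a provider batch, then cap each batch at maxPerBatch.
function shapeBatches(jobs: PendingJob[], maxPerBatch: number): ShapedBatch[] {
  if (maxPerBatch < 1) {
    throw new Error("maxPerBatch must be at least 1");
  }
  const byKey: Record<string, ShapedBatch> = {};
  const groups: ShapedBatch[] = []; // keeps first-seen order of groups
  for (const job of jobs) {
    const key = `${job.provider}:${job.model}`;
    let group = byKey[key];
    if (group === undefined) {
      group = { provider: job.provider, model: job.model, jobIds: [] };
      byKey[key] = group;
      groups.push(group);
    }
    group.jobIds.push(job.id);
  }
  const batches: ShapedBatch[] = [];
  for (const group of groups) {
    for (let i = 0; i < group.jobIds.length; i += maxPerBatch) {
      batches.push({
        provider: group.provider,
        model: group.model,
        jobIds: group.jobIds.slice(i, i + maxPerBatch),
      });
    }
  }
  return batches;
}
```

Keeping the shaping logic a pure function of the claimed rows means it can be unit-tested without a database or a provider account.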

OpenAI and Anthropic ship through provider ports; the core execution model stays provider-agnostic. The two daemons share only Postgres — all coordination lives in the database, with no separate queue or message bus.

Impact

Outcomes

- Batch and realtime LLM execution paths in one architecture.
- Worker coordination through Postgres without a separate queue service.
- Lease expiry for crash recovery and stale worker reclamation.
- OpenAI and Anthropic via provider ports.
- Provider outputs revalidated against processor response schemas before success.
- Operator-facing surfaces for health, metrics, archive, credentials, costs, and architecture.
- Horizontal scaling: every daemon and worker is safe to run as multiple copies.

Figure: Forge architecture. Telix seeds the jobs table, Anvil batches, Hammer executes through provider ports and reliable outboxes.

Figure: Worker fleet.

Figure: Forge batch and realtime state machine transitions.

Figure: Forge Prompt Platform v2 (processors, tags, schemas, rendering, validation, output policy).

Behind the scenes

Tech & delivery

Stack

Node.js
TypeScript
PostgreSQL
OpenAI Batch API
Anthropic Batch API
NestJS
Prometheus
React + Vite

Challenges

- Coordinating two daemons and nine worker loops through Postgres alone, instead of running a separate queue service.
- Keeping side effects (tag updates, notifications) consistent with each job's state change, even across crashes.
- Validating provider outputs fail-closed against processor response schemas before accepting success.
- Designing provider ports so OpenAI and Anthropic batch semantics stay behind one execution model.

How I worked

- Lease-based crash recovery and idempotent claiming make every daemon and worker safe to scale horizontally.
- Prometheus metrics, health endpoints, and a local operator viewer expose jobs, batches, costs, and live architecture.
- A quota recovery control loop pauses exhausted provider keys and probes them back into service.

How it holds together

Technical highlights

Postgres-only coordination

No queue, IPC, or shared process. All coordination lives in the database.
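One common way to let several workers claim rows from a shared table without a queue is Postgres's `FOR UPDATE SKIP LOCKED`. The sketch below builds such a claim query; whether Forge uses this exact pattern is not stated, and the `jobs` table, its columns, and the `buildClaimQuery` helper are hypothetical names chosen for illustration.

```typescript
// Shape accepted by node-postgres as a query config: client.query({ text, values }).
interface ClaimQuery {
  text: string;
  values: [string, number, number];
}

// Assumed (hypothetical) schema:
//   jobs(id, status, created_at, lease_owner, lease_expires_at)
// Rows locked by another worker's in-flight claim are skipped instead of
// waited on, so concurrent claimers never block or double-claim each other.
const CLAIM_SQL = `
  UPDATE jobs
     SET status = 'claimed',
         lease_owner = $1,
         lease_expires_at = now() + make_interval(secs => $2)
   WHERE id IN (
     SELECT id
       FROM jobs
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT $3
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id`;

// Build the parameterized claim for one worker.
function buildClaimQuery(workerId: string, leaseSeconds: number, limit: number): ClaimQuery {
  if (leaseSeconds <= 0 || limit < 1) {
    throw new Error("leaseSeconds and limit must be positive");
  }
  return { text: CLAIM_SQL, values: [workerId, leaseSeconds, limit] };
}
```

A worker would pass the returned object to its Postgres client and treat the returned ids as its exclusive work until the lease expires.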

Provider-agnostic batch execution

OpenAI and Anthropic ship in-tree through a BatchProviderPort registry.
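A port registry of this kind might look roughly like the sketch below. Only the name `BatchProviderPort` comes from the text above; the method signatures (mirroring the submit, poll, and collect loops), the `BatchProviderRegistry` class, and the status values are assumptions for illustration.

```typescript
// Assumed lifecycle states for a provider-side batch.
type ProviderBatchStatus = "in_progress" | "completed" | "failed";

// Hypothetical port shape: one implementation per provider.
interface BatchProviderPort {
  readonly providerId: string;
  submit(requestLines: string[]): Promise<string>; // resolves to the provider's batch id
  poll(providerBatchId: string): Promise<ProviderBatchStatus>;
  collect(providerBatchId: string): Promise<string[]>; // resolves to raw result lines
}

class BatchProviderRegistry {
  // Object.create(null) avoids accidental hits on inherited keys like "constructor".
  private readonly ports: Record<string, BatchProviderPort> = Object.create(null);

  register(port: BatchProviderPort): void {
    if (this.ports[port.providerId] !== undefined) {
      throw new Error(`provider already registered: ${port.providerId}`);
    }
    this.ports[port.providerId] = port;
  }

  // Fail loudly on unknown providers rather than falling back to a default.
  resolve(providerId: string): BatchProviderPort {
    const port = this.ports[providerId];
    if (port === undefined) {
      throw new Error(`unknown provider: ${providerId}`);
    }
    return port;
  }
}
```

With this shape, the execution loops only ever call `registry.resolve(batch.provider)` and never branch on provider names themselves.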

Crash-safe workers

Stalled jobs are detected and re-run automatically; every worker is safe to run more than once.
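The lease idea can be sketched as an in-memory model: a running job whose lease has expired returns to the pending pool so another worker can claim it. The `LeasedJob` fields and the attempt counter are assumptions for this example; in a Postgres-backed system the same rule would typically be a single `UPDATE` filtered on the lease expiry column.

```typescript
// Hypothetical job shape; timestamps are epoch milliseconds.
interface LeasedJob {
  id: string;
  status: "pending" | "running" | "done";
  leaseOwner: string | null;
  leaseExpiresAt: number | null;
  attempts: number;
}

// Return a new list in which every running job with an expired lease is
// released back to "pending". Live leases and finished jobs are untouched,
// so running this repeatedly (or from several workers) is harmless.
function reclaimStaleLeases(jobs: LeasedJob[], nowMs: number): LeasedJob[] {
  return jobs.map((job): LeasedJob => {
    const expired =
      job.status === "running" &&
      job.leaseExpiresAt !== null &&
      job.leaseExpiresAt <= nowMs;
    if (!expired) {
      return job;
    }
    return {
      ...job,
      status: "pending",
      leaseOwner: null,
      leaseExpiresAt: null,
      attempts: job.attempts + 1,
    };
  });
}
```

Because a reclaimed job may already have been partially processed by the crashed worker, this approach relies on the downstream steps being idempotent.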

Reliable side effects

Tag updates and notifications are recorded together with each job's state change, so nothing is lost on a crash.
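A transactional outbox is one standard way to get this guarantee: the state change and the pending side effects are written in the same transaction, and a separate loop delivers the outbox rows later. The sketch below only builds the statement list for such a transaction; the `jobs` and `outbox` tables, their columns, and the status values are hypothetical.

```typescript
interface SqlStatement {
  text: string;
  values: unknown[];
}

// Hypothetical side-effect record queued for later delivery.
interface SideEffect {
  kind: "tag_update" | "notification";
  payload: Record<string, unknown>;
}

// Build the statements for one atomic "job succeeded" transition.
// Either every row is written (COMMIT) or none is (crash or ROLLBACK),
// so a side effect can never exist without its state change or vice versa.
function buildCompletionTransaction(jobId: string, effects: SideEffect[]): SqlStatement[] {
  const statements: SqlStatement[] = [
    { text: "BEGIN", values: [] },
    {
      text: "UPDATE jobs SET status = 'succeeded' WHERE id = $1 AND status = 'running'",
      values: [jobId],
    },
  ];
  for (const effect of effects) {
    statements.push({
      text: "INSERT INTO outbox (job_id, kind, payload) VALUES ($1, $2, $3::jsonb)",
      values: [jobId, effect.kind, JSON.stringify(effect.payload)],
    });
  }
  statements.push({ text: "COMMIT", values: [] });
  return statements;
}
```

A caller would run these statements in order on a single connection and issue `ROLLBACK` if any of them fails.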

Prompt Platform v2

Surfaces for processor prompts, schemas, output policies, tags, validation, preview, and promotion.

Fail-closed validation

Provider success lines revalidated against response schemas before acceptance.
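Fail-closed here means that a result counts as a failure unless it positively passes validation. The sketch below shows the idea with a deliberately tiny schema format (field name to primitive type); `ResponseSchema`, `validateFailClosed`, and the rejection of unexpected fields are assumptions for illustration, and a real system would more likely use a full JSON Schema validator.

```typescript
type FieldType = "string" | "number" | "boolean";
type ResponseSchema = Record<string, FieldType>;

type ValidationResult =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; reason: string };

// Accept a provider "success" line only if it parses as a JSON object whose
// fields match the schema exactly. Every other outcome is a rejection.
function validateFailClosed(rawLine: string, schema: ResponseSchema): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawLine);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, reason: "expected a JSON object" };
  }
  const candidate = parsed as Record<string, unknown>;
  for (const key of Object.keys(schema)) {
    if (typeof candidate[key] !== schema[key]) {
      return { ok: false, reason: `field "${key}" is missing or not a ${schema[key]}` };
    }
  }
  for (const key of Object.keys(candidate)) {
    if (!Object.prototype.hasOwnProperty.call(schema, key)) {
      return { ok: false, reason: `unexpected field "${key}"` };
    }
  }
  return { ok: true, value: candidate };
}
```

The important property is that there is no code path that returns `ok: true` without every check having run.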

Quota recovery loop

Provider keys pause on quota exhaustion and recover via no-spend probe logic.
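The pause-and-probe behaviour can be modelled as a small state machine per provider key. The sketch below is an assumption-heavy illustration: the state shape, the event names, and the doubling backoff between probes are invented for this example, and what a "no-spend probe" actually sends to the provider is left outside the code.

```typescript
// Hypothetical per-key state; timestamps are epoch milliseconds.
interface ProviderKeyState {
  status: "active" | "paused";
  nextProbeAtMs: number | null;
  backoffMs: number;
}

type KeyEvent = "quota_exhausted" | "probe_ok" | "probe_failed";

// Assumed backoff bounds, not taken from the source.
const INITIAL_BACKOFF_MS = 60000;
const MAX_BACKOFF_MS = 3600000;

// Pure transition function: the control loop feeds in events and stores the result.
function nextKeyState(state: ProviderKeyState, event: KeyEvent, nowMs: number): ProviderKeyState {
  switch (event) {
    case "quota_exhausted":
      if (state.status === "paused") {
        return state; // already paused; keep the existing probe schedule
      }
      return { status: "paused", nextProbeAtMs: nowMs + INITIAL_BACKOFF_MS, backoffMs: INITIAL_BACKOFF_MS };
    case "probe_failed": {
      if (state.status !== "paused") {
        return state;
      }
      const backoffMs = Math.min(state.backoffMs * 2, MAX_BACKOFF_MS);
      return { status: "paused", nextProbeAtMs: nowMs + backoffMs, backoffMs };
    }
    case "probe_ok":
      return { status: "active", nextProbeAtMs: null, backoffMs: INITIAL_BACKOFF_MS };
    default:
      return state;
  }
}

// A paused key is due for a probe once its scheduled time has passed.
function shouldProbe(state: ProviderKeyState, nowMs: number): boolean {
  return state.status === "paused" && state.nextProbeAtMs !== null && state.nextProbeAtMs <= nowMs;
}
```

Keeping the transitions pure makes the loop easy to test and lets the persisted key state in Postgres remain the single source of truth across replicas.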

Operator UI

Local viewer: jobs, batches, archive, processors, credentials, costs, brand, live architecture.