Architecture
How the reference runtime executes the model the rest of these docs describe. Useful for operating a Catchment with confidence, contributing, or building an alternative runtime.
Two tiers: Catchment and Ducks
The Catchment owns the decisions. It runs the full orchestration engine — every Pond and every Ripple, pull and push — because the model's signature behaviours (bottleneck-paced Waves, pre-armed Taps) emerge from Ripple-level pull, which can't be decided from Pond-level state alone. It holds triggers and windows, applies the freshness rules from Theory, and emits exactly one kind of instruction: begin a Pond Run at freshness F.
A Duck owns one Pond's execution. Each executing Pond gets a dedicated worker subprocess — its Duck. Given begin_run(F), the Duck pushes every Ripple in its Pond to F using a push-only copy of the same engine, runs the Ripple functions in a thread pool against the Pond's DuckDB registry, exports the Parquet snapshots, and reports each completion back. Ducks are spawned on a Pond's first run, kept warm while a standing trigger is active, and stopped when the Pond goes idle.
The split mirrors the conceptual model: the Catchment is pull's home (demand, coordination, history); the Duck is push's (do this work, to this freshness, to completion).
One engine, shared
The orchestration rules live in a pure engine package (duckstring.engine) with no FastAPI, no database, no HTTP — a state machine over freshness, demand, and time, directly implementing the pseudocode in Theory. The Catchment embeds the full engine; the Duck embeds its push-only subset. The engine is also a behaviour-for-behaviour port of the playground's TypeScript reference implementation, so the simulation you can poke at in a browser and the runtime executing your data are the same machine.
Transport: Ducks always dial back
All communication is Duck-initiated: a Duck holds a short poll on GET /api/duck/{name}/{major}/jobs for commands and POSTs progress to /api/duck/{name}/{major}/events (a Duck serves one major line of a Pond). The Catchment never needs to reach a Duck.
This buys two properties. First, location transparency: a Duck on the same machine and a Duck across a network run identical code — remote execution is just a different way of launching the process (DUCKSTRING_CATCHMENT_URL tells a Duck where to dial). Second, resilience: because the Duck doesn't depend on being reachable, it doesn't depend on the Catchment being up either. Events are idempotent on freshness, so replays after a gap are harmless.
Resilience
The design goal: no single process is precious.
- Catchment down, Duck running — the Duck finishes its in-flight runs from its own ledger and engine, buffers its events, and replays them when the Catchment returns.
- Duck dies mid-run — the Catchment's liveness check notices (process gone, or silent past 60 s) and fails the run through the ordinary fault-tolerance path — budgets, blocking, run history all apply. A stuck-but-alive Duck reports itself failed via its own watchdog.
- Catchment restarts — engine state (freshness, demand, triggers, windows, fault states, budgets) rebuilds from the database; interrupted Pond Runs are re-dispatched. A restarted Duck reconciles against its ledger and re-runs only the Ripples that hadn't completed.
- Run cadence under failure — there is no global scheduler state to corrupt; demand and freshness are the only coordination, and both are durable.
Storage
Everything lives under the Catchment root, in three layers with distinct owners:
| Path | Owner | Contents |
|---|---|---|
duck.db | Catchment | The system of record: deployed versions and topology, the live graph, freshness/demand/fault state, triggers, windows, budgets, and the canonical run history (one row per Ripple attempt, with errors and tracebacks). SQLite. |
ponds/{name}/{version}/ | Catchment | Each deployed version's source, exactly as uploaded — the immutable artifact. |
ponds/{name}/m{major}/registry.duckdb | Duck | The major line's live working database — every table its Ripples write. Private to the Pond. |
ponds/{name}/m{major}/data/*.parquet | Duck | The published snapshots, exported atomically (write-then-rename) after each successful run. The only thing Sinks and queries read. |
ponds/{name}/m{major}/pond.db | Duck | The Duck's run ledger — its operational record for crash recovery and event replay. The Catchment's history remains canonical. |
Runtime storage is per major line (m1/, m2/, …) — concurrent majors execute and publish independently. Identity in duck.db follows the versioning model: the Pond name, each immutable deployed version, and a selection pointer per (name, major) are separate records — which is what makes deploys atomic and majors concurrent. Paths in the database are root-relative, so the whole directory is relocatable and a backup of the root is a backup of the Catchment.
Flow control
There is none — deliberately. The Catchment never caps concurrent Pond Runs and never rate-limits; completions clock the cascade, as the demand model prescribes. Cross-Pond data flows only through the exported snapshots, so any number of runs can overlap without read contention; within a Pond, concurrent table writes queue with retry rather than fail. The result is a runtime whose throughput is set by the pipeline's actual bottleneck and nothing else.