Architecture

What runs where, what talks to what, and why each piece exists.

The shape

Three long-running processes anchor the system: the manager (Fastify + TypeScript), EMQX (the shared MQTT broker), and Postgres. Alongside them run one Node-RED container per instance and a single nginx container in front that terminates TLS and routes by subdomain.

Everything except Postgres runs on the openflow-network Docker bridge, so containers reach each other by name. Nginx proxies to nodered-<subdomain>:1880; instance containers reach the manager via host.docker.internal:4071 for the magic-token validation callback; EMQX calls the manager at the same address for auth.

Request flow

From the moment a user hits wfengine.example.com:

# 1. nginx (in container, port 443)
server_name ~^(?<subdomain>[^.]+)\.example\.com$;
proxy_pass http://nodered-$subdomain:1880;

# 2. Docker DNS resolves nodered-wfengine to the right container
#    (network aliases let one container answer to multiple names)

# 3. Node-RED settings.js sees Authorization: Bearer <MAGIC>
#    or an `openflow_token` cookie, and authenticates the request

# 4. The editor loads. WebSocket /comms upgrade goes through the
#    same middleware path; URL access_token is rewritten to MAGIC.

If the proxied request comes back 404 (because Node-RED's httpNodeRoot is /api and the caller forgot the prefix), the subdomain server's error_page 404 directive falls through to an @api_fallback location that rewrites the URL with the /api/ prefix and re-proxies it. External webhook callers can therefore use either form.
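
The routing and fallback rules above can be sketched in TypeScript to make the mapping concrete. This is an illustration only: the real logic lives in the nginx config, the names `upstreamFor` and `withApiPrefix` are hypothetical, and leaving an already-prefixed path unchanged is an assumption rather than documented behaviour.

```typescript
// Illustrative sketch only; the real routing lives in the nginx config.
// `upstreamFor` and `withApiPrefix` are hypothetical names.
const HOST_PATTERN = /^([^.]+)\.example\.com$/; // same shape as the server_name regex
const HTTP_NODE_ROOT = "/api"; // Node-RED's httpNodeRoot, per the text

// Map a Host header to the container name that Docker DNS resolves.
function upstreamFor(host: string): string | null {
  const match = HOST_PATTERN.exec(host);
  return match ? `http://nodered-${match[1]}:1880` : null;
}

// Path the fallback would retry with. Assumption: a path that already
// carries the prefix is left unchanged.
function withApiPrefix(path: string): string {
  if (path === HTTP_NODE_ROOT || path.startsWith(HTTP_NODE_ROOT + "/")) {
    return path;
  }
  return HTTP_NODE_ROOT + (path.startsWith("/") ? path : "/" + path);
}
```

With these helpers, a webhook sent to /hooks/order and one sent to /api/hooks/order end up at the same Node-RED route.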

Magic-token auth

The manager mints two tokens per instance, with two very different lifetimes:

The magic token is per-instance, stored on the row, injected as OPENFLOW_MAGIC_TOKEN in the container's environment. Stable for the instance's lifetime. Resolved by the editor's tokens() callback against the env var, in memory, with no I/O. Survives restarts.
The access token is single-use, one-minute expiry, written to a hashed table in Postgres. Generated each time the user clicks Launch.

When the access token lands at the editor, the Openflow shim in settings.js immediately exchanges it for a cookie carrying the magic token (HttpOnly, 30 days). Subsequent browser requests carry that cookie; the shim also rewrites ?access_token= in the URL for the WebSocket upgrade. Result: a single click signs the user in, and the session survives instance restarts without re-clicking Launch.
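
A minimal sketch of the access-token lifecycle described above, assuming an in-memory map in place of the hashed Postgres table and SHA-256 as the hash. The function names, the `Path=/` cookie attribute, and the choice of hash are assumptions made for illustration, not the manager's actual code.

```typescript
import { createHash, randomBytes } from "node:crypto";

// In-memory stand-in for the hashed access-token table (Postgres in the real system).
type AccessTokenRow = { instanceId: string; expiresAt: number };
const accessTokens = new Map<string, AccessTokenRow>();

const ACCESS_TOKEN_TTL_MS = 60 * 1000; // one-minute expiry, per the text

// Assumption: SHA-256; the text only says the table is hashed.
function sha256(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Called when the user clicks Launch. Only the hash is stored.
function mintAccessToken(instanceId: string, now: number = Date.now()): string {
  const token = randomBytes(32).toString("hex");
  accessTokens.set(sha256(token), { instanceId, expiresAt: now + ACCESS_TOKEN_TTL_MS });
  return token;
}

// Validation callback: the row is deleted on first lookup, so a token works once.
function redeemAccessToken(token: string, instanceId: string, now: number = Date.now()): boolean {
  const key = sha256(token);
  const row = accessTokens.get(key);
  if (!row) return false;
  accessTokens.delete(key);
  return row.instanceId === instanceId && now <= row.expiresAt;
}

// Set-Cookie value the shim could send after a successful redemption.
function magicCookie(magicToken: string): string {
  const maxAgeSeconds = 30 * 24 * 60 * 60; // 30 days
  return `openflow_token=${encodeURIComponent(magicToken)}; HttpOnly; Path=/; Max-Age=${maxAgeSeconds}`;
}
```

Deleting the row before checking expiry means a stale or replayed token cannot be retried; only a fresh Launch click produces a usable one.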

MQTT isolation

One shared EMQX broker, multi-tenant. Tenant boundaries are enforced by mountpoints, not ACLs.

On every CONNECT, EMQX calls the manager's /mqtt/auth route with the client's username and password. The manager looks up the instance and returns { result: 'allow', client_attrs: { tenant: 'ff/<projectId>/' } }. EMQX uses client_attrs.tenant as the mountpoint for that connection, so every topic the client publishes or subscribes to is automatically prefixed.

Two clients in the same project share the prefix and see each other's traffic. Clients in different projects do not. The flow itself uses flat topic names (db/update/job), and the broker handles the namespacing.
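
The isolation property can be illustrated with a small sketch. The credential map, the plaintext password comparison, and the names `authorize` and `mounted` are simplifications assumed for the example; the real manager looks instances up in Postgres, and the prefixing itself is done by EMQX, not by application code.

```typescript
// Hypothetical credential store standing in for the instances table.
type MqttCreds = { password: string; projectId: string };
const credsByUsername = new Map<string, MqttCreds>([
  ["inst-a", { password: "secret-a", projectId: "p1" }],
  ["inst-b", { password: "secret-b", projectId: "p1" }],
  ["inst-c", { password: "secret-c", projectId: "p2" }],
]);

type AuthResponse =
  | { result: "allow"; client_attrs: { tenant: string } }
  | { result: "deny" };

// Shape of the body the /mqtt/auth route returns to EMQX.
function authorize(username: string, password: string): AuthResponse {
  const creds = credsByUsername.get(username);
  if (!creds || creds.password !== password) {
    return { result: "deny" };
  }
  return { result: "allow", client_attrs: { tenant: `ff/${creds.projectId}/` } };
}

// What the broker-side mountpoint does to each topic on that connection.
function mounted(tenant: string, topic: string): string {
  return tenant + topic;
}
```

Here inst-a and inst-b both publish db/update/job as ff/p1/db/update/job, while inst-c's copy of the same flat topic becomes ff/p2/db/update/job and never crosses over.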

State and storage

The manager owns Postgres. The schema is straightforward:

users, projects, project_members: identity and access.
instances: subdomain, template, magic token, MQTT creds, resource limits.
instance_aliases: additional URL slugs an instance answers to.
snapshots: serialized flow + credentials, for promotion between projects.
backups: metadata for full /data tar.gz archives.
audit_log: actor + action + target. Optional LLM-summarized prose alongside.

Each instance's /data directory is a Docker bind mount under data/instances/<instanceId>/ on the host. Flows live there, palette modules live there, Node-RED's own config files live there. Move that directory, move the instance.
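
As a rough sketch of how these tables relate, the following resolves a URL slug to an instance (directly or through an alias) and derives the host-side data directory. The field names, the `hostRoot` parameter, and both function names are assumptions for illustration; the actual column names are not given here.

```typescript
import { posix } from "node:path";

// Hypothetical row shapes; the real columns may differ.
type Instance = { id: string; subdomain: string };
type InstanceAlias = { instanceId: string; slug: string };

// Resolve a slug to an instance: primary subdomain first, then aliases.
function findInstance(
  slug: string,
  instances: Instance[],
  aliases: InstanceAlias[],
): Instance | undefined {
  const direct = instances.find((i) => i.subdomain === slug);
  if (direct) return direct;
  const alias = aliases.find((a) => a.slug === slug);
  return alias ? instances.find((i) => i.id === alias.instanceId) : undefined;
}

// Host path that is bind-mounted as /data inside the instance container.
function instanceDataDir(hostRoot: string, instanceId: string): string {
  return posix.join(hostRoot, "data", "instances", instanceId);
}
```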

Snapshots vs backups

Two separate concepts, kept separate on purpose.

           snapshot                               backup
contents   flows.json + encrypted creds           tar.gz of /data
size       kB                                     MB to GB
use        promote between instances or projects  disaster recovery
cadence    on demand                              nightly cron
restore    stop, rewrite flows, restart           stop, untar, restart

Snapshots are the unit promoted across environments. Backups are insurance. The dashboard's Backups tab shows both; the in-editor toolbar's Snapshot button captures a snapshot, and the dropdown next to it offers rollback.

Operational note: adding a subdomain alias requires the instance container to be recreated, because Docker network aliases are pinned at create time. The UI surfaces restartNeeded: true on the response and the toast prompts the operator. Without that restart, the alias resolves to no upstream and nginx returns 502.
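
One way to picture the restartNeeded decision is to compare the network aliases a container was created with against the set it should now answer to. This is a sketch under assumptions: the helper names are hypothetical, and naming alias entries nodered-<slug> is inferred from the nginx proxy_pass pattern rather than stated outright.

```typescript
// Hypothetical helpers; only the `restartNeeded` flag itself comes from the text.

// Network aliases a container should carry: one per subdomain or alias slug.
function networkAliases(subdomain: string, aliasSlugs: string[]): string[] {
  return [subdomain, ...aliasSlugs].map((slug) => `nodered-${slug}`);
}

// True when some wanted alias was not attached when the container was created.
function restartNeeded(createdWith: string[], wanted: string[]): boolean {
  const present = new Set(createdWith);
  return wanted.some((name) => !present.has(name));
}
```

Until the container is recreated with the full list, nginx has no upstream for the new name, which matches the 502 described above.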

That is the whole architecture worth knowing on day one. Operational specifics (single-file-mount nginx, certbot DNS-01 vs ACM, ALB target-group tuning) live in the repository's docs/ directory and the README.