Architecture
What runs where, what talks to what, and why each piece exists.
The shape
Three long-running processes: the manager (Fastify + TypeScript), EMQX (shared MQTT broker), and Postgres. Alongside them run one Node-RED container per instance and one nginx container in front, which terminates TLS and routes by subdomain.
Everything except Postgres runs on the openflow-network Docker bridge, so
containers reach each other by name. Nginx proxies to nodered-<subdomain>:1880;
instance containers reach the manager via host.docker.internal:4071 for the
magic-token validation callback; EMQX calls the manager at the same address for auth.
Request flow
From the moment a user hits wfengine.example.com:
    # 1. nginx (in container, port 443)
    server_name ~^(?<subdomain>[^.]+)\.example\.com$;
    proxy_pass http://nodered-$subdomain:1880;

    # 2. Docker DNS resolves nodered-wfengine to the right container
    #    (network aliases let one container answer to multiple names)

    # 3. Node-RED settings.js sees Authorization: Bearer <MAGIC>
    #    or an `openflow_token` cookie, and authenticates the request

    # 4. The editor loads. WebSocket /comms upgrade goes through the
    #    same middleware path; URL access_token is rewritten to MAGIC.
If a request comes back 404 (because Node-RED's httpNodeRoot is /api and
the caller forgot the prefix), the subdomain server's error_page 404 falls
through to a @api_fallback location that prepends /api/ to the path
and re-proxies. External webhook callers can use either form.
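The routing rule and the 404 fallback can be modelled in a few lines of TypeScript. This is only a sketch of what the nginx configuration does, not code from the repository: the function names `upstreamFor` and `withApiPrefix` are hypothetical, and `example.com` is the placeholder domain from the example above.

```typescript
// Hypothetical model of the nginx routing rules; the real logic lives in nginx config.
const NODE_RED_PORT = 1880;

/** Mirrors server_name ~^(?<subdomain>[^.]+)\.example\.com$ plus the proxy_pass target. */
function upstreamFor(host: string, baseDomain: string = "example.com"): string | null {
  // Escape regex metacharacters in the base domain before building the pattern.
  const escaped = baseDomain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = new RegExp(`^([^.]+)\\.${escaped}$`).exec(host);
  return match ? `http://nodered-${match[1]}:${NODE_RED_PORT}` : null;
}

/** Models the @api_fallback rewrite: retry a 404'd path under /api/. */
function withApiPrefix(path: string): string {
  if (path === "/api" || path.startsWith("/api/")) return path; // already prefixed
  return `/api${path.startsWith("/") ? "" : "/"}${path}`;
}
```

Under these assumptions, a webhook sent to `/hooks/order` and one sent to `/api/hooks/order` end up at the same Node-RED path, which is why external callers can use either form.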
Magic-token auth
The manager mints two tokens per instance, with two very different lifetimes:
- The magic token is per-instance, stored on the row, injected as OPENFLOW_MAGIC_TOKEN in the container's environment. Stable for the instance's lifetime. Resolved by the editor's tokens() callback against the env var, in memory, with no I/O. Survives restarts.
- The access token is single-use, one-minute expiry, written to a hashed table in Postgres. Generated each time the user clicks Launch.
When the access token lands at the editor, the Openflow shim in settings.js
immediately promotes the cookie to the magic token (HttpOnly, 30 days). Browser
requests carry it; the shim also rewrites ?access_token= in the URL for WebSocket
upgrade. Result: a single click signs the user in, and the session survives instance
restarts without re-clicking Launch.
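The two lifetimes can be illustrated with a minimal sketch. It assumes SHA-256 for the hashed table and uses an in-memory `Map` as a stand-in for Postgres; the function names (`mintAccessToken`, `redeemAccessToken`, `isMagicToken`) are hypothetical, and the real shim in settings.js may be structured differently.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

const ACCESS_TOKEN_TTL_MS = 60_000; // one-minute expiry, as described above

// Stand-in for the hashed Postgres table: token hash -> expiry (ms since epoch).
const accessTokens = new Map<string, number>();

// Assumption: SHA-256; the source only says the table is hashed.
const sha256 = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

/** Mint an access token (one per Launch click); only its hash is stored. */
function mintAccessToken(now: number = Date.now()): string {
  const token = randomBytes(32).toString("hex");
  accessTokens.set(sha256(token), now + ACCESS_TOKEN_TTL_MS);
  return token;
}

/** Redeem once: the entry is deleted on first use, expired or not. */
function redeemAccessToken(token: string, now: number = Date.now()): boolean {
  const key = sha256(token);
  const expiresAt = accessTokens.get(key);
  if (expiresAt === undefined) return false;
  accessTokens.delete(key);
  return now <= expiresAt;
}

/** Magic-token check: compare against the env value in memory, no I/O. */
function isMagicToken(candidate: string, magic: string | undefined): boolean {
  if (!magic) return false;
  // Hash both sides so the buffers have equal length for timingSafeEqual.
  return timingSafeEqual(Buffer.from(sha256(candidate)), Buffer.from(sha256(magic)));
}
```

In this model the shim would call `redeemAccessToken` once on arrival, then set the cookie to the magic token and rely on `isMagicToken(cookie, process.env.OPENFLOW_MAGIC_TOKEN)` for every later request.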
MQTT isolation
One shared EMQX broker, multi-tenant. Tenant boundaries are enforced by mountpoints, not ACLs.
On every CONNECT, EMQX calls the manager's /mqtt/auth route with the client's
username + password. The manager looks up the instance, returns
{ result: 'allow', client_attrs: { tenant: 'ff/<projectId>/' } }. EMQX
uses client_attrs.tenant as the mountpoint for that connection: every topic the
client publishes or subscribes to is automatically prefixed.
Two clients in the same project share the prefix and see each other's traffic. Clients in
different projects do not. The flow itself uses flat topic names (db/update/job),
and the broker handles the namespacing.
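As an illustration, the auth callback and the effect of the mountpoint can be sketched as two pure functions. The `Instance` shape, the plaintext credential comparison, and the function names are assumptions made for this example; only the `allow` response shape and the `ff/<projectId>/` prefix come from the description above.

```typescript
// Hypothetical instance record; the real row has more fields.
interface Instance {
  projectId: string;
  mqttUsername: string;
  mqttPassword: string; // plaintext only to keep the sketch short
}

type AuthResult =
  | { result: "allow"; client_attrs: { tenant: string } }
  | { result: "deny" };

/** Model of the manager's /mqtt/auth decision for one CONNECT. */
function mqttAuth(instances: Instance[], username: string, password: string): AuthResult {
  const inst = instances.find(
    (i) => i.mqttUsername === username && i.mqttPassword === password,
  );
  if (!inst) return { result: "deny" };
  return { result: "allow", client_attrs: { tenant: `ff/${inst.projectId}/` } };
}

/** What the broker-side mountpoint does to every topic on that connection. */
function mountedTopic(tenant: string, topic: string): string {
  return `${tenant}${topic}`;
}
```

With this model, two instances in project `p1` both publish `db/update/job` as `ff/p1/db/update/job`, while an instance in `p2` lands on `ff/p2/db/update/job` and never sees the other project's traffic.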
State and storage
The manager owns Postgres. The schema is straightforward:
- users, projects, project_members: identity and access.
- instances: subdomain, template, magic token, MQTT creds, resource limits.
- instance_aliases: additional URL slugs an instance answers to.
- snapshots: serialized flow + credentials, for promotion between projects.
- backups: metadata for full /data tar.gz archives.
- audit_log: actor + action + target. Optional LLM-summarized prose alongside.
Each instance's /data directory is a Docker bind mount under
data/instances/<instanceId>/ on the host. Flows live there, palette modules
live there, Node-RED's own config files live there. Move that directory, move the instance.
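The host-to-container mapping can be sketched as follows. The helper names and the ID validation rule are assumptions for illustration; the only facts taken from the text are the `data/instances/<instanceId>/` layout and the `/data` mount target.

```typescript
import * as path from "node:path";

/** Host directory for one instance, relative to a hypothetical install root. */
function instanceDataDir(installRoot: string, instanceId: string): string {
  // Assumption: IDs are simple slugs. Rejecting anything else blocks path traversal.
  if (!/^[A-Za-z0-9_-]+$/.test(instanceId)) {
    throw new Error(`invalid instance id: ${instanceId}`);
  }
  return path.posix.join(installRoot, "data", "instances", instanceId);
}

/** Docker bind-mount spec in host:container form, targeting Node-RED's /data. */
function bindSpec(hostDir: string): string {
  return `${hostDir}:/data`;
}
```

Because everything the instance needs sits under that one directory, relocating an instance in this model amounts to copying the directory and pointing the bind spec at the new host path.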
Snapshots vs backups
Two separate concepts, kept separate on purpose.
| | snapshot | backup |
|---|---|---|
| contents | flows.json + encrypted creds | tar.gz of /data |
| size | kB | MB to GB |
| use | promote between instances or projects | disaster recovery |
| cadence | on demand | nightly cron |
| restore | stop, rewrite flows, restart | stop, untar, restart |
Snapshots are the unit promoted across environments. Backups are insurance. The dashboard's Backups tab shows both; the in-editor toolbar's Snapshot button captures a snapshot, and the dropdown next to it offers rollback.
Operational note
Adding a subdomain alias requires the instance container to be recreated, because Docker
network aliases are pinned at create time. The UI surfaces restartNeeded: true
on the response and the toast prompts the operator. Without that restart, the alias resolves
to no upstream and nginx returns 502.
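A minimal sketch of why the flag is needed: the aliases stored for an instance can drift from the aliases its container was created with. The `AliasState` shape and function names are hypothetical, and the `nodered-<alias>` naming is inferred from the nginx proxy target rather than stated by the source.

```typescript
// Hypothetical view of one instance's alias bookkeeping.
interface AliasState {
  stored: string[];          // subdomains recorded for the instance
  appliedAtCreate: string[]; // subdomains the running container was created with
}

/** Docker network aliases nginx would resolve (assumed naming: nodered-<subdomain>). */
function networkAliases(subdomains: string[]): string[] {
  return subdomains.map((s) => `nodered-${s}`);
}

/** Record a new alias and report whether the container must be recreated. */
function addAlias(
  state: AliasState,
  alias: string,
): { state: AliasState; restartNeeded: boolean } {
  const stored = state.stored.includes(alias) ? state.stored : [...state.stored, alias];
  // Any stored alias the container does not yet answer to requires a recreate.
  const restartNeeded = stored.some((s) => !state.appliedAtCreate.includes(s));
  return { state: { ...state, stored }, restartNeeded };
}
```

In this model, adding `orders` to an instance created as `wfengine` yields `restartNeeded: true`; until the container is recreated with both aliases, `nodered-orders` has nothing behind it.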
That is the whole architecture worth knowing on day one. Operational specifics
(single-file-mount nginx, certbot DNS-01 vs ACM, ALB target-group tuning) live in the
repository's docs/ directory and the README.