Troubleshooting
Real foot-guns, in one page, written in the form symptom → cause → fix. If you spend ten minutes on a problem and don't find it here, open a discussion on the repo so we can add it.
Webhook to /postback or /webhooks returns 404 "Cannot POST"
Symptom. An external caller (Retool, Salesforce, a partner service) POSTs to
https://<sub>.example.com/postback/success and gets back a 404 from
Node-RED with the body Cannot POST /postback/success.
Cause. Node-RED's httpNodeRoot is set to /api in
Openflow's settings.js, so HTTP In nodes mount under
/api/<path>, not /<path>.
Fix. Either point the caller at /api/postback/success, or rely on
the nginx fallback that's already in place: the subdomain server has a 404 error_page
that retries the request with the /api/ prefix, so the bare path works
too. If that fallback isn't in your nginx config (older installs), either add it or
have the caller use the prefixed form.
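To make the mounting rule concrete, here is a minimal sketch of how the effective URL of an HTTP In node follows from httpNodeRoot. The helper name effectivePath is hypothetical; it only mirrors the prefixing described above and is not Node-RED's own routing code.

```javascript
// Hypothetical helper: joins httpNodeRoot and an HTTP In node's path
// (root first, then the node's path), as described in the cause above.
function effectivePath(httpNodeRoot, nodePath) {
  const root = httpNodeRoot.replace(/\/+$/, ''); // drop trailing slashes
  const path = nodePath.startsWith('/') ? nodePath : `/${nodePath}`;
  return `${root}${path}`;
}

console.log(effectivePath('/api', '/postback/success')); // /api/postback/success
```

If the caller's URL does not match the value this returns, expect the Cannot POST response unless the nginx fallback is in place.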
Adding a subdomain alias and immediately hitting it returns 502
Symptom. You add an alias old-name from the Danger Zone tab, and the
dashboard says Alias added. You open https://old-name.example.com/
and nginx returns 502.
Cause. Docker network aliases are pinned at container-create time, so the alias
isn't attached to the running container. The dashboard's response includes
restartNeeded: true for exactly this reason, and the toast says so.
Fix. Restart the instance. The container is recreated with the alias attached. Hitting the alias URL after that works.
Node-RED shows an auth popup after an instance restart
Symptom. You restart an instance (or it restarts itself). The editor was open; a few seconds later the Node-RED login popup appears with no user interaction.
Cause. The editor's WebSocket reconnect carries a stale session token issued by Node-RED before the restart. Node-RED keeps its token cache in memory, so the cache is wiped on restart and the old token no longer validates.
Fix. Already shipped in settings.js: the
httpAdminMiddleware rewrites the Authorization header AND the
?access_token= query parameter in the URL to the per-instance master token
on every request. WebSocket auth picks up the master token and authenticates cleanly.
If you see this on a custom build, make sure your settings.js has the URL
rewrite branch and not just the header rewrite.
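For custom builds, the two branches can be sketched roughly as follows. This is an illustration under assumptions, not the shipped middleware: the factory name makeAuthRewriter, the Bearer scheme, and the MASTER_TOKEN variable are all hypothetical.

```javascript
// Sketch of a middleware factory. `masterToken` stands in for the
// per-instance master token; how it is loaded is an assumption.
function makeAuthRewriter(masterToken) {
  return function rewriteAuth(req, res, next) {
    // Header branch: always present the master token.
    req.headers.authorization = `Bearer ${masterToken}`;

    // URL branch: replace a stale ?access_token= value when one is present.
    // The base URL is only there so a relative req.url can be parsed.
    const url = new URL(req.url, 'http://placeholder.invalid');
    if (url.searchParams.has('access_token')) {
      url.searchParams.set('access_token', masterToken);
      req.url = url.pathname + url.search;
    }
    next();
  };
}

// In settings.js this would be wired up roughly as:
//   httpAdminMiddleware: makeAuthRewriter(process.env.MASTER_TOKEN)
// where MASTER_TOKEN is a hypothetical variable name.
```

A build that has only the header branch would leave the stale token in the query string untouched, which matches the popup symptom described above.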
Edits to nginx.conf don't take effect after nginx -s reload
Symptom. You change the host-side nginx.<root>.conf, send a
reload to the openflow-nginx container, and the running config doesn't
reflect your edit.
Cause. The conf is mounted into the container via a single-file bind mount,
which stays attached to the file that existed when the container started.
sed -i (and any editor that writes by rename) creates a new file, with a new
inode, at the same path. The container keeps reading the old inode.
Fix. Restart the container, don't reload:
$ docker restart openflow-nginx
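The inode behavior can be reproduced outside Docker. The following sketch, which assumes a Linux filesystem and uses a throwaway temp directory, compares an in-place write with a write-by-rename:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Work in a throwaway directory so nothing real is touched.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'inode-demo-'));
const conf = path.join(dir, 'nginx.conf');
fs.writeFileSync(conf, 'worker_processes 1;\n');
const original = fs.statSync(conf).ino;

// In-place write: the existing file is truncated and rewritten, same inode.
fs.writeFileSync(conf, 'worker_processes 2;\n');
const afterInPlace = fs.statSync(conf).ino;

// Write-by-rename: a new file is created, then moved over the old path.
const tmp = path.join(dir, 'nginx.conf.tmp');
fs.writeFileSync(tmp, 'worker_processes 4;\n');
fs.renameSync(tmp, conf);
const afterRename = fs.statSync(conf).ino;

console.log({
  inPlaceKeptInode: afterInPlace === original,
  renameKeptInode: afterRename === original,
});
```

On a typical Linux filesystem the first value is true and the second is false, which is why an edit made by rename is invisible through the bind mount until the container is restarted.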
AWS ALB reports the manager as unhealthy with HTTP 400
Symptom. Target group describe-target-health shows the target as
unhealthy with description Health checks failed with these codes:
[400]. The site still works externally, because all-targets-unhealthy is
fail-open.
Cause. The ALB is health-checking the target with HTTP on a TLS-only port. nginx replies with HTTP 400 "Plain HTTP request sent to HTTPS port".
Fix. Set the target group's HealthCheckProtocol=HTTPS and
HealthCheckPath=/healthz. Tune the thresholds while you're at it
(interval 10s, timeout 5s, healthy 2, unhealthy 3 is a reasonable default). Make sure
the manager nginx block has default_server so the SNI-less ALB probe
lands on the manager rather than falling through to the wildcard subdomain block.
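As a sketch, the settings above map onto the parameters of the ELBv2 ModifyTargetGroup action like this. The helper name healthCheckParams is hypothetical, and sending the object with an SDK client or the CLI is left out on purpose.

```javascript
// Builds a parameter object for the ELBv2 ModifyTargetGroup action.
// `targetGroupArn` is supplied by the caller; the numbers are the defaults
// suggested above, not values mandated by AWS.
function healthCheckParams(targetGroupArn) {
  return {
    TargetGroupArn: targetGroupArn,
    HealthCheckProtocol: 'HTTPS',
    HealthCheckPath: '/healthz',
    HealthCheckIntervalSeconds: 10,
    HealthCheckTimeoutSeconds: 5,
    HealthyThresholdCount: 2,
    UnhealthyThresholdCount: 3,
  };
}

console.log(healthCheckParams('YOUR_TARGET_GROUP_ARN'));
```

Keeping the values in one function makes it easy to apply the same health check to every target group that fronts the manager.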
MQTT Broker Traffic chart is flat with tens of MB of "traffic"
Symptom. The dashboard's broker chart shows Messages / min: 0 but
Traffic (1h): 47 MB. The line graph is pinned at zero.
Cause. Some MQTT client is hammering the broker with failing CONNECTs.
messages.received is zero (no PUBLISH packets) but
bytes.received ticks up from the CONNECT-CONNACK churn. Most often this
is a notifier or publisher with a wrong username/password.
Fix. Look at packets.connack.auth_error against
packets.connect.received on the EMQX metrics endpoint. If they match
1:1, you have an auth loop. Fix the credentials or kill the offending client.
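The 1:1 comparison can be automated with a small check. This sketch assumes the metrics have already been fetched and parsed into a flat object keyed by metric name; the function name looksLikeAuthLoop and the 0.9 threshold are illustrative choices, not part of EMQX.

```javascript
// `metrics` is assumed to be a flat object keyed by EMQX metric name.
// The threshold is arbitrary: 1.0 would demand an exact 1:1 match.
function looksLikeAuthLoop(metrics, threshold = 0.9) {
  const connects = metrics['packets.connect.received'] ?? 0;
  const authErrors = metrics['packets.connack.auth_error'] ?? 0;
  if (connects === 0) return false; // no CONNECTs, nothing to judge
  return authErrors / connects >= threshold;
}

console.log(looksLikeAuthLoop({
  'packets.connect.received': 1000,
  'packets.connack.auth_error': 1000,
})); // true
```

A ratio well below the threshold suggests the traffic has another cause, so the credentials are probably not the problem.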
Dashboard shows <sub>.openflow.ing instead of the real domain
Symptom. Your ROOT_DOMAIN is example.com, but instance
cards show integration.openflow.ing in the URL preview.
Cause. The client bundle was built without VITE_ROOT_DOMAIN baked
in, and the runtime fallback (current behavior: window.location.hostname)
isn't picking up the right value — usually because the SPA is being served from a
different domain than the apex.
Fix. Rebuild the client. Either set VITE_ROOT_DOMAIN explicitly
before npm run build, or make sure the SPA is served from the apex
domain. The runtime fallback only works when those match.
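The interaction between the build-time value and the runtime fallback can be sketched as below. The function name resolveRootDomain is hypothetical and the real client may resolve the value differently; the two inputs stand in for import.meta.env.VITE_ROOT_DOMAIN and window.location.hostname, and the hostnames are placeholders.

```javascript
// Both inputs are passed in so the function can run outside a browser.
// `buildTimeValue` stands in for import.meta.env.VITE_ROOT_DOMAIN,
// `hostname` for window.location.hostname.
function resolveRootDomain(buildTimeValue, hostname) {
  return buildTimeValue || hostname;
}

// Built with the variable set: the hostname is irrelevant.
console.log(resolveRootDomain('example.com', 'app.example.net')); // example.com
// Built without it, served from the apex: the fallback is correct.
console.log(resolveRootDomain(undefined, 'example.com')); // example.com
// Built without it, served from another host: the fallback is wrong.
console.log(resolveRootDomain(undefined, 'app.example.net')); // app.example.net
```

The third case is the one this entry describes, which is why setting VITE_ROOT_DOMAIN explicitly is the more robust of the two fixes.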
Drizzle says "permission denied" when running migrations on a non-primary box
Symptom. npm run db:push works on one box and fails on another with
a Postgres permission error.
Cause. The app role differs per box. If a previous deploy created tables by
running raw DDL as postgres (superuser), those tables are owned by
postgres, and the runtime app user on that box has no rights on them.
Fix. Either always run schema changes through Drizzle (npm run db:push
connects as the app role), or follow any raw DDL with
ALTER TABLE … OWNER TO <app role>. To check who currently owns a table, run
SELECT tableowner FROM pg_tables WHERE tablename = '…'.