Troubleshooting

Real foot-guns on one page, each written as symptom → cause → fix. If you spend ten minutes on a problem and don't find it here, open a discussion on the repo so we can add it.

Webhook to /postback or /webhooks returns 404 "Cannot POST"

Symptom. An external caller (Retool, Salesforce, a partner service) POSTs to https://<sub>.example.com/postback/success and gets back a 404 from Node-RED with the body Cannot POST /postback/success.

Cause. Node-RED's httpNodeRoot is set to /api in Openflow's settings.js, so HTTP In nodes mount under /api/<path>, not /<path>.

Fix. Either point the caller at /api/postback/success, or rely on the nginx fallback that's already in place: the subdomain server block has a 404 error_page that retries the request with the /api/ prefix, so the bare path works too. If that fallback is missing from your nginx config (older installs), add it or have the caller use the prefixed path.
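To see which path an HTTP In node actually answers on, the prefixing can be sketched as a small helper. This is an illustration only: joinNodeRoot is a hypothetical name, not a Node-RED API, and it only models how httpNodeRoot is prepended to the node's configured path.

```javascript
// Sketch: combine httpNodeRoot with an HTTP In node's path.
// joinNodeRoot is a hypothetical helper for illustration, not part of Node-RED.
function joinNodeRoot(httpNodeRoot, nodePath) {
  // Strip leading and trailing slashes from each part, drop empty parts,
  // then rejoin with single slashes.
  const parts = [httpNodeRoot, nodePath]
    .map((p) => p.replace(/^\/+|\/+$/g, ""))
    .filter((p) => p.length > 0);
  return "/" + parts.join("/");
}

console.log(joinNodeRoot("/api", "/postback/success")); // /api/postback/success
console.log(joinNodeRoot("/", "/postback/success"));    // /postback/success
```

With httpNodeRoot set to /api, the first form is the one the caller has to hit unless the nginx fallback is in place.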

Adding a subdomain alias and immediately hitting it returns 502

Symptom. You add an alias old-name from the Danger Zone tab, and the dashboard says Alias added. You open https://old-name.example.com/ and nginx returns 502.

Cause. Docker network aliases are pinned at container-create time, so the alias isn't attached to the running container. The dashboard's response includes restartNeeded: true for exactly this reason, and the toast says so.

Fix. Restart the instance. The container is recreated with the alias attached. Hitting the alias URL after that works.

Node-RED login popup appears in the editor after an instance restart

Symptom. You restart an instance (or it restarts itself). The editor was open; a few seconds later the Node-RED login popup appears with no user interaction.

Cause. The editor's WebSocket reconnect carries a session token that Node-RED issued before the restart. Node-RED keeps its token cache in memory, so the restart wipes it and the stale token no longer validates.

Fix. Already shipped in settings.js: the httpAdminMiddleware rewrites the Authorization header AND the ?access_token= query parameter in the URL to the per-instance master token on every request. WebSocket auth picks up the master token and authenticates cleanly. If you see this on a custom build, make sure your settings.js has the URL rewrite branch and not just the header rewrite.
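For custom builds, the two rewrite branches can be sketched roughly as follows. This is not the shipped code: MASTER_TOKEN, the INSTANCE_MASTER_TOKEN environment variable, and the function name are hypothetical, and the sketch assumes requests are already authenticated before they reach Node-RED.

```javascript
// Sketch of an httpAdminMiddleware with both rewrite branches.
// MASTER_TOKEN and INSTANCE_MASTER_TOKEN are hypothetical placeholders.
const MASTER_TOKEN = process.env.INSTANCE_MASTER_TOKEN || "example-master-token";

function rewriteAdminAuth(req, res, next) {
  // Header branch: covers ordinary editor HTTP requests.
  req.headers.authorization = `Bearer ${MASTER_TOKEN}`;

  // URL branch: replace a stale ?access_token=... with the master token.
  const q = req.url.indexOf("?");
  if (q !== -1) {
    const params = new URLSearchParams(req.url.slice(q + 1));
    if (params.has("access_token")) {
      params.set("access_token", MASTER_TOKEN);
      req.url = req.url.slice(0, q) + "?" + params.toString();
    }
  }
  next();
}

// In settings.js this would be wired up along the lines of:
// module.exports = { httpAdminMiddleware: rewriteAdminAuth /* , ... */ };
```

A build that has only the header branch would leave the query-string token stale, which matches the symptom above.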

Edits to nginx.conf don't take effect after `nginx -s reload`

Symptom. You change the host-side nginx.<root>.conf, send a reload to the openflow-nginx container, and the running config doesn't reflect your edit.

Cause. The conf is mounted into the container via a single-file bind mount. sed -i (and any editor that writes by rename) creates a new file, with a new inode, at the same path. The bind mount is pinned to the original inode, so the container keeps reading the old file.

Fix. Restart the container, don't reload:

$ docker restart openflow-nginx
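The inode behavior behind this can be reproduced outside Docker. The sketch below works in a throwaway temp directory with a dummy file, not a real nginx config, and compares an in-place write against a write-by-rename.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Throwaway directory and dummy file; nothing here touches a real config.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "inode-demo-"));
const conf = path.join(dir, "nginx.conf");
fs.writeFileSync(conf, "worker_processes 1;\n");
const original = fs.statSync(conf).ino;

// In-place write: the existing file is truncated and rewritten, inode unchanged.
fs.writeFileSync(conf, "worker_processes 2;\n");
const afterInPlace = fs.statSync(conf).ino;

// Write-by-rename: a new file is created, then moved over the old path.
const tmp = path.join(dir, "nginx.conf.tmp");
fs.writeFileSync(tmp, "worker_processes 4;\n");
fs.renameSync(tmp, conf);
const afterRename = fs.statSync(conf).ino;

console.log("in-place write kept inode:", afterInPlace === original);
console.log("rename kept inode:", afterRename === original);

fs.rmSync(dir, { recursive: true });
```

On a typical Linux filesystem the in-place write keeps the inode and the rename does not, which is why an edit made by rename is invisible through a single-file bind mount until the container is restarted.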

AWS ALB reports the manager as unhealthy with HTTP 400

Symptom. describe-target-health on the target group shows the target as unhealthy with the description Health checks failed with these codes: [400]. The site still works externally because the ALB fails open: when every target in a group is unhealthy, it routes to all of them.

Cause. The ALB is health-checking the target with plain HTTP on a TLS-only port. nginx replies with HTTP 400 "The plain HTTP request was sent to HTTPS port".

Fix. Set the target group's HealthCheckProtocol=HTTPS and HealthCheckPath=/healthz. Tune the thresholds while you're at it; interval 10s, timeout 5s, healthy 2, unhealthy 3 is a reasonable default. Make sure the manager's nginx server block has default_server so the SNI-less ALB probe lands on the manager rather than falling through to the wildcard subdomain block.
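The settings above map onto the parameters of the ELBv2 ModifyTargetGroup call. A minimal sketch of that parameter object, where healthCheckParams is a hypothetical helper and the ARN is a placeholder you would replace with your own:

```javascript
// Sketch: parameters for ELBv2 ModifyTargetGroup with the values suggested above.
// healthCheckParams is a hypothetical helper; the ARN is supplied by the caller.
function healthCheckParams(targetGroupArn) {
  return {
    TargetGroupArn: targetGroupArn,
    HealthCheckProtocol: "HTTPS",   // probe over TLS, matching the TLS-only port
    HealthCheckPath: "/healthz",
    HealthCheckIntervalSeconds: 10,
    HealthCheckTimeoutSeconds: 5,
    HealthyThresholdCount: 2,
    UnhealthyThresholdCount: 3,
  };
}

console.log(JSON.stringify(healthCheckParams("<your-target-group-arn>"), null, 2));
```

With the AWS SDK for JavaScript v3, an object of this shape could be passed to ModifyTargetGroupCommand from @aws-sdk/client-elastic-load-balancing-v2; the same field names apply to the AWS CLI's modify-target-group in kebab-case.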

MQTT Broker Traffic chart is flat with tens of MB of "traffic"

Symptom. Dashboard's broker chart shows Messages / min: 0 but Traffic (1h): 47 MB. The line graph is pinned at zero.

Cause. Some MQTT client is hammering the broker with failing CONNECTs. messages.received is zero (no PUBLISH packets) but bytes.received ticks up from the CONNECT-CONNACK churn. Most often this is a notifier or publisher with a wrong username/password.

Fix. Compare packets.connack.auth_error with packets.connect.received on the EMQX metrics endpoint. If they match 1:1, you have an auth loop. Fix the credentials or kill the offending client.
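The 1:1 comparison can be sketched as a small check. It assumes the metrics have already been fetched and flattened into an object keyed by metric name; that shape, the function name, and the tolerance parameter are assumptions for illustration, so adapt them to what your EMQX version returns.

```javascript
// Sketch: flag an auth loop from EMQX counters.
// `metrics` is assumed to be a flat object keyed by metric name.
// `tolerance` is a hypothetical knob for how close to 1:1 counts as a match.
function looksLikeAuthLoop(metrics, tolerance = 0.05) {
  const connects = metrics["packets.connect.received"] || 0;
  const authErrors = metrics["packets.connack.auth_error"] || 0;
  if (connects === 0) return false; // no CONNECTs at all, so no loop
  // An auth loop means nearly every CONNECT was answered with an auth error.
  return authErrors / connects >= 1 - tolerance;
}

console.log(looksLikeAuthLoop({
  "packets.connect.received": 1000,
  "packets.connack.auth_error": 998,
})); // true

console.log(looksLikeAuthLoop({
  "packets.connect.received": 1000,
  "packets.connack.auth_error": 3,
})); // false
```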

Dashboard shows `<sub>.openflow.ing` instead of the real domain

Symptom. Your ROOT_DOMAIN is example.com, but instance cards show integration.openflow.ing in the URL preview.

Cause. The client bundle was built without VITE_ROOT_DOMAIN baked in, and the runtime fallback (current behavior: window.location.hostname) isn't picking up the right value — usually because the SPA is being served from a different domain than the apex.

Fix. Rebuild the client. Either set VITE_ROOT_DOMAIN explicitly before npm run build, or make sure the SPA is served from the apex domain. The runtime fallback only works when the hostname serving the SPA is the root domain itself.
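The lookup order described above can be sketched like this. The helper names are hypothetical and the real client code may differ; buildTimeValue stands in for import.meta.env.VITE_ROOT_DOMAIN and hostname for window.location.hostname.

```javascript
// Sketch: prefer the build-time root domain, fall back to the serving hostname.
// resolveRootDomain and instanceUrl are hypothetical helpers for illustration.
function resolveRootDomain(buildTimeValue, hostname) {
  const baked = (buildTimeValue || "").trim();
  return baked !== "" ? baked : hostname;
}

function instanceUrl(sub, rootDomain) {
  return `https://${sub}.${rootDomain}/`;
}

// Built with VITE_ROOT_DOMAIN=example.com: the serving hostname is irrelevant.
console.log(instanceUrl("integration", resolveRootDomain("example.com", "openflow.ing")));
// https://integration.example.com/

// Built without it and served from another domain: the fallback shows that domain.
console.log(instanceUrl("integration", resolveRootDomain(undefined, "openflow.ing")));
// https://integration.openflow.ing/
```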

Drizzle says "permission denied" when running migrations on a non-primary box

Symptom. npm run db:push works on one box and fails on another with a Postgres permission error.

Cause. The app role differs per box. If a previous deploy created tables by running raw DDL as postgres (superuser), those tables are owned by postgres, and the runtime app role on that box has no rights to alter them.

Fix. Either always run schema changes through Drizzle (npm run db:push connects as the app role), or follow any raw DDL with ALTER TABLE … OWNER TO <app role>. To check who owns a table, run SELECT tableowner FROM pg_tables WHERE tablename = '…'.
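If several tables are affected, the ownership statements can be generated from the pg_tables output. A minimal sketch, assuming rows shaped like the result of SELECT schemaname, tablename, tableowner FROM pg_tables; the table and role names in the example are made up, and the generated SQL still has to be run by a role allowed to change ownership.

```javascript
// Quote a Postgres identifier: wrap in double quotes, double any embedded quotes.
function quoteIdent(name) {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Sketch: emit one ALTER TABLE ... OWNER TO per table not owned by the app role.
// `rows` is assumed to come from pg_tables (schemaname, tablename, tableowner).
function ownerFixes(rows, appRole) {
  return rows
    .filter((r) => r.tableowner !== appRole)
    .map((r) =>
      `ALTER TABLE ${quoteIdent(r.schemaname)}.${quoteIdent(r.tablename)} ` +
      `OWNER TO ${quoteIdent(appRole)};`
    );
}

// Hypothetical example data: one table created as postgres, one already correct.
const rows = [
  { schemaname: "public", tablename: "instances", tableowner: "postgres" },
  { schemaname: "public", tablename: "users", tableowner: "openflow_app" },
];
console.log(ownerFixes(rows, "openflow_app").join("\n"));
// ALTER TABLE "public"."instances" OWNER TO "openflow_app";
```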