Worker Onboarding
How a brand-new worker becomes a fully fused host: the install snippet, every env var, why two master addresses exist, the bootstrap handshake, the fuse handshake, and what survives a container recreate.
The install snippet
The Hosts page in the master UI generates this snippet for every new host. It is the only thing you ever paste on a worker — every field is auto-filled by the master at the moment you create the host record.
```shell
docker run -d \
  --name contextbay-worker \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v contextbay-worker-data:/var/lib/contextbay \
  -e CONTEXTBAY_MASTER_URL=http://<MASTER_LAN_IP>:7480 \
  -e CONTEXTBAY_MASTER_ADDR=<MASTER_MESH_IP>:7481 \
  -e CONTEXTBAY_NODE_NAME=<worker-1> \
  -e CONTEXTBAY_ENROLL_TOKEN=<one-time-token-from-ui> \
  -e CONTEXTBAY_SHARED_SECRET=<shared-secret-from-ui> \
  ghcr.io/contextbay/contextbay-worker:latest
```

<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range. The master is always assigned 100.64.0.1, so the shape is the same on every install.
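Because the master's mesh address always falls inside 100.64.0.0/10, a worker could sanity-check CONTEXTBAY_MASTER_ADDR before dialing and catch a LAN IP pasted into the wrong field. The Go sketch below is not taken from the real worker: checkMasterAddr and meshPrefix are hypothetical names, and only the standard net/netip package is used.

```go
package main

import (
	"fmt"
	"net/netip"
)

// meshPrefix is the Headscale CGNAT range described above (100.64.0.0/10).
var meshPrefix = netip.MustParsePrefix("100.64.0.0/10")

// checkMasterAddr is a hypothetical pre-flight check for CONTEXTBAY_MASTER_ADDR:
// the value must be "ip:port" and the IP must sit inside the mesh prefix,
// which catches the common mistake of pasting the LAN IP here.
func checkMasterAddr(addr string) error {
	ap, err := netip.ParseAddrPort(addr)
	if err != nil {
		return fmt.Errorf("CONTEXTBAY_MASTER_ADDR %q: %w", addr, err)
	}
	if !meshPrefix.Contains(ap.Addr()) {
		return fmt.Errorf("CONTEXTBAY_MASTER_ADDR %q is outside %s; is this the LAN IP?", addr, meshPrefix)
	}
	return nil
}
```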
What every env var means
| Variable | Used for | Notes |
|---|---|---|
| CONTEXTBAY_ENROLL_TOKEN | First-call only | One-shot, ~24h TTL, consumed by the master on first call. Re-deploying a worker with the same (already-consumed) token returns already_consumed; use Regenerate token in the UI to mint a fresh one. |
| CONTEXTBAY_SHARED_SECRET | Cluster-wide | Set on the master at first deploy ([mesh].shared_secret) and identical on every worker. Compared in constant time before the master even touches the database. |
| CONTEXTBAY_MASTER_URL | First-call only | LAN HTTP URL (e.g. http://<MASTER_LAN_IP>:7480). Used only for the first /api/enroll call, because a fresh worker can't be on the mesh until it has enrolled. |
| CONTEXTBAY_MASTER_ADDR | Steady state | Mesh gRPC address (typically <MASTER_MESH_IP>:7481). Used for all steady-state communication after enrollment. Encrypted end-to-end via Headscale/WireGuard. |
| CONTEXTBAY_NODE_NAME | Identity | The host record id. Must match the host you registered in the UI; the master pins gRPC streams by name and rejects mismatches. |
| CONTEXTBAY_TSNET_STATE_DIR | Persistence | Optional. Defaults to /var/lib/contextbay/tsnet. Where the worker's mesh identity (private key, authkey state) is persisted. A non-empty state dir means the worker is already enrolled; see Persistence + container recreate below. |
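The Used for column splits the variables into a first-call set and a steady-state set. A minimal Go sketch of loading and validating them, assuming a hypothetical workerConfig struct and loadConfig helper (not the worker's actual code). It takes getenv as a parameter so it can be exercised without touching the process environment; errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// workerConfig is a hypothetical struct mirroring the env var table.
type workerConfig struct {
	MasterURL    string // CONTEXTBAY_MASTER_URL (first call only)
	MasterAddr   string // CONTEXTBAY_MASTER_ADDR (steady state)
	NodeName     string // CONTEXTBAY_NODE_NAME (must match the host record id)
	EnrollToken  string // CONTEXTBAY_ENROLL_TOKEN (first call only)
	SharedSecret string // CONTEXTBAY_SHARED_SECRET
	StateDir     string // CONTEXTBAY_TSNET_STATE_DIR (optional)
}

const defaultStateDir = "/var/lib/contextbay/tsnet"

// loadConfig reads the variables through getenv (os.Getenv in a real binary),
// applies the documented state-dir default, and reports every missing
// variable at once. enrolling selects whether the first-call variables are
// required on this start.
func loadConfig(getenv func(string) string, enrolling bool) (workerConfig, error) {
	c := workerConfig{
		MasterURL:    getenv("CONTEXTBAY_MASTER_URL"),
		MasterAddr:   getenv("CONTEXTBAY_MASTER_ADDR"),
		NodeName:     getenv("CONTEXTBAY_NODE_NAME"),
		EnrollToken:  getenv("CONTEXTBAY_ENROLL_TOKEN"),
		SharedSecret: getenv("CONTEXTBAY_SHARED_SECRET"),
		StateDir:     getenv("CONTEXTBAY_TSNET_STATE_DIR"),
	}
	if c.StateDir == "" {
		c.StateDir = defaultStateDir
	}
	var errs []error
	require := func(name, value string) {
		if value == "" {
			errs = append(errs, fmt.Errorf("%s is required", name))
		}
	}
	require("CONTEXTBAY_NODE_NAME", c.NodeName)
	require("CONTEXTBAY_MASTER_ADDR", c.MasterAddr)
	if enrolling {
		// Assumption: the shared secret is only checked on the enroll path.
		require("CONTEXTBAY_MASTER_URL", c.MasterURL)
		require("CONTEXTBAY_ENROLL_TOKEN", c.EnrollToken)
		require("CONTEXTBAY_SHARED_SECRET", c.SharedSecret)
	}
	return c, errors.Join(errs...) // nil when nothing is missing
}
```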
Every variable also accepts a sibling _FILE form (Docker secret idiom). For example, set CONTEXTBAY_SHARED_SECRET_FILE=/run/secrets/cb_shared_secret and CB will read the file contents instead of the inline value.
Why two master addresses?
A brand-new worker can't join the encrypted mesh until it has an authkey, and it can't get an authkey without calling the master's API. This catch-22 is solved by two distinct addresses with strictly different lifetimes:
- CONTEXTBAY_MASTER_URL (LAN HTTP): used once, for POST /api/enroll. After this call returns, this URL is never used again. It does not need to be reachable in steady state.
- CONTEXTBAY_MASTER_ADDR (mesh gRPC): used for everything after enrollment. Heartbeats, metrics, events, and the command stream all flow over WireGuard. This is the address you actually depend on long-term.
Once all workers are enrolled you can drop the LAN URL from your firewall rules entirely, since only the mesh-side gRPC port matters. Reopen it whenever you enroll a new worker.
The bootstrap call
On startup the worker reads the env, decides whether enrollment is needed (token present + state dir empty), and runs the handshake:
```
Worker (cmd/worker)                      Master (internal/api)
|                                                            |
| read CONTEXTBAY_* env                                      |
| state dir empty + token present? → enroll                  |
|                                                            |
| POST CONTEXTBAY_MASTER_URL/api/enroll ────────────────────→|
|   { token, shared_secret }                                 |
|                                                            |
|                             shared-secret check (CT cmp)   |
|                             GetEnrollmentByToken           |
|                             mint Headscale pre-auth key    |
|                             ConsumeEnrollment (atomic)     |
|                             flip node → "monitoring"       |
|                                                            |
|←────────────────────────── 200 OK { headscale_authkey,     |
|                                     control_url,           |
|                                     master_mesh_addr,      |
|                                     node_id }              |
|                                                            |
| tsnet.Up(authkey, control_url) → join mesh                 |
| persist state to /var/lib/contextbay/tsnet/                |
|                                                            |
| gRPC dial CONTEXTBAY_MASTER_ADDR ─────────────────────────→|
| stream Heartbeat / ReportMetrics / ReportEvents            |
```

From the moment the gRPC stream opens, the LAN URL is irrelevant. The Headscale control-plane URL returned by the master (control_url) can be overridden with [mesh].headscale_public_url on the master. This is useful when CB sits behind a reverse proxy whose Host header doesn't match the worker-facing URL.
Enrollment rejection reasons
Every /api/enroll rejection is tagged with a reason code (also the reason label on the contextbay_enroll_rejections_total Prometheus counter), and the host detail page shows the most recent attempts. The reason codes are:
| Reason | HTTP | What it means |
|---|---|---|
| rate_limited | 429 | More than 10 enroll attempts/min from this IP. Backs off automatically. |
| bad_request | 400 | Missing or invalid JSON body, or missing token or shared_secret. |
| bad_secret | 401 | CONTEXTBAY_SHARED_SECRET on the worker doesn't match [mesh].shared_secret on the master. |
| invalid_token | 401 | Token does not exist (typo, regenerated, or for a different cluster). |
| expired | 410 | Token TTL elapsed. Click Regenerate token in the UI. |
| already_consumed | 410 | Token has already been used. Tokens are intentionally one-shot; regenerate one to re-enroll. |
| race_consumed | 410 | Concurrent /api/enroll calls with the same token; one won, this one lost the race. |
| not_configured | 503 | Master has no shared_secret configured, or its Headscale client is missing. |
| mesh_mint_failed | 502 | Headscale rejected the pre-auth key request. Check the cb-headscale logs. |
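For tooling that reads these codes (for example, a script that summarizes enroll failures), the table can be encoded directly. A hypothetical Go sketch: the status mapping is copied from the table, while the nextStep hints paraphrase the What it means column and are not something the master returns.

```go
package main

import "net/http"

// enrollRejections maps each documented reason code to its HTTP status.
var enrollRejections = map[string]int{
	"rate_limited":     http.StatusTooManyRequests,    // 429
	"bad_request":      http.StatusBadRequest,         // 400
	"bad_secret":       http.StatusUnauthorized,       // 401
	"invalid_token":    http.StatusUnauthorized,       // 401
	"expired":          http.StatusGone,               // 410
	"already_consumed": http.StatusGone,               // 410
	"race_consumed":    http.StatusGone,               // 410
	"not_configured":   http.StatusServiceUnavailable, // 503
	"mesh_mint_failed": http.StatusBadGateway,         // 502
}

// nextStep is a hypothetical operator hint derived from the table.
func nextStep(reason string) string {
	switch reason {
	case "rate_limited":
		return "wait and retry"
	case "expired", "already_consumed", "invalid_token":
		return "regenerate the token in the UI"
	case "bad_secret":
		return "re-copy the install snippet"
	case "not_configured", "mesh_mint_failed":
		return "fix the master (shared_secret or cb-headscale)"
	default:
		return "inspect the host detail page"
	}
}
```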
Fuse: from monitoring to fused
Enrollment alone gives you monitoring: heartbeats, metrics, Docker events, and a small command stream for log/exec proxying. To do container CRUD on the worker — start, stop, restart, redeploy stacks — you fuse the host.
Fuse is a deliberate second step because:
- It rolls out a Portainer Edge agent on the worker, which is a larger trust boundary than monitoring-only.
- It registers a new Portainer endpoint id with the master — CB then routes all container actions through Portainer for that endpoint.
- It must be admin-initiated (no auto-fuse) so you have a chance to inspect the host first.
Click Fuse on the host card or call POST /api/hosts/{id}/fuse. The fuse watchdog gives up after a deadline; failures are tagged with a source label on contextbay_fuse_failed_total (portainer_error or watchdog_timeout).
What you can do at each state
| State | Available actions |
|---|---|
| pending | Token minted, waiting for the first /api/enroll. View host detail, regenerate token, delete host. |
| monitoring | Mesh joined, heartbeats flowing. Read metrics, view events, stream logs, exec into containers (read-only on the host itself). No container CRUD. |
| fusing | Edge agent rolling out. Watchdog deadline running. |
| fused | Full container CRUD via Portainer. Deploy stacks, scale, redeploy, all through the Portainer endpoint pinned on the host record. |
| degraded | No heartbeat for [server].stale_threshold_secs. Last-known state shown; actions still queue. |
Persistence + container recreate
The worker container persists its mesh identity to /var/lib/contextbay (volume-mounted). If you recreate the container with the same volume:
- The worker does not re-enroll — the token has been consumed and would be rejected anyway.
- It detects that the tsnet state dir is non-empty and skips the bootstrap call entirely.
- tsnet brings up the existing identity, dials CONTEXTBAY_MASTER_ADDR, and resumes heartbeats.
If you delete the volume (or omit the volume mount on a new container) you must regenerate the token in the UI and run a fresh install snippet — the old token is gone.
Refreshing a worker
To deploy new worker code without re-enrolling, use make deploy-worker from the master's repo:
```shell
make deploy-worker WORKER=<worker-1> SSH_USER=ubuntu
# Optional override when both modes are present:
# MODE_OVERRIDE=systemd       # force systemd-managed binary
# MODE_OVERRIDE=docker-image  # force docker-image worker
```

The script:
- Builds the worker binary locally with the master's build flags (so versions stay aligned).
- Copies the binary with scp to /opt/contextbay-worker/ on the remote host.
- Auto-detects whether the worker is managed by systemd or runs as a docker-image, then restarts the appropriate unit or container. If both are present and you didn't pass MODE_OVERRIDE, the script refuses to act so you can't accidentally restart the wrong one.
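The mode-selection rule in the last bullet can be written as a small pure function. This Go sketch is illustrative only: pickMode is a hypothetical name, the real script's detection logic is not shown, and rejecting unknown override values is an assumption.

```go
package main

import "fmt"

// pickMode mirrors the deploy-worker decision: an explicit MODE_OVERRIDE
// wins, a single detected mode is used as-is, and finding both without an
// override is refused.
func pickMode(hasSystemd, hasDocker bool, override string) (string, error) {
	switch override {
	case "systemd", "docker-image":
		return override, nil
	case "":
		// no override: fall through to detection below
	default:
		return "", fmt.Errorf("unknown MODE_OVERRIDE %q", override)
	}
	switch {
	case hasSystemd && hasDocker:
		return "", fmt.Errorf("both systemd and docker-image workers found; set MODE_OVERRIDE")
	case hasSystemd:
		return "systemd", nil
	case hasDocker:
		return "docker-image", nil
	default:
		return "", fmt.Errorf("no contextbay-worker found on host")
	}
}
```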
Workers refreshed this way do not pull from a registry: the master copies binaries directly with scp, which keeps multi-arch builds explicit.
Common failures
Token already consumed
You re-pasted the install snippet on a host where it had already run, or two starts raced. Click Regenerate token on the host card and rerun the snippet.
Bad shared secret
The worker's CONTEXTBAY_SHARED_SECRET doesn't match the master's [mesh].shared_secret. Re-copy the snippet — it always renders the current secret.
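On the master side, the comparison behind bad_secret is constant-time. A Go sketch using the standard crypto/subtle package; checkSharedSecret is a hypothetical helper and the order of the checks is an assumption. Note that subtle.ConstantTimeCompare returns immediately when the lengths differ, so it hides the secret's contents but not its length.

```go
package main

import "crypto/subtle"

// checkSharedSecret returns "" on success or one of the documented reason codes.
func checkSharedSecret(configured, presented string) string {
	switch {
	case configured == "":
		return "not_configured" // master has no [mesh].shared_secret
	case presented == "":
		return "bad_request" // request body lacks shared_secret
	case subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) != 1:
		return "bad_secret"
	}
	return ""
}
```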
Port 9100 already in use
The worker's /metrics endpoint listens on port 9100 by default, the same port as the standalone node-exporter that many hosts already run. Either stop node-exporter (CB already includes its data via cAdvisor-style worker metrics) or set CONTEXTBAY_METRICS_PORT to a free port.
CONTEXTBAY_MASTER_ADDR unreachable
Enrollment succeeded but heartbeats fail: the worker joined the mesh but can't reach the master's mesh IP. Check the cb-headscale logs (docker logs cb-headscale) and the master UI's Settings → Mesh page; both the worker and the master should appear under /api/mesh/nodes.
See also: Troubleshooting for general issues, and the Observability page for enrollment metrics + dashboards.

