Worker Onboarding

How a brand-new worker becomes a fully-fused host: the install snippet, every env var, why two master addresses exist, the bootstrap handshake, the fuse handshake, and what survives a container recreate.

The install snippet

The Hosts page in the master UI generates this snippet for every new host. It is the only thing you ever paste on a worker — every field is auto-filled by the master at the moment you create the host record.

docker run -d \
  --name contextbay-worker \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v contextbay-worker-data:/var/lib/contextbay \
  -e CONTEXTBAY_MASTER_URL=http://<MASTER_LAN_IP>:7480 \
  -e CONTEXTBAY_MASTER_ADDR=<MASTER_MESH_IP>:7481 \
  -e CONTEXTBAY_NODE_NAME=<worker-1> \
  -e CONTEXTBAY_ENROLL_TOKEN=<one-time-token-from-ui> \
  -e CONTEXTBAY_SHARED_SECRET=<shared-secret-from-ui> \
  ghcr.io/contextbay/contextbay-worker:latest

<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range — the master is always assigned 100.64.0.1, so the shape is the same on every install.

What every env var means

Variable | Used for | Notes
CONTEXTBAY_ENROLL_TOKEN | First-call only | One-shot, ~24h TTL, consumed by the master on first call. Re-deploying a worker with the same (already-consumed) token returns already_consumed; use Regenerate token in the UI to mint a fresh one.
CONTEXTBAY_SHARED_SECRET | Cluster-wide | Set on the master at first deploy ([mesh].shared_secret) and identical on every worker. Constant-time compared before the master even touches the database.
CONTEXTBAY_MASTER_URL | First-call only | LAN HTTP URL (e.g. http://<MASTER_LAN_IP>:7480). Used only for the first /api/enroll call, because a fresh worker can't be on the mesh until it has enrolled.
CONTEXTBAY_MASTER_ADDR | Steady state | Mesh gRPC address (typically <MASTER_MESH_IP>:7481). Used for all steady-state communication after enrollment. Encrypted end-to-end via Headscale/WireGuard.
CONTEXTBAY_NODE_NAME | Identity | The host record id. Must match the host you registered in the UI: the master pins gRPC streams by name and rejects mismatches.
CONTEXTBAY_TSNET_STATE_DIR | Persistence | Optional. Defaults to /var/lib/contextbay/tsnet. Where the worker's mesh identity (private key, authkey state) is persisted. A non-empty state dir means the worker is already enrolled; see Persistence + container recreate below.

Every variable also accepts a sibling _FILE form (the Docker secrets idiom). For example, set CONTEXTBAY_SHARED_SECRET_FILE=/run/secrets/cb_shared_secret and ContextBay (CB) reads the file contents instead of the inline value.
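
The _FILE lookup can be sketched as follows. This is an illustrative Python helper, not the worker's actual code: the name read_setting is hypothetical, and two behaviours are assumptions the text above does not settle, namely that the _FILE form wins when both are set and that surrounding whitespace in the file is stripped.

```python
import os
from typing import Mapping, Optional


def read_setting(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve NAME from the file named by NAME_FILE, else from NAME itself.

    Assumptions (not stated in the docs): the _FILE form takes precedence,
    and leading/trailing whitespace in the file is removed.
    """
    path = env.get(name + "_FILE")
    if path:
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip()
    # Fall back to the inline value; None if neither form is set.
    return env.get(name)
```

For example, read_setting("CONTEXTBAY_SHARED_SECRET") would return the contents of /run/secrets/cb_shared_secret when CONTEXTBAY_SHARED_SECRET_FILE points at it.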

Why two master addresses?

A brand-new worker can't join the encrypted mesh until it has an authkey, and it can't get an authkey without calling the master's API. This chicken-and-egg problem is solved by two distinct addresses with strictly different lifetimes:

  • CONTEXTBAY_MASTER_URL (LAN HTTP) — used once for POST /api/enroll. After this call returns, this URL is never used again. It does not need to be reachable in steady state.
  • CONTEXTBAY_MASTER_ADDR (mesh gRPC) — used for everything after enrollment. Heartbeats, metrics, events, command stream — all flow over WireGuard. This is the address you actually depend on long-term.

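You can close the LAN HTTP port in your firewall once all workers are enrolled; only the mesh-side gRPC port matters in steady state. Enrolling another worker later needs the LAN URL reachable again for that worker's single /api/enroll call.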
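
Because CONTEXTBAY_MASTER_ADDR is the address workers depend on long-term, it can be worth checking that the value really is a mesh address in 100.64.0.0/10 rather than a LAN address pasted by mistake. A minimal sketch using Python's standard ipaddress module; the helper name is_mesh_addr is hypothetical, and the input is assumed to be in host:port form.

```python
import ipaddress

# The mesh range named in this page (the CGNAT block).
MESH_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_mesh_addr(addr: str) -> bool:
    """Return True if the host part of 'host:port' is an IP inside 100.64.0.0/10."""
    host, _, _port = addr.rpartition(":")
    try:
        return ipaddress.ip_address(host) in MESH_RANGE
    except ValueError:
        # Hostnames, missing ports, or malformed input are not mesh IPs.
        return False
```

With this helper, "100.64.0.1:7481" passes and a LAN value such as "192.168.1.10:7481" does not.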

The bootstrap call

On startup the worker reads the env, decides whether enrollment is needed (token present + state dir empty), and runs the handshake:

Worker (cmd/worker)                                   Master (internal/api)
  |                                                       |
  | read CONTEXTBAY_* env                                 |
  | state dir empty + token present? → enroll             |
  |                                                       |
  | POST CONTEXTBAY_MASTER_URL/api/enroll  ────────────→  |
  |   { token, shared_secret }                            |
  |                                                       |
  |                          shared-secret check (CT cmp) |
  |                          GetEnrollmentByToken         |
  |                          mint Headscale pre-auth key  |
  |                          ConsumeEnrollment (atomic)   |
  |                          flip node → "monitoring"     |
  |                                                       |
  | ←──────────────────  200 OK { headscale_authkey,      |
  |                              control_url,            |
  |                              master_mesh_addr,       |
  |                              node_id }               |
  |                                                       |
  | tsnet.Up(authkey, control_url) → join mesh            |
  | persist state to /var/lib/contextbay/tsnet/           |
  |                                                       |
  | gRPC dial CONTEXTBAY_MASTER_ADDR  ─────────────────→  |
  |   stream Heartbeat / ReportMetrics / ReportEvents     |

From the moment the gRPC stream opens, the LAN URL is irrelevant. The Headscale control-plane URL returned by the master can be overridden with [mesh].headscale_public_url on the master — useful when CB sits behind a reverse proxy whose Host header doesn't match the worker-facing URL.
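
The request and response in the diagram can be sketched from the worker's side like this. It is an illustration in Python, not the worker's implementation: the function names are hypothetical, the JSON content type is an assumption, and only the field names shown in the diagram are taken from this page.

```python
import json
import urllib.request

# The four fields the diagram shows in the 200 OK body.
REQUIRED_FIELDS = ("headscale_authkey", "control_url", "master_mesh_addr", "node_id")


def build_enroll_request(master_url: str, token: str, shared_secret: str) -> urllib.request.Request:
    """Build the one-shot POST <master_url>/api/enroll request."""
    body = json.dumps({"token": token, "shared_secret": shared_secret}).encode("utf-8")
    return urllib.request.Request(
        master_url.rstrip("/") + "/api/enroll",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def parse_enroll_response(raw: bytes) -> dict:
    """Decode the response body and check that every expected field is present."""
    data = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("enroll response missing: " + ", ".join(missing))
    return data
```

Sending the request would be a single urllib.request.urlopen(req, timeout=10) call; the returned headscale_authkey and control_url are what the worker then hands to tsnet.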

Enrollment rejection reasons

Every /api/enroll rejection is tagged with a reason code (also the reason label on the contextbay_enroll_rejections_total Prometheus counter). The host detail page shows the most recent attempts:

Reason | HTTP | What it means
rate_limited | 429 | More than 10 enroll attempts/min from this IP. Backs off automatically.
bad_request | 400 | Missing/invalid JSON body, missing token or shared_secret.
bad_secret | 401 | CONTEXTBAY_SHARED_SECRET on the worker doesn't match [mesh].shared_secret on the master.
invalid_token | 401 | Token does not exist (typo, regenerated, or for a different cluster).
expired | 410 | Token TTL elapsed; click Regenerate token in the UI.
already_consumed | 410 | Token has already been used. Re-enrollment is intentionally one-shot.
race_consumed | 410 | Concurrent /api/enroll calls: one won, this one lost the race.
not_configured | 503 | Master has no shared_secret configured, or Headscale client is missing.
mesh_mint_failed | 502 | Headscale rejected the pre-auth key request; check cb-headscale logs.
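
To make the stateless part of the table concrete, here is a hedged sketch of how a request could be mapped to a reason code. It is not the master's code: check_enroll and the in-memory tokens dict are hypothetical stand-ins, and the order of checks is an assumption except for the one ordering this page states (the shared secret is compared in constant time before any token lookup). rate_limited, race_consumed, and mesh_mint_failed depend on runtime state and are left out.

```python
import hmac
from typing import Optional, Tuple

# Reason code -> HTTP status, copied from the table above.
STATUS = {
    "bad_request": 400, "bad_secret": 401, "invalid_token": 401,
    "expired": 410, "already_consumed": 410, "not_configured": 503,
}


def check_enroll(body: dict, configured_secret: Optional[str],
                 tokens: dict, now: float) -> Tuple[int, str]:
    """Return (http_status, reason); (200, "ok") means the token may be consumed.

    `tokens` is a hypothetical stand-in for the database:
    token -> {"expires_at": float, "consumed": bool}.
    """
    if not configured_secret:
        return STATUS["not_configured"], "not_configured"
    token, secret = body.get("token"), body.get("shared_secret")
    if not isinstance(token, str) or not isinstance(secret, str) or not token or not secret:
        return STATUS["bad_request"], "bad_request"
    # Constant-time comparison, done before the token store is touched.
    if not hmac.compare_digest(secret.encode(), configured_secret.encode()):
        return STATUS["bad_secret"], "bad_secret"
    record = tokens.get(token)
    if record is None:
        return STATUS["invalid_token"], "invalid_token"
    if record["consumed"]:
        return STATUS["already_consumed"], "already_consumed"
    if now >= record["expires_at"]:
        return STATUS["expired"], "expired"
    return 200, "ok"
```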

Fuse: from monitoring to fused

Enrollment alone gives you monitoring: heartbeats, metrics, Docker events, and a small command stream for log/exec proxying. To do container CRUD on the worker — start, stop, restart, redeploy stacks — you fuse the host.

Fuse is a deliberate second step because:

  • It rolls out a Portainer Edge agent on the worker, which is a larger trust boundary than monitoring-only.
  • It registers a new Portainer endpoint id with the master — CB then routes all container actions through Portainer for that endpoint.
  • It must be admin-initiated (no auto-fuse) so you have a chance to inspect the host first.

Click Fuse on the host card or call POST /api/hosts/{id}/fuse. The fuse watchdog gives up after a deadline; failures are tagged with a source label on contextbay_fuse_failed_total (portainer_error or watchdog_timeout).
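
The two failure sources can be pictured with a small classifier. This is only a sketch under assumptions: the deadline value is not given on this page, so it is a parameter here, and the function name, its arguments, and the precedence of a Portainer error over a registered endpoint are all hypothetical.

```python
from typing import Optional


def fuse_status(elapsed_secs: float, deadline_secs: float,
                endpoint_registered: bool, portainer_error: Optional[str]) -> str:
    """Classify an in-flight fuse attempt.

    Returns "fused", "fusing", or "failed:<source>", where <source> is one of
    the two labels on contextbay_fuse_failed_total.
    """
    if portainer_error:
        return "failed:portainer_error"
    if endpoint_registered:
        return "fused"
    if elapsed_secs >= deadline_secs:
        # The watchdog gives up once the deadline has passed.
        return "failed:watchdog_timeout"
    return "fusing"
```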

What you can do at each state

State | Available actions
pending | Token minted, waiting for first /api/enroll. View host detail, regenerate token, delete host.
monitoring | Mesh joined, heartbeats flowing. Read metrics, view events, stream logs, exec into containers (read-only on the host itself). No container CRUD.
fusing | Edge agent rolling out. Watchdog deadline running.
fused | Full container CRUD via Portainer. Deploy stacks, scale, redeploy, all through the Portainer endpoint pinned on the host record.
degraded | No heartbeat for [server].stale_threshold_secs. Last-known state shown; actions still queue.
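
The table above can be read as a lookup from state to permitted actions plus a staleness test. The sketch below is illustrative only: the action identifiers are made up for this example (they are not API names), each state lists only what its row says, and degraded is left out because the queuing behaviour is not modelled here. Treating a gap exactly equal to the threshold as stale is also an assumption.

```python
# Actions each row lists, as illustrative identifiers (not ContextBay API names).
ACTIONS = {
    "pending": {"view_host", "regenerate_token", "delete_host"},
    "monitoring": {"read_metrics", "view_events", "stream_logs", "exec"},
    "fusing": set(),
    "fused": {"container_crud", "deploy_stack", "scale", "redeploy"},
}


def can(state: str, action: str) -> bool:
    """Check an action against the rows above; unlisted states allow nothing."""
    return action in ACTIONS.get(state, set())


def is_stale(secs_since_heartbeat: float, stale_threshold_secs: float) -> bool:
    """True once the heartbeat gap reaches [server].stale_threshold_secs."""
    return secs_since_heartbeat >= stale_threshold_secs
```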

Persistence + container recreate

The worker persists its mesh identity under /var/lib/contextbay, which the install snippet mounts from the contextbay-worker-data volume. If you recreate the container with the same volume:

  • The worker does not re-enroll — the token has been consumed and would be rejected anyway.
  • It detects that the tsnet state dir is non-empty and skips the bootstrap call entirely.
  • tsnet brings up the existing identity, dials CONTEXTBAY_MASTER_ADDR, and resumes heartbeats.

If you delete the volume (or omit the volume mount on a new container), the mesh identity is lost: regenerate the token in the UI and run a fresh install snippet, because the original token has already been consumed.
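
The startup decision described in this section (and in The bootstrap call) comes down to two inputs: whether the state dir is empty and whether a token is present. A minimal Python sketch, with hypothetical function names; treating a missing directory as empty and returning "error" when there is neither identity nor token are assumptions.

```python
import os
from typing import Optional


def state_dir_is_empty(path: str) -> bool:
    """A missing directory counts as empty: nothing has been persisted yet."""
    return not os.path.isdir(path) or not os.listdir(path)


def startup_action(state_dir: str, token: Optional[str]) -> str:
    """Return "resume", "enroll", or "error" for a starting worker."""
    if not state_dir_is_empty(state_dir):
        # Existing mesh identity: skip /api/enroll, even if a stale token is set.
        return "resume"
    if token:
        return "enroll"
    # Assumption: with no identity and no token the worker cannot proceed.
    return "error"
```

This is why a recreated container with the same volume ignores the consumed token still present in its environment, while one with a fresh volume tries to enroll and is rejected.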

Refreshing a worker

To deploy new worker code without re-enrolling, use make deploy-worker from the master's repo:

make deploy-worker WORKER=<worker-1> SSH_USER=ubuntu

# Optional override when both modes are present:
# MODE_OVERRIDE=systemd       # force systemd-managed binary
# MODE_OVERRIDE=docker-image  # force docker-image worker

The script:

  • Builds the worker binary locally with the master's build flags (so versions stay aligned).
  • scp's the binary to /opt/contextbay-worker/ on the remote host.
  • Auto-detects whether the worker is managed by systemd or run as a docker-image, then restarts the appropriate unit/container. If both are present and you didn't pass MODE_OVERRIDE, the script refuses to act so you can't accidentally restart the wrong one.

make deploy-worker does not pull from a registry; the master scp's binaries directly, which keeps multi-arch builds explicit.
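
The mode-selection rule the script applies can be restated in a few lines. This is a Python paraphrase for illustration, not the script itself: pick_mode and its arguments are hypothetical, and honouring MODE_OVERRIDE even when only one mode is present is an assumption.

```python
from typing import Optional

MODES = ("systemd", "docker-image")


def pick_mode(has_systemd_unit: bool, has_container: bool,
              override: Optional[str] = None) -> str:
    """Choose which worker to restart, refusing when the choice is ambiguous."""
    if override is not None:
        if override not in MODES:
            raise ValueError("MODE_OVERRIDE must be one of: " + ", ".join(MODES))
        return override
    if has_systemd_unit and has_container:
        # Both present and no override: refuse rather than restart the wrong one.
        raise RuntimeError("both modes detected; pass MODE_OVERRIDE")
    if has_systemd_unit:
        return "systemd"
    if has_container:
        return "docker-image"
    raise RuntimeError("no worker found on the remote host")
```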

Common failures

Token already consumed

You re-pasted the install snippet on a host where it had already run (already_consumed), or two starts raced and one lost (race_consumed). Click Regenerate token on the host card and rerun the snippet with the new token.

Bad shared secret

The worker's CONTEXTBAY_SHARED_SECRET doesn't match the master's [mesh].shared_secret. Re-copy the snippet — it always renders the current secret.

Port 9100 already in use

The worker's /metrics endpoint listens on 9100 by default — same as the standalone node-exporter most hosts run. Either stop node-exporter (CB already includes its data via cAdvisor-style worker metrics) or set CONTEXTBAY_METRICS_PORT to something else.
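
Before starting the worker you can check whether 9100 is already taken. The following is a generic Python sketch, not a ContextBay tool; port_is_free is a hypothetical helper, and it has to run in the same network namespace the worker will listen in for the result to mean anything.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind the TCP port; a failed bind means something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

If port_is_free(9100) returns False, either stop the process holding the port or set CONTEXTBAY_METRICS_PORT to a port for which the check passes.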

CONTEXTBAY_MASTER_ADDR unreachable

Enrollment succeeded but heartbeats fail: the worker joined the mesh but can't reach the master's mesh IP. Check the cb-headscale logs (docker logs cb-headscale) and the master UI's Settings → Mesh page; both the master and the worker should appear under /api/mesh/nodes.

See also: Troubleshooting for general issues, and the Observability page for enrollment metrics + dashboards.