Worker Onboarding

How a brand-new worker becomes a fully fused host: the install snippet, every env var, why two master addresses exist, the bootstrap handshake, the fuse handshake, and what survives a container recreate.

The install snippet

The Hosts page in the master UI generates this snippet for every new host. It is the only thing you ever paste on a worker — every field is auto-filled by the master at the moment you create the host record.

docker run -d \
  --name contextbay-worker \
  --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v contextbay-worker-data:/var/lib/contextbay \
  -e CONTEXTBAY_MASTER_URL=http://<MASTER_LAN_IP>:7480 \
  -e CONTEXTBAY_MASTER_ADDR=<MASTER_MESH_IP>:7481 \
  -e CONTEXTBAY_NODE_NAME=<worker-1> \
  -e CONTEXTBAY_ENROLL_TOKEN=<one-time-token-from-ui> \
  -e CONTEXTBAY_SHARED_SECRET=<shared-secret-from-ui> \
  ghcr.io/contextbay/contextbay-worker:latest

<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range. The master is always assigned 100.64.0.1, so CONTEXTBAY_MASTER_ADDR is typically 100.64.0.1:7481 on every install.
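Because the mesh always allocates from 100.64.0.0/10, a quick sanity check when templating the snippet yourself is that CONTEXTBAY_MASTER_ADDR points into that range and not at a LAN IP. The following is a minimal Go sketch using only the standard library's net/netip; the helper name isMeshAddr is hypothetical and not part of the worker, and it accepts only literal IPs, not hostnames.

```go
package main

import (
	"fmt"
	"net/netip"
)

// meshPrefix is the CGNAT range the Headscale mesh allocates addresses from.
var meshPrefix = netip.MustParsePrefix("100.64.0.0/10")

// isMeshAddr reports whether an "ip:port" value such as CONTEXTBAY_MASTER_ADDR
// points inside the mesh range. Hypothetical helper; hostnames are rejected
// because netip.ParseAddrPort only accepts literal IP addresses.
func isMeshAddr(hostPort string) (bool, error) {
	ap, err := netip.ParseAddrPort(hostPort)
	if err != nil {
		return false, err
	}
	return meshPrefix.Contains(ap.Addr()), nil
}

func main() {
	for _, v := range []string{"100.64.0.1:7481", "192.168.1.10:7481"} {
		ok, err := isMeshAddr(v)
		fmt.Println(v, "in mesh range:", ok, "error:", err)
	}
}
```

A LAN address here usually means the two master variables were swapped.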

What every env var means

Variable	Used for	Notes
`CONTEXTBAY_ENROLL_TOKEN`	First-call only	One-shot, ~24h TTL, consumed by the master on first call. Re-deploying a worker with an empty state dir and the same (already consumed) token returns `already_consumed`; use Regenerate token in the UI to mint a fresh one.
`CONTEXTBAY_SHARED_SECRET`	Cluster-wide	Set on the master at first deploy (`[mesh].shared_secret`) and identical on every worker. Constant-time compared before the master even touches the database.
`CONTEXTBAY_MASTER_URL`	First-call only	LAN HTTP URL (e.g. `http://<MASTER_LAN_IP>:7480`). Used only for the first `/api/enroll` call — a fresh worker can't be on the mesh until it has enrolled.
`CONTEXTBAY_MASTER_ADDR`	Steady state	Mesh gRPC address (typically `<MASTER_MESH_IP>:7481`). Used for all steady-state communication after enrollment. Encrypted end-to-end by WireGuard over the Headscale-coordinated mesh.
`CONTEXTBAY_NODE_NAME`	Identity	Identifies the host record. Must match the name of the host you registered in the UI; the master pins gRPC streams by name and rejects mismatches.
`CONTEXTBAY_TSNET_STATE_DIR`	Persistence	Optional. Defaults to `/var/lib/contextbay/tsnet`. Where the worker's mesh identity (private key, authkey state) is persisted. A non-empty state dir means the worker is already enrolled — see Persistence + container recreate below.

Every variable also accepts a sibling _FILE form (Docker secret idiom). For example, set CONTEXTBAY_SHARED_SECRET_FILE=/run/secrets/cb_shared_secret and CB will read the file contents instead of the inline value.

Why two master addresses?

A brand-new worker can't join the encrypted mesh until it has an authkey, and it can't get an authkey without calling the master's API. That catch-22 is solved by two distinct addresses with strictly different lifetimes:

CONTEXTBAY_MASTER_URL (LAN HTTP) — used once for POST /api/enroll. After this call returns, this URL is never used again. It does not need to be reachable in steady state.
CONTEXTBAY_MASTER_ADDR (mesh gRPC) — used for everything after enrollment. Heartbeats, metrics, events, command stream — all flow over WireGuard. This is the address you actually depend on long-term.

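Once all workers are enrolled you can drop the worker-to-LAN-URL rule from your firewall entirely; steady-state traffic only needs the mesh-side gRPC port. Restore it before enrolling another worker, because a fresh worker has to reach CONTEXTBAY_MASTER_URL first.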

The bootstrap call

On startup the worker reads the env, decides whether enrollment is needed (token present + state dir empty), and runs the handshake:

Worker (cmd/worker)                                   Master (internal/api)
  |                                                       |
  | read CONTEXTBAY_* env                                 |
  | state dir empty + token present? → enroll             |
  |                                                       |
  | POST CONTEXTBAY_MASTER_URL/api/enroll  ────────────→  |
  |   { token, shared_secret }                            |
  |                                                       |
  |                          shared-secret check (CT cmp) |
  |                          GetEnrollmentByToken         |
  |                          mint Headscale pre-auth key  |
  |                          ConsumeEnrollment (atomic)   |
  |                          flip node → "monitoring"     |
  |                                                       |
  | ←──────────────────  200 OK { headscale_authkey,      |
  |                              control_url,            |
  |                              master_mesh_addr,       |
  |                              node_id }               |
  |                                                       |
  | tsnet.Up(authkey, control_url) → join mesh            |
  | persist state to /var/lib/contextbay/tsnet/           |
  |                                                       |
  | gRPC dial CONTEXTBAY_MASTER_ADDR  ─────────────────→  |
  |   stream Heartbeat / ReportMetrics / ReportEvents     |

From the moment the gRPC stream opens, the LAN URL is irrelevant. The Headscale control-plane URL returned by the master can be overridden with [mesh].headscale_public_url on the master — useful when CB sits behind a reverse proxy whose Host header doesn't match the worker-facing URL.

Enrollment rejection reasons

Every /api/enroll rejection is tagged with a reason code (also the reason label on the contextbay_enroll_rejections_total Prometheus counter). The host detail page shows the most recent attempts:

Reason	HTTP	What it means
`rate_limited`	429	More than 10 enroll attempts/min from this IP. Backs off automatically.
`bad_request`	400	Missing/invalid JSON body, missing token or shared_secret.
`bad_secret`	401	CONTEXTBAY_SHARED_SECRET on the worker doesn't match [mesh].shared_secret on the master.
`invalid_token`	401	Token does not exist (typo, regenerated, or for a different cluster).
`expired`	410	Token TTL elapsed — click Regenerate token in the UI.
`already_consumed`	410	Token has already been used. Re-enrollment is intentionally one-shot.
`race_consumed`	410	Concurrent /api/enroll calls — one won, this one lost the race.
`not_configured`	503	Master has no shared_secret configured, or Headscale client is missing.
`mesh_mint_failed`	502	Headscale rejected the pre-auth key request — check cb-headscale logs.
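Several reasons share an HTTP status (401 and 410 each cover more than one), so the reason code, not the status, is what tells you which fix applies. A small Go sketch of the mapping in the table above; rejectionStatus and reasonsForStatus are illustrative names, not the master's API.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// rejectionStatus maps each /api/enroll reason code to its HTTP status.
var rejectionStatus = map[string]int{
	"rate_limited":     http.StatusTooManyRequests,    // 429
	"bad_request":      http.StatusBadRequest,         // 400
	"bad_secret":       http.StatusUnauthorized,       // 401
	"invalid_token":    http.StatusUnauthorized,       // 401
	"expired":          http.StatusGone,               // 410
	"already_consumed": http.StatusGone,               // 410
	"race_consumed":    http.StatusGone,               // 410
	"not_configured":   http.StatusServiceUnavailable, // 503
	"mesh_mint_failed": http.StatusBadGateway,         // 502
}

// reasonsForStatus lists every reason code sharing an HTTP status, sorted,
// which is why the reason label alone identifies the failure.
func reasonsForStatus(code int) []string {
	var out []string
	for reason, c := range rejectionStatus {
		if c == code {
			out = append(out, reason)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(reasonsForStatus(http.StatusGone)) // [already_consumed expired race_consumed]
}
```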

Fuse: from monitoring to fused

Enrollment alone gives you monitoring: heartbeats, metrics, Docker events, and a small command stream for log/exec proxying. To do container CRUD on the worker — start, stop, restart, redeploy stacks — you fuse the host.

Fuse is a deliberate second step because:

It rolls out a Portainer Edge agent on the worker, which is a larger trust boundary than monitoring alone.
It registers a new Portainer endpoint id with the master; CB then routes all container actions for that host through that Portainer endpoint.
It is admin-initiated only (there is no auto-fuse), so you have a chance to inspect the host first.

Click Fuse on the host card or call POST /api/hosts/{id}/fuse. The fuse watchdog gives up after a deadline; failures are tagged with a source label on contextbay_fuse_failed_total (portainer_error or watchdog_timeout).

What you can do at each state

State	Available actions
`pending`	Token minted, waiting for first /api/enroll. View host detail, regenerate token, delete host.
`monitoring`	Mesh joined, heartbeats flowing. Read metrics, view events, stream logs, exec into containers (read-only on the host itself). No container CRUD.
`fusing`	Edge agent rolling out. Watchdog deadline running.
`fused`	Full container CRUD via Portainer. Deploy stacks, scale, redeploy, all through the Portainer endpoint pinned on the host record.
`degraded`	No heartbeat for [server].stale_threshold_secs. Last-known state shown; actions still queue.

Persistence + container recreate

The worker container persists its mesh identity to /var/lib/contextbay (volume-mounted). If you recreate the container with the same volume:

The worker does not re-enroll — the token has been consumed and would be rejected anyway.
It detects that the tsnet state dir is non-empty and skips the bootstrap call entirely.
tsnet brings up the existing identity, dials CONTEXTBAY_MASTER_ADDR, and resumes heartbeats.

If you delete the volume (or omit the volume mount on a new container), the worker has no identity to resume. Regenerate the token in the UI and run a fresh install snippet; the old token was already consumed at first enrollment.

Refreshing a worker

To deploy new worker code without re-enrolling, use make deploy-worker from the master's repo:

make deploy-worker WORKER=<worker-1> SSH_USER=ubuntu

# Optional override when both modes are present:
# MODE_OVERRIDE=systemd       # force systemd-managed binary
# MODE_OVERRIDE=docker-image  # force docker-image worker

The script:

Builds the worker binary locally with the master's build flags (so versions stay aligned).
Copies the binary with scp to /opt/contextbay-worker/ on the remote host.
Auto-detects whether the worker is managed by systemd or run as a docker-image, then restarts the appropriate unit/container. If both are present and you didn't pass MODE_OVERRIDE, the script refuses to act so you can't accidentally restart the wrong one.
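The mode selection in the last step is the part most worth getting right. Since the script's internals aren't shown here, this Go sketch covers only that decision: pickMode is hypothetical, takes the detection results as booleans, and assumes an explicit MODE_OVERRIDE is honoured even when only one mode is present.

```go
package main

import (
	"errors"
	"fmt"
)

// pickMode uses whichever of systemd or docker-image is present; when both
// are, it refuses to guess unless MODE_OVERRIDE names one explicitly.
func pickMode(hasSystemd, hasDocker bool, override string) (string, error) {
	switch override {
	case "systemd", "docker-image":
		return override, nil
	case "":
		// no override: fall through to auto-detection below
	default:
		return "", fmt.Errorf("unknown MODE_OVERRIDE %q", override)
	}
	switch {
	case hasSystemd && hasDocker:
		return "", errors.New("both systemd and docker-image workers found; set MODE_OVERRIDE")
	case hasSystemd:
		return "systemd", nil
	case hasDocker:
		return "docker-image", nil
	}
	return "", errors.New("no worker found on the host")
}

func main() {
	fmt.Println(pickMode(true, true, ""))        // refuses: ambiguous
	fmt.Println(pickMode(true, true, "systemd")) // explicit choice
}
```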

This refresh path does not pull from a registry: the master copies binaries directly with scp, which keeps multi-arch builds explicit.

Common failures

Token already consumed

You re-ran the install snippet on a host without its original data volume (so the state dir was empty and the worker tried to enroll again), or two starts raced. Click Regenerate token on the host card and rerun the snippet.

Bad shared secret

The worker's CONTEXTBAY_SHARED_SECRET doesn't match the master's [mesh].shared_secret. Re-copy the snippet — it always renders the current secret.

Port 9100 already in use

The worker's /metrics endpoint listens on 9100 by default, the same port as the standalone node-exporter that many hosts run. Either stop node-exporter (CB already includes its data via cAdvisor-style worker metrics) or set CONTEXTBAY_METRICS_PORT to a free port.

CONTEXTBAY_MASTER_ADDR unreachable

Enrollment succeeded but heartbeats fail: the worker joined the mesh but can't reach the master's mesh IP. Check the cb-headscale logs (docker logs cb-headscale) and the master UI's Settings → Mesh page; both sides should appear under /api/mesh/nodes.

See also: Troubleshooting for general issues, and the Observability page for enrollment metrics + dashboards.