Architecture

How the master, Portainer, and the sub-container fleet are layered; how worker nodes join over an encrypted mesh; the database schema; and the security model.

The Three-Circle Model

ContextBay is structured as three concentric layers. The master container is the only thing you start by hand. It boots Portainer, and Portainer is then used to deploy the rest of the fleet as regular Compose stacks.

+-----------------------------------------------------------+
|  OUTER  contextbay-master  (Go API + embedded Next.js UI) |
|         REST :7480 / gRPC :7481 / WebSocket / Prometheus  |
|         /metrics  -- SQLite (WAL) or PostgreSQL           |
+-----------------------------------------------------------+
              |
              | docker.sock  (the only direct Docker call)
              v
+-----------------------------------------------------------+
|  INNER  cb-portainer  -- the Docker control plane         |
|         Manages every other container on every endpoint   |
+-----------------------------------------------------------+
              |
              | Portainer API (stack deploys + Edge Agents)
              v
+-----------------------------------------------------------+
|  SUB-CONTAINER FLEET  (one Compose stack per service)     |
|    cb-headscale     mesh control plane (WireGuard)        |
|    cb-prometheus    metrics scrape + PromQL               |
|    cb-alertmanager  alert routing                         |
|    cb-grafana       dashboards (auto-provisioned ds)      |
|    cb-n8n           workflow automation                   |
|    cb-wazuh         SIEM, FIM, vuln detection             |
|    cb-loki          log aggregation                       |
|    cb-tempo         distributed traces                    |
|    cb-pyroscope     continuous profiling                  |
|    cb-ollama        local LLM (auto-pulls embed model)    |
+-----------------------------------------------------------+
              ^
              | Headscale mesh (<MASTER_MESH_IP>:7481, encrypted)
              |
+-----------------------------------------------------------+
|  WORKER NODES  -- Sentinel agent (monitoring only)        |
|  Heartbeat + metrics + Docker events + census             |
|  Container CRUD performed by Portainer Edge Agent         |
+-----------------------------------------------------------+

The key invariant: the master container talks to the host Docker socket exactly once — to start Portainer. Everything else is a Portainer-managed stack, which means stacks can be edited, redeployed, or removed through the same UI you already use for your own apps.

Master Container

The master is a single Go binary that embeds the Next.js frontend as static files via go:embed. On startup it orchestrates:

  1. Load and validate configuration (TOML + env overrides)
  2. Open database (SQLite WAL by default, PostgreSQL optional)
  3. Connect itself to the contextbay-internal Docker network
  4. Start Portainer (only direct Docker call) and persist the bootstrap admin password / JWT
  5. Initialise enabled modules (monitoring, alerting, AI, security, …)
  6. Deploy the sub-container fleet via Portainer API (Headscale, Prometheus, Grafana, n8n, Wazuh, Loki, Tempo, Pyroscope, Ollama, Alertmanager)
  7. Auto-bootstrap n8n: create owner account, mint API key, persist encryption key, seed 36 workflows
  8. Auto-pull the Ollama embedding model used by RAG
  9. Start HTTP/WebSocket server, gRPC server (over the mesh), and background services
  10. Block until SIGINT/SIGTERM, then graceful shutdown

HTTP/WebSocket Server

The API uses Go's standard net/http.ServeMux with pattern-based routing (Go 1.22+). Key components include the RBAC scope middleware (admin / operator / viewer), a WebSocket hub for real-time events, an in-memory metrics cache, and a typed event bus that bridges every state change to n8n via webhook.

gRPC Server

Two protobuf services handle worker communication. The gRPC listener is bound to the Headscale mesh interface, so workers only reach it after they have joined the mesh.

  Service         RPCs
  NodeService     Register, Heartbeat, ReportMetrics (streaming), ReportEvents (streaming)
  CommandService  CommandStream (bidirectional) — used for log/exec session proxying; container CRUD goes through Portainer, not gRPC.

Background Services

  Service                       Purpose                                                        Interval
  Portainer Edge Poller         Sync container state from every Portainer endpoint             30 seconds
  Sub-container Health Monitor  Probe each cb-* stack and surface state                        Continuous
  Alertmanager Receiver         Convert incoming alerts into events + Grafana annotations      Per webhook
  Backup Scheduler              Run cron-scheduled volume backups                              Per cron schedule
  AI Tiered Router              Classify with Ollama, escalate to Claude when needed           Per request
  AI Session Manager            Dequeue and execute Claude tasks (max 2 concurrent)            Continuous
  Stale Node Detector           Mark nodes as degraded / unreachable after missed heartbeats   Periodic
  RAG Indexer                   Embed knowledge pages with cb-ollama on save                   On change

Sub-Container Fleet

ContextBay does not reimplement Prometheus, Grafana, n8n, or Wazuh. Each is deployed as a Compose stack on first boot, with the master owning the lifecycle (deploy / restart / update / remove) through the Portainer API. Every stack is auto-provisioned with sane defaults so a fresh install gives you a working observability, automation, security, and AI stack with no manual setup.

  Stack            Replaces                   Auto-bootstrap
  cb-headscale     Tailscale control plane    Pre-auth keys minted on enroll
  cb-prometheus    Self-hosted Prometheus     Scrape config generated; targets discovered from nodes + cb-* services
  cb-alertmanager  Self-hosted Alertmanager   Routes back into CB via webhook receiver
  cb-grafana       Self-hosted Grafana        Datasource + starter dashboards provisioned; iframe embed for the UI
  cb-n8n           Self-hosted n8n            Owner account + API key bootstrapped; 36 workflows seeded; encryption key persisted
  cb-wazuh         Self-hosted Wazuh manager  Default ruleset, FIM paths, vuln feed wired up
  cb-loki          Self-hosted Loki           Log push endpoint reachable from every cb-* and worker
  cb-tempo         Self-hosted Tempo          OTLP/HTTP endpoint exposed on the mesh
  cb-pyroscope     Self-hosted Pyroscope      Master pushes pprof + mutex profiles when profiling.enabled
  cb-ollama        Self-hosted Ollama         Auto-pulls nomic-embed-text on first boot for RAG

Worker Node — Sentinel Agent

A worker is intentionally minimal. It is a monitoring agent; it never starts, stops, or deletes a container by itself. The Portainer Edge Agent deployed alongside it owns container lifecycle.

  • Heartbeat loop — sends health pings at the master-negotiated interval (default 30s) over the Headscale mesh
  • Metrics loop — collects and streams system + container metrics every 15 seconds
  • Census loop — periodic deep inventory: systemd units, open ports, packages, cron jobs
  • Docker events — forwards Docker daemon events so the master can reflect lifecycle changes without polling
  • Local /metrics endpoint — node-exporter compatible, scraped by cb-prometheus

Metrics Collected

System

  • CPU usage percentage
  • RAM usage (used/total, percentage)
  • Disk usage (used/total, percentage)
  • Network I/O (RX/TX bytes)
  • Load averages (1m, 5m, 15m)
  • Uptime

Per-Container

  • CPU percentage
  • Memory usage/limit
  • Network RX/TX bytes
  • Disk read/write bytes

Container Operations

All container, image, volume, network, and stack actions resolve the node's PortainerEndpointID and execute through Portainer. There is no gRPC fallback path.

Accepted exceptions: interactive exec / terminal sessions and log streaming, which are proxied over the worker's gRPC connection for latency reasons.

Networking & Onboarding

Every worker ↔ master link runs over Headscale (a self-hosted Tailscale control plane). The master holds the well-known mesh address <MASTER_MESH_IP> (and fd7a:115c:a1e0::1 on IPv6). By default the gRPC listener binds only to this mesh interface — it is never exposed in plaintext.

<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range — the master is always assigned 100.64.0.1, so the shape is the same on every install.
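
A quick sanity check of that addressing scheme — Headscale allocates from the CGNAT block (RFC 6598), so any valid mesh address must fall inside 100.64.0.0/10:

```go
package main

import (
	"fmt"
	"net/netip"
)

// inCGNAT reports whether addr sits inside the shared-address-space range
// that Headscale assigns mesh IPs from.
func inCGNAT(addr string) bool {
	prefix := netip.MustParsePrefix("100.64.0.0/10")
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	return prefix.Contains(ip)
}

func main() {
	fmt.Println(inCGNAT("100.64.0.1")) // prints true — the master's fixed mesh address
}
```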

Enrollment Flow

Worker                                   Master
  |                                          |
  |  POST /api/enroll  (LAN, plain HTTP)     |
  |  { enroll_token, shared_secret }         |
  |----------------------------------------->|
  |                                          |  validate token + secret
  |                                          |  mint Headscale pre-auth key
  |                                          |  return { master_url, headscale_url, auth_key }
  |<-----------------------------------------|
  |                                          |
  |  tsnet up + join mesh                    |
  |  becomes <MESH_IP>                       |
  |                                          |
  |  gRPC Register / Heartbeat (over mesh)   |
  |----------------------------------------->|
  |<--- HeartbeatResponse -------------------|

The Hosts page in the UI surfaces pending enrollments with a two-column layout: the install snippet (Compose / docker-run / systemd tabs, pre-filled with the master URL) on the left, and a live pairing panel on the right showing the token countdown and the recent enroll-attempt log with reason codes (bad token, stale token, IP mismatch, …).

Mesh Modes

  Mode       When to use
  headscale  Default. Required for any multi-node install. cb-headscale is auto-deployed.
  direct     Single-node dev only. Exposes plaintext gRPC on the LAN. Do not use for multi-node.

Database Schema

All persistence goes through the Store interface. Migrations run automatically on startup for both SQLite and PostgreSQL.

Core Entities

  • hosts — Pending and enrolled worker hosts with enroll tokens, fuse state, and Portainer endpoint IDs
  • nodes — Registered worker machines with hardware info, labels, status, heartbeat timestamps
  • enroll_attempts — Recent enroll attempts with reason codes for the UI's rejection log
  • containers — Containers synced from every Portainer endpoint with state, ports, labels
  • stacks — Compose stacks (own + Portainer-managed) with YAML, target endpoint, status
  • users — Platform users with role-based access (admin, operator, viewer)
  • api_keys — Scoped API keys stored as SHA-256 hashes

Feature Entities

  • pages / page_versions — Knowledge base Markdown vault with bidirectional links and version history
  • rag_chunks — Embeddings for RAG search, generated by cb-ollama
  • alert_rules / alerts — Rules synced into Prometheus YAML; firing alerts arrive via Alertmanager webhook
  • metrics — Time-series buffer with node_id, name, value, labels, timestamp
  • backup_jobs / backup_records — Scheduled volume backups with retention
  • security_events — Wazuh + scanner events with severity classification
  • projects / issues / sprints / dependencies — Project Planner: Kanban, Gantt, sprints, velocity
  • ai_tasks / ai_sessions / mcp_servers — Tiered AI router state, Claude Code sessions, registered MCP servers
  • audit_entries — Complete audit trail of every state-changing action

AI: Tiered Router

Two providers, one router. Cheap classification, summarisation, and embedding work goes to local Ollama (cb-ollama). Multi-step reasoning, code authoring, and tool-using sessions go to Claude via the Claude Agent SDK service (Anthropic API key required). The router picks per request based on task type and configured budget caps.

  • RAG search uses Ollama embeddings (default nomic-embed-text, auto-pulled on boot)
  • Claude sessions surface in the UI with cost-per-task and permission caps (acceptEdits, bypass, …)
  • MCP servers are first-class: the registry exposes core, n8n, Brain, and Planner tools to Claude
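
The routing decision can be sketched as a pure function. The task-class names and the degrade-to-local behaviour on a hit budget cap are assumptions about policy, not confirmed behaviour:

```go
package main

import "fmt"

// route picks a provider per request: cheap classes stay on local Ollama,
// heavier ones go to Claude unless the configured budget cap is exhausted.
func route(taskClass string, spentUSD, capUSD float64) string {
	switch taskClass {
	case "classify", "summarise", "embed":
		return "ollama"
	case "code", "agent", "multi-step":
		if capUSD > 0 && spentUSD >= capUSD {
			return "ollama" // assumed: degrade to local when the cap is hit
		}
		return "claude"
	default:
		return "ollama" // unknown work defaults to the cheap tier
	}
}

func main() {
	fmt.Println(route("embed", 0, 10))     // ollama
	fmt.Println(route("agent", 2.50, 10))  // claude
	fmt.Println(route("agent", 12.00, 10)) // ollama (cap hit)
}
```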

Security Model

Authentication

  1. Initial setup — first request to /api/auth/setup creates the admin user (one-time, then disabled)
  2. Login — username + password (verified against a bcrypt hash) returns a JWT (HS256, 24h expiry) and a session cookie
  3. API keys — prefixed with cb_, stored as SHA-256 hashes, scoped per role
  4. Worker enroll — one-time token + shared secret over plaintext /api/enroll (rate-limited 10/min/IP). After enrollment all traffic is mesh-encrypted.
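
The API-key scheme in step 3 reduces to a couple of small helpers; the function names are illustrative:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashKey returns the hex SHA-256 digest that gets persisted; the raw key
// is shown once at creation and never stored.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// verifyKey checks the cb_ prefix, then compares digests in constant time.
func verifyKey(presented, storedHash string) bool {
	if !strings.HasPrefix(presented, "cb_") {
		return false
	}
	got := hashKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	stored := hashKey("cb_example-key")
	fmt.Println(verifyKey("cb_example-key", stored)) // true
	fmt.Println(verifyKey("cb_wrong", stored))       // false
}
```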

Authorization (RBAC)

  Role      Permissions
  Admin     Full access. User management, settings, terminal, sub-container deploys, import/export.
  Operator  CRUD resources. Container actions (via Portainer), workflow execution, backups.
  Viewer    Read-only access to all resources.

Open Endpoints (no auth)

  • GET /metrics — Prometheus convention. cb-prometheus scrapes the master here.
  • POST /api/enroll — Worker enrollment. Rate-limited 10/min/IP, requires shared_secret.
  • GET /api/health, GET /api/modules — Health probes.
  • POST /api/auth/login, POST /api/auth/setup — Auth bootstrap.

Audit Trail

Every state-changing operation is recorded with actor, action, target resource, detail message, IP address, and timestamp. The audit log is admin-only and exportable.