Architecture
How the master, Portainer, and the sub-container fleet are layered; how worker nodes join over an encrypted mesh; the database schema; and the security model.
The Three-Circle Model
ContextBay is structured as three concentric layers. The master container is the only thing you start by hand. It boots Portainer, and Portainer is then used to deploy the rest of the fleet as regular Compose stacks.
+-----------------------------------------------------------+
|  OUTER  contextbay-master (Go API + embedded Next.js UI)  |
|         REST :7480 / gRPC :7481 / WebSocket               |
|         Prometheus /metrics -- SQLite (WAL) or PostgreSQL |
+-----------------------------------------------------------+
                              |
                              |  docker.sock (the only direct Docker call)
                              v
+-----------------------------------------------------------+
|  INNER  cb-portainer -- the Docker control plane          |
|         Manages every other container on every endpoint   |
+-----------------------------------------------------------+
                              |
                              |  Portainer API (stack deploys + Edge Agents)
                              v
+-----------------------------------------------------------+
|  SUB-CONTAINER FLEET (one Compose stack per service)      |
|    cb-headscale      mesh control plane (WireGuard)       |
|    cb-prometheus     metrics scrape + PromQL              |
|    cb-alertmanager   alert routing                        |
|    cb-grafana        dashboards (auto-provisioned ds)     |
|    cb-n8n            workflow automation                  |
|    cb-wazuh          SIEM, FIM, vuln detection            |
|    cb-loki           log aggregation                      |
|    cb-tempo          distributed traces                   |
|    cb-pyroscope      continuous profiling                 |
|    cb-ollama         local LLM (auto-pulls embed model)   |
+-----------------------------------------------------------+
                              ^
                              |  Headscale mesh (<MESH_IP>:7481, encrypted)
                              |
+-----------------------------------------------------------+
|  WORKER NODES -- Sentinel agent (monitoring only)         |
|    Heartbeat + metrics + Docker events + census           |
|    Container CRUD performed by Portainer Edge Agent       |
+-----------------------------------------------------------+

The key invariant: the master container talks to the host Docker socket exactly once — to start Portainer. Everything else is a Portainer-managed stack, which means stacks can be edited, redeployed, or removed through the same UI you already use for your own apps.
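To make that invariant concrete, here is a minimal sketch of what the single socket call could look like using the official Docker Go SDK (v25+ type names assumed). The image tag, container name, and bind mounts are illustrative, not the actual bootstrap code.

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// startPortainer is the one place the master touches docker.sock directly.
// Everything after this call goes through the Portainer API instead.
func startPortainer(ctx context.Context) error {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return err
	}
	defer cli.Close()

	// Image, name, and mounts are assumptions for illustration.
	resp, err := cli.ContainerCreate(ctx,
		&container.Config{Image: "portainer/portainer-ce:latest"},
		&container.HostConfig{
			Binds:         []string{"/var/run/docker.sock:/var/run/docker.sock"},
			RestartPolicy: container.RestartPolicy{Name: "unless-stopped"},
		},
		nil, nil, "cb-portainer")
	if err != nil {
		return err
	}
	return cli.ContainerStart(ctx, resp.ID, container.StartOptions{})
}

func main() {
	if err := startPortainer(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```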
Master Container
The master is a single Go binary that embeds the Next.js frontend as static files via go:embed. On startup it orchestrates:
- Load and validate configuration (TOML + env overrides)
- Open database (SQLite WAL by default, PostgreSQL optional)
- Connect itself to the contextbay-internal Docker network
- Start Portainer (only direct Docker call) and persist the bootstrap admin password / JWT
- Initialise enabled modules (monitoring, alerting, AI, security, …)
- Deploy the sub-container fleet via Portainer API (Headscale, Prometheus, Grafana, n8n, Wazuh, Loki, Tempo, Pyroscope, Ollama, Alertmanager)
- Auto-bootstrap n8n: create owner account, mint API key, persist encryption key, seed 36 workflows
- Auto-pull the Ollama embedding model used by RAG
- Start HTTP/WebSocket server, gRPC server (over the mesh), and background services
- Block until SIGINT/SIGTERM, then graceful shutdown
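A compile-ready skeleton of that sequence, with each step reduced to a stub; the helper names are hypothetical stand-ins, not the real internal API:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
)

func main() {
	// Cancelled on SIGINT/SIGTERM, which drives the graceful shutdown.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// One entry per bullet above; each stub would hold the real logic.
	steps := []struct {
		name string
		fn   func(context.Context) error
	}{
		{"load config (TOML + env overrides)", loadConfig},
		{"open database (SQLite WAL / PostgreSQL)", openDatabase},
		{"bootstrap Portainer (the only docker.sock call)", bootstrapPortainer},
		{"deploy sub-container fleet via Portainer API", deployFleet},
		{"start HTTP/WebSocket + gRPC servers", startServers},
	}
	for _, s := range steps {
		if err := s.fn(ctx); err != nil {
			log.Fatalf("%s: %v", s.name, err)
		}
	}

	<-ctx.Done() // block until a shutdown signal arrives
	log.Println("shutting down gracefully")
}

func loadConfig(context.Context) error         { return nil }
func openDatabase(context.Context) error       { return nil }
func bootstrapPortainer(context.Context) error { return nil }
func deployFleet(context.Context) error        { return nil }
func startServers(context.Context) error       { return nil }
```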
HTTP/WebSocket Server
The API uses Go's standard net/http.ServeMux with pattern-based routing (Go 1.22+). Key components include the RBAC scope middleware (admin / operator / viewer), a WebSocket hub for real-time events, an in-memory metrics cache, and a typed event bus that bridges every state change to n8n via webhook.
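A rough sketch of how Go 1.22 method-and-pattern routing can pair with a scope middleware. The handler names and the header-based role lookup are assumptions for illustration; real requests would carry a JWT or API key resolved by an auth layer.

```go
package main

import (
	"log"
	"net/http"
)

// requireScope rejects requests whose role does not meet the minimum scope.
// The header lookup stands in for the real authentication context.
func requireScope(min string, next http.Handler) http.Handler {
	rank := map[string]int{"viewer": 0, "operator": 1, "admin": 2}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got, ok := rank[r.Header.Get("X-CB-Role")]
		if !ok || got < rank[min] {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func listNodes(w http.ResponseWriter, r *http.Request)     { w.Write([]byte("[]")) }
func redeployStack(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusAccepted) }

func main() {
	mux := http.NewServeMux()
	// Go 1.22+ patterns: method + path, with {id} wildcards.
	mux.Handle("GET /api/nodes", requireScope("viewer", http.HandlerFunc(listNodes)))
	mux.Handle("POST /api/stacks/{id}/redeploy", requireScope("operator", http.HandlerFunc(redeployStack)))
	log.Fatal(http.ListenAndServe(":7480", mux))
}
```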
gRPC Server
Two protobuf services handle worker communication. The gRPC listener is bound to the Headscale mesh interface, so workers only reach it after they have joined the mesh.
| Service | RPCs |
|---|---|
| NodeService | Register, Heartbeat, ReportMetrics (streaming), ReportEvents (streaming) |
| CommandService | CommandStream (bidirectional) — used for log/exec session proxying. Container CRUD goes through Portainer, not gRPC. |
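Binding the listener to the mesh address rather than 0.0.0.0 is what makes this hold: a worker that has not joined the mesh has no route to the port. A minimal sketch (service registration elided, since those calls come from the generated protobuf code):

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// The master is always 100.64.0.1 on the mesh, so binding here (instead
	// of 0.0.0.0) keeps gRPC unreachable from outside the mesh.
	lis, err := net.Listen("tcp", net.JoinHostPort("100.64.0.1", "7481"))
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	// The generated pb.RegisterNodeServiceServer(srv, ...) and
	// pb.RegisterCommandServiceServer(srv, ...) calls would go here.
	log.Fatal(srv.Serve(lis))
}
```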
Background Services
| Service | Purpose | Interval |
|---|---|---|
| Portainer Edge Poller | Sync container state from every Portainer endpoint | 30 seconds |
| Sub-container Health Monitor | Probe each cb-* stack and surface state | Continuous |
| Alertmanager Receiver | Convert incoming alerts into events + Grafana annotations | Per webhook |
| Backup Scheduler | Run cron-scheduled volume backups | Per cron schedule |
| AI Tiered Router | Classify with Ollama, escalate to Claude when needed | Per request |
| AI Session Manager | Dequeue and execute Claude tasks (max 2 concurrent) | Continuous |
| Stale Node Detector | Mark nodes as degraded / unreachable after missed heartbeats | Periodic |
| RAG Indexer | Embed knowledge pages with cb-ollama on save | On change |
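The interval-driven entries in this table share one shape: a goroutine driven by a ticker that exits cleanly when the shutdown context is cancelled. A sketch using the Edge Poller's 30-second cadence; the sync function is a placeholder, not the real implementation:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"
)

// runEdgePoller syncs container state from Portainer every 30 seconds
// until ctx is cancelled; the other periodic services look the same.
func runEdgePoller(ctx context.Context) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-ticker.C:
			if err := syncPortainerEndpoints(ctx); err != nil {
				log.Printf("edge poller: %v", err)
			}
		}
	}
}

// Placeholder for the real Portainer endpoint sync.
func syncPortainerEndpoints(ctx context.Context) error { return nil }

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()
	go runEdgePoller(ctx)
	<-ctx.Done()
}
```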
Sub-Container Fleet
ContextBay does not reimplement Prometheus, Grafana, n8n, or Wazuh. Each is deployed as a Compose stack on first boot, with the master owning the lifecycle (deploy / restart / update / remove) through the Portainer API. Every stack is auto-provisioned with sane defaults so a fresh install gives you a working observability, automation, security, and AI stack with no manual setup.
| Stack | Replaces | Auto-bootstrap |
|---|---|---|
| cb-headscale | Tailscale control plane | Pre-auth keys minted on enroll |
| cb-prometheus | Self-hosted Prometheus | Scrape config generated; targets discovered from nodes + cb-* services |
| cb-alertmanager | Self-hosted Alertmanager | Routes back into CB via webhook receiver |
| cb-grafana | Self-hosted Grafana | Datasource + starter dashboards provisioned; iframe embed for the UI |
| cb-n8n | Self-hosted n8n | Owner account + API key bootstrapped; 36 workflows seeded; encryption key persisted |
| cb-wazuh | Self-hosted Wazuh manager | Default ruleset, FIM paths, vuln feed wired up |
| cb-loki | Self-hosted Loki | Log push endpoint reachable from every cb-* and worker |
| cb-tempo | Self-hosted Tempo | OTLP/HTTP endpoint exposed on the mesh |
| cb-pyroscope | Self-hosted Pyroscope | Master pushes pprof + mutex profiles when profiling.enabled |
| cb-ollama | Self-hosted Ollama | Auto-pulls nomic-embed-text on first boot for RAG |
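As a hedged illustration of what "the master owning the lifecycle through the Portainer API" means in practice, here is a sketch of a Compose-from-string stack deploy. The route and payload fields follow the general shape of Portainer's stack API but vary across Portainer versions, so treat them as assumptions rather than a canonical client.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// deployStack sketches a Compose-from-string deploy through Portainer.
// Route and field names are illustrative; check the API of your Portainer
// version before relying on them.
func deployStack(ctx context.Context, jwt, name, composeYAML string, endpointID int) error {
	body, err := json.Marshal(map[string]string{
		"name":             name,
		"stackFileContent": composeYAML,
	})
	if err != nil {
		return err
	}
	url := fmt.Sprintf("http://cb-portainer:9000/api/stacks/create/standalone/string?endpointId=%d", endpointID)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+jwt)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("portainer: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	compose := "services:\n  hello:\n    image: hello-world\n"
	if err := deployStack(context.Background(), "<jwt>", "demo", compose, 1); err != nil {
		log.Fatal(err)
	}
}
```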
Worker Node — Sentinel Agent
A worker is intentionally minimal. It is a monitoring agent; it never starts, stops, or deletes a container by itself. The Portainer Edge Agent deployed alongside it owns container lifecycle.
- Heartbeat loop — sends health pings at the master-negotiated interval (default 30s) over the Headscale mesh
- Metrics loop — collects and streams system + container metrics every 15 seconds
- Census loop — periodic deep inventory: systemd units, open ports, packages, cron jobs
- Docker events — forwards Docker daemon events so the master can reflect lifecycle changes without polling
- Local /metrics endpoint — node-exporter compatible, scraped by cb-prometheus
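As an illustration of the Docker-events loop, a minimal sketch against the Docker SDK's event stream. events.ListOptions is the SDK v26+ name (older SDKs use types.EventsOptions), and forwardToMaster is a stand-in for the agent's gRPC ReportEvents stream:

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/events"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Subscribe to the local daemon's event stream and forward each message,
	// so the master reflects lifecycle changes without polling.
	msgs, errs := cli.Events(ctx, events.ListOptions{})
	for {
		select {
		case m := <-msgs:
			forwardToMaster(m)
		case err := <-errs:
			log.Printf("event stream error: %v", err)
			return
		}
	}
}

// Stand-in for the real gRPC ReportEvents stream.
func forwardToMaster(m events.Message) {
	log.Printf("event: %s %s %s", m.Type, m.Action, m.Actor.ID)
}
```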
Metrics Collected
System
- CPU usage percentage
- RAM usage (used/total, percentage)
- Disk usage (used/total, percentage)
- Network I/O (RX/TX bytes)
- Load averages (1m, 5m, 15m)
- Uptime
Per-Container
- CPU percentage
- Memory usage/limit
- Network RX/TX bytes
- Disk read/write bytes
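These values map naturally onto small payload structs. The real wire format is protobuf, so the field names below are purely illustrative:

```go
package metrics // hypothetical package; the real schema is protobuf

// SystemMetrics mirrors the per-node values listed above.
type SystemMetrics struct {
	CPUPercent float64
	RAMUsed    uint64
	RAMTotal   uint64
	DiskUsed   uint64
	DiskTotal  uint64
	NetRxBytes uint64
	NetTxBytes uint64
	Load1      float64
	Load5      float64
	Load15     float64
	UptimeSecs uint64
}

// ContainerMetrics mirrors the per-container values listed above.
type ContainerMetrics struct {
	ContainerID string
	CPUPercent  float64
	MemUsage    uint64
	MemLimit    uint64
	NetRxBytes  uint64
	NetTxBytes  uint64
	BlockRead   uint64
	BlockWrite  uint64
}
```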
Container Operations
All container, image, volume, network, and stack actions resolve the node's PortainerEndpointID and execute through Portainer. There is no gRPC fallback path.
The only exceptions are interactive exec / terminal sessions and log streaming, which are proxied over the worker's gRPC connection for latency reasons.
Networking & Onboarding
Every worker ↔ master link runs over Headscale (a self-hosted Tailscale control plane). The master holds the well-known mesh address <MASTER_MESH_IP> (and fd7a:115c:a1e0::1 on IPv6). Nothing else binds to plaintext gRPC by default.
<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range — the master is always assigned 100.64.0.1, so the shape is the same on every install.
Enrollment Flow
Worker                                       Master
  |                                             |
  |  POST /api/enroll (LAN, plain HTTP)         |
  |  { enroll_token, shared_secret }            |
  |-------------------------------------------->|
  |                                             | validate token + secret
  |                                             | mint Headscale pre-auth key
  |                                             | return { master_url, headscale_url, auth_key }
  |<--------------------------------------------|
  |                                             |
  |  tsnet up + join mesh                       |
  |  becomes <MESH_IP>                          |
  |                                             |
  |  gRPC Register / Heartbeat (over mesh)      |
  |-------------------------------------------->|
  |<--- HeartbeatResponse ----------------------|

The Hosts page in the UI surfaces pending enrollments with a two-column layout: the install snippet (Compose / docker-run / systemd tabs, pre-filled with the master URL) on the left, and a live pairing panel on the right showing the token countdown and the recent enroll-attempt log with reason codes (bad token, stale token, IP mismatch, …).
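On the worker side, the "tsnet up + join mesh" step maps onto Tailscale's embeddable tsnet library, pointed at the Headscale control URL from the enroll response. A sketch, with hostname, state directory, and key as illustrative values:

```go
package main

import (
	"context"
	"log"

	"tailscale.com/tsnet"
)

func main() {
	// The control URL and auth key would come from the /api/enroll response.
	srv := &tsnet.Server{
		Hostname:   "worker-01",                 // illustrative
		ControlURL: "https://headscale.example", // headscale_url from enroll
		AuthKey:    "hskey-REDACTED",            // one-time pre-auth key
		Dir:        "/var/lib/sentinel/tsnet",   // persisted node state
	}
	defer srv.Close()

	// Up blocks until the node has joined the mesh and has an address.
	status, err := srv.Up(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("joined mesh as %v", status.TailscaleIPs)
	// The agent would now dial the master's gRPC at 100.64.0.1:7481
	// via srv.Dial(...) over the mesh.
}
```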
Mesh Modes
| Mode | When to use |
|---|---|
| headscale | Default. Required for any multi-node install. cb-headscale is auto-deployed. |
| direct | Single-node dev only. Exposes plaintext gRPC on the LAN. Do not use for multi-node. |
Database Schema
All persistence goes through the Store interface. Migrations run automatically on startup for both SQLite and PostgreSQL.
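A slice of what that interface might look like; the method names and record types are assumptions derived from the entities listed below:

```go
package store // hypothetical package layout

import "context"

// Store sketches the persistence boundary; method names are illustrative.
type Store interface {
	// nodes
	UpsertNode(ctx context.Context, n *Node) error
	ListNodes(ctx context.Context) ([]*Node, error)
	// containers synced from Portainer endpoints
	SyncContainers(ctx context.Context, endpointID int, cs []*Container) error
	// audit trail
	AppendAudit(ctx context.Context, e *AuditEntry) error

	Close() error
}

// Minimal illustrative record types.
type Node struct {
	ID, Hostname, Status string
	LastHeartbeatUnix    int64
}

type Container struct {
	ID, Name, State string
	EndpointID      int
}

type AuditEntry struct {
	Actor, Action, Target, Detail, IP string
	TimestampUnix                     int64
}
```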
Core Entities
- hosts — Pending and enrolled worker hosts with enroll tokens, fuse state, and Portainer endpoint IDs
- nodes — Registered worker machines with hardware info, labels, status, heartbeat timestamps
- enroll_attempts — Recent enroll attempts with reason codes for the UI's rejection log
- containers — Containers synced from every Portainer endpoint with state, ports, labels
- stacks — Compose stacks (own + Portainer-managed) with YAML, target endpoint, status
- users — Platform users with role-based access (admin, operator, viewer)
- api_keys — Scoped API keys stored as SHA-256 hashes
Feature Entities
- pages / page_versions — Knowledge base Markdown vault with bidirectional links and version history
- rag_chunks — Embeddings for RAG search, generated by cb-ollama
- alert_rules / alerts — Rules synced into Prometheus YAML; firing alerts arrive via Alertmanager webhook
- metrics — Time-series buffer with node_id, name, value, labels, timestamp
- backup_jobs / backup_records — Scheduled volume backups with retention
- security_events — Wazuh + scanner events with severity classification
- projects / issues / sprints / dependencies — Project Planner: Kanban, Gantt, sprints, velocity
- ai_tasks / ai_sessions / mcp_servers — Tiered AI router state, Claude Code sessions, registered MCP servers
- audit_entries — Complete audit trail of every state-changing action
AI: Tiered Router
Two providers, one router. Cheap classification, summarisation, and embedding work goes to local Ollama (cb-ollama). Multi-step reasoning, code authoring, and tool-using sessions go to Claude via the Claude Agent SDK service (Anthropic API key required). The router picks per request based on task type and configured budget caps.
- RAG search uses Ollama embeddings (default nomic-embed-text, auto-pulled on boot)
- Claude sessions surface in the UI with cost-per-task and permission caps (acceptEdits, bypass, …)
- MCP servers are first-class: the registry exposes core, n8n, Brain, and Planner tools to Claude
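A sketch of the routing decision itself, with the task taxonomy and budget check reduced to essentials; the names are assumptions, not the real router:

```go
package router // hypothetical package

import "errors"

type TaskType int

const (
	Classify TaskType = iota
	Summarise
	Embed
	Reason // multi-step reasoning / code authoring / tool use
)

type Provider string

const (
	Ollama Provider = "cb-ollama"
	Claude Provider = "claude"
)

// Route keeps cheap work local and escalates heavy work to Claude only
// while the configured budget cap has headroom.
func Route(t TaskType, spentUSD, budgetUSD float64) (Provider, error) {
	switch t {
	case Classify, Summarise, Embed:
		return Ollama, nil
	case Reason:
		if spentUSD >= budgetUSD {
			return "", errors.New("budget cap reached; task queued")
		}
		return Claude, nil
	}
	return "", errors.New("unknown task type")
}
```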
Security Model
Authentication
- Initial setup — first request to /api/auth/setup creates the admin user (one-time, then disabled)
- Login — username + bcrypt password returns a JWT (HS256, 24h expiry) and a session cookie
- API keys — prefixed with cb_, stored as SHA-256 hashes, scoped per role
- Worker enroll — one-time token + shared secret over plaintext /api/enroll (rate-limited 10/min/IP). After enrollment all traffic is mesh-encrypted.
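Since only the SHA-256 digest of an API key is persisted, verification is hash-and-compare. A minimal sketch using the standard library, with a constant-time comparison so lookups do not leak timing:

```go
package auth // hypothetical package

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"strings"
)

// VerifyAPIKey sketches validation of a presented cb_-prefixed key against
// the stored SHA-256 hex digest. Only the digest is ever persisted.
func VerifyAPIKey(presented, storedHexDigest string) bool {
	if !strings.HasPrefix(presented, "cb_") {
		return false
	}
	sum := sha256.Sum256([]byte(presented))
	got := hex.EncodeToString(sum[:])
	// Constant-time compare avoids leaking digest prefixes via timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHexDigest)) == 1
}
```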
Authorization (RBAC)
| Role | Permissions |
|---|---|
| Admin | Full access. User management, settings, terminal, sub-container deploys, import/export. |
| Operator | CRUD resources. Container actions (via Portainer), workflow execution, backups. |
| Viewer | Read-only access to all resources. |
Open Endpoints (no auth)
- GET /metrics — Prometheus convention. cb-prometheus scrapes the master here.
- POST /api/enroll — Worker enrollment. Rate-limited 10/min/IP, requires shared_secret.
- GET /api/health, GET /api/modules — health probes.
- POST /api/auth/login, POST /api/auth/setup — bootstrap.
Audit Trail
Every state-changing operation is recorded with actor, action, target resource, detail message, IP address, and timestamp. The audit log is admin-only and exportable.