Architecture
How the master, Portainer, and the sub-container fleet are layered; how worker nodes join over an encrypted mesh; the database schema; and the security model.
The Three-Circle Model
ContextBay is structured as three concentric layers. The master container is the only thing you start by hand. It boots Portainer, and Portainer is then used to deploy the rest of the fleet as regular Compose stacks.
+-----------------------------------------------------------+
|  OUTER  contextbay-master (Go API + embedded Next.js UI)  |
|         REST :7480 / gRPC :7481 / WebSocket               |
|         Prometheus /metrics -- SQLite (WAL) or PostgreSQL |
+-----------------------------------------------------------+
                              |
                              |  docker.sock (the only direct Docker call)
                              v
+-----------------------------------------------------------+
|  INNER  cb-portainer -- the Docker control plane          |
|         Manages every other container on every endpoint   |
+-----------------------------------------------------------+
                              |
                              |  Portainer API (stack deploys + Edge Agents)
                              v
+-----------------------------------------------------------+
|  SUB-CONTAINER FLEET (one Compose stack per service)      |
|    cb-headscale      mesh control plane (WireGuard)       |
|    cb-prometheus     metrics scrape + PromQL              |
|    cb-alertmanager   alert routing                        |
|    cb-grafana        dashboards (auto-provisioned ds)     |
|    cb-n8n            workflow automation                  |
|    cb-wazuh          SIEM, FIM, vuln detection            |
|    cb-loki           log aggregation                      |
|    cb-tempo          distributed traces                   |
|    cb-pyroscope      continuous profiling                 |
|    cb-ollama         local LLM (auto-pulls embed model)   |
+-----------------------------------------------------------+
                              ^
                              |  Headscale mesh (<MESH_IP>:7481, encrypted)
                              |
+-----------------------------------------------------------+
|  WORKER NODES -- Sentinel agent (monitoring only)         |
|    Heartbeat + metrics + Docker events + census           |
|    Container CRUD performed by Portainer Edge Agent       |
+-----------------------------------------------------------+

The key invariant: the master container talks to the host Docker socket exactly once — to start Portainer. Everything else is a Portainer-managed stack, which means stacks can be edited, redeployed, or removed through the same UI you already use for your own apps.
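To make that invariant concrete, here is a minimal sketch of what the single socket call could look like using the official Docker Go SDK (v25+ type names assumed). The image tag, container name, and bind mounts are illustrative, not the actual bootstrap code.

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// startPortainer is the one place the master touches docker.sock directly.
// Everything after this call goes through the Portainer API instead.
func startPortainer(ctx context.Context) error {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return err
	}
	defer cli.Close()

	// Image, name, and mounts are assumptions for illustration.
	resp, err := cli.ContainerCreate(ctx,
		&container.Config{Image: "portainer/portainer-ce:latest"},
		&container.HostConfig{
			Binds:         []string{"/var/run/docker.sock:/var/run/docker.sock"},
			RestartPolicy: container.RestartPolicy{Name: "unless-stopped"},
		},
		nil, nil, "cb-portainer")
	if err != nil {
		return err
	}
	return cli.ContainerStart(ctx, resp.ID, container.StartOptions{})
}

func main() {
	if err := startPortainer(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```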
Master Container
The master is a single Go binary that embeds the Next.js frontend as static files via go:embed. On startup it orchestrates:
- Load and validate configuration (TOML + env overrides)
- Open database (SQLite WAL by default, PostgreSQL optional)
- Connect itself to the contextbay-internal Docker network
- Start Portainer (only direct Docker call) and persist the bootstrap admin password / JWT
- Initialise enabled modules (monitoring, alerting, AI, security, …)
- Deploy the sub-container fleet via Portainer API (Headscale, Prometheus, Grafana, n8n, Wazuh, Loki, Tempo, Pyroscope, Ollama, Alertmanager)
- Auto-bootstrap n8n: create owner account, mint API key, persist encryption key, seed 36 workflows
- Auto-pull the Ollama embedding model used by RAG
- Start HTTP/WebSocket server, gRPC server (over the mesh), and background services
- Block until SIGINT/SIGTERM, then graceful shutdown
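A compile-ready skeleton of that sequence, with each step reduced to a stub; the helper names are hypothetical stand-ins, not the real internal API:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
)

func main() {
	// Cancelled on SIGINT/SIGTERM, which drives the graceful shutdown.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// One entry per bullet above; each stub would hold the real logic.
	steps := []struct {
		name string
		fn   func(context.Context) error
	}{
		{"load config (TOML + env overrides)", loadConfig},
		{"open database (SQLite WAL / PostgreSQL)", openDatabase},
		{"bootstrap Portainer (the only docker.sock call)", bootstrapPortainer},
		{"deploy sub-container fleet via Portainer API", deployFleet},
		{"start HTTP/WebSocket + gRPC servers", startServers},
	}
	for _, s := range steps {
		if err := s.fn(ctx); err != nil {
			log.Fatalf("%s: %v", s.name, err)
		}
	}

	<-ctx.Done() // block until a shutdown signal arrives
	log.Println("shutting down gracefully")
}

func loadConfig(context.Context) error         { return nil }
func openDatabase(context.Context) error       { return nil }
func bootstrapPortainer(context.Context) error { return nil }
func deployFleet(context.Context) error        { return nil }
func startServers(context.Context) error       { return nil }
```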
HTTP/WebSocket Server
The API uses Go's standard net/http.ServeMux with pattern-based routing (Go 1.22+). Key components include the RBAC scope middleware (admin / operator / viewer), a WebSocket hub for real-time events, an in-memory metrics cache, and a typed event bus that bridges every state change to n8n via webhook.
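A rough sketch of how Go 1.22 method-and-pattern routing can pair with a scope middleware. The handler names and the header-based role lookup are assumptions for illustration; real requests would carry a JWT or API key resolved by an auth layer.

```go
package main

import (
	"log"
	"net/http"
)

// requireScope rejects requests whose role does not meet the minimum scope.
// The header lookup stands in for the real authentication context.
func requireScope(min string, next http.Handler) http.Handler {
	rank := map[string]int{"viewer": 0, "operator": 1, "admin": 2}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got, ok := rank[r.Header.Get("X-CB-Role")]
		if !ok || got < rank[min] {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func listNodes(w http.ResponseWriter, r *http.Request)     { w.Write([]byte("[]")) }
func redeployStack(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusAccepted) }

func main() {
	mux := http.NewServeMux()
	// Go 1.22+ patterns: method + path, with {id} wildcards.
	mux.Handle("GET /api/nodes", requireScope("viewer", http.HandlerFunc(listNodes)))
	mux.Handle("POST /api/stacks/{id}/redeploy", requireScope("operator", http.HandlerFunc(redeployStack)))
	log.Fatal(http.ListenAndServe(":7480", mux))
}
```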
gRPC Server
Two protobuf services handle worker communication. The gRPC listener is bound to the Headscale mesh interface, so workers only reach it after they have joined the mesh.
| Service | RPCs |
|---|---|
| NodeService | Register, Heartbeat, ReportMetrics (streaming), ReportEvents (streaming) |
| CommandService | CommandStream (bidirectional) — used for log/exec session proxying. Container CRUD goes through Portainer, not gRPC. |
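Binding the listener to the mesh address rather than 0.0.0.0 is what makes this hold: a worker that has not joined the mesh has no route to the port. A minimal sketch (service registration elided, since those calls come from the generated protobuf code):

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// The master is always 100.64.0.1 on the mesh, so binding here (instead
	// of 0.0.0.0) keeps gRPC unreachable from outside the mesh.
	lis, err := net.Listen("tcp", net.JoinHostPort("100.64.0.1", "7481"))
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	// The generated pb.RegisterNodeServiceServer(srv, ...) and
	// pb.RegisterCommandServiceServer(srv, ...) calls would go here.
	log.Fatal(srv.Serve(lis))
}
```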
Background Services
| Service | Purpose | Interval |
|---|---|---|
| Portainer Edge Poller | Sync container state from every Portainer endpoint | 30 seconds |
| Sub-container Health Monitor | Probe each cb-* stack and surface state | Continuous |
| Alertmanager Receiver | Convert incoming alerts into events + Grafana annotations | Per webhook |
| Backup Scheduler | Run cron-scheduled volume backups | Per cron schedule |
| AI Tiered Router | Classify with Ollama, escalate to Claude when needed | Per request |
| AI Session Manager | Dequeue and execute Claude tasks (max 2 concurrent) | Continuous |
| Stale Node Detector | Mark nodes as degraded / unreachable after missed heartbeats | Periodic |
| RAG Indexer | Embed knowledge pages with cb-ollama on save | On change |
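The interval-driven entries in this table share one shape: a goroutine driven by a ticker that exits cleanly when the shutdown context is cancelled. A sketch using the Edge Poller's 30-second cadence; the sync function is a placeholder, not the real implementation:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"
)

// runEdgePoller syncs container state from Portainer every 30 seconds
// until ctx is cancelled; the other periodic services look the same.
func runEdgePoller(ctx context.Context) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-ticker.C:
			if err := syncPortainerEndpoints(ctx); err != nil {
				log.Printf("edge poller: %v", err)
			}
		}
	}
}

// Placeholder for the real Portainer endpoint sync.
func syncPortainerEndpoints(ctx context.Context) error { return nil }

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()
	go runEdgePoller(ctx)
	<-ctx.Done()
}
```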
Sub-Container Fleet
ContextBay does not reimplement Prometheus, Grafana, n8n, or Wazuh. Each is deployed as a Compose stack on first boot, with the master owning the lifecycle (deploy / restart / update / remove) through the Portainer API. Every stack is auto-provisioned with sane defaults so a fresh install gives you a working observability, automation, security, and AI stack with no manual setup.
| Stack | Replaces | Auto-bootstrap |
|---|---|---|
| cb-headscale | Tailscale control plane | Pre-auth keys minted on enroll |
| cb-prometheus | Self-hosted Prometheus | Scrape config generated; targets discovered from nodes + cb-* services |
| cb-alertmanager | Self-hosted Alertmanager | Routes back into CB via webhook receiver |
| cb-grafana | Self-hosted Grafana | Datasource + starter dashboards provisioned; iframe embed for the UI |
| cb-n8n | Self-hosted n8n | Owner account + API key bootstrapped; 36 workflows seeded; encryption key persisted |
| cb-wazuh | Self-hosted Wazuh manager | Default ruleset, FIM paths, vuln feed wired up |
| cb-loki | Self-hosted Loki | Log push endpoint reachable from every cb-* and worker |
| cb-tempo | Self-hosted Tempo | OTLP/HTTP endpoint exposed on the mesh |
| cb-pyroscope | Self-hosted Pyroscope | Master pushes pprof + mutex profiles when profiling.enabled |
| cb-ollama | Self-hosted Ollama | Auto-pulls nomic-embed-text on first boot for RAG |
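As a hedged illustration of what "the master owning the lifecycle through the Portainer API" means in practice, here is a sketch of a Compose-from-string stack deploy. The route and payload fields follow the general shape of Portainer's stack API but vary across Portainer versions, so treat them as assumptions rather than a canonical client.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// deployStack sketches a Compose-from-string deploy through Portainer.
// Route and field names are illustrative; check the API of your Portainer
// version before relying on them.
func deployStack(ctx context.Context, jwt, name, composeYAML string, endpointID int) error {
	body, err := json.Marshal(map[string]string{
		"name":             name,
		"stackFileContent": composeYAML,
	})
	if err != nil {
		return err
	}
	url := fmt.Sprintf("http://cb-portainer:9000/api/stacks/create/standalone/string?endpointId=%d", endpointID)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+jwt)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("portainer: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	compose := "services:\n  hello:\n    image: hello-world\n"
	if err := deployStack(context.Background(), "<jwt>", "demo", compose, 1); err != nil {
		log.Fatal(err)
	}
}
```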
Worker Node — Sentinel Agent
A worker is intentionally minimal. It is a monitoring agent; it never starts, stops, or deletes a container by itself. The Portainer Edge Agent deployed alongside it owns container lifecycle.
- Heartbeat loop — sends health pings at the master-negotiated interval (default 30s) over the Headscale mesh
- Metrics loop — collects and streams system + container metrics every 15 seconds
- Census loop — periodic deep inventory: systemd units, open ports, packages, cron jobs
- Docker events — forwards Docker daemon events so the master can reflect lifecycle changes without polling
- Local /metrics endpoint — node-exporter compatible, scraped by cb-prometheus
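As an illustration of the Docker-events loop, a minimal sketch against the Docker SDK's event stream. events.ListOptions is the SDK v26+ name (older SDKs use types.EventsOptions), and forwardToMaster is a stand-in for the agent's gRPC ReportEvents stream:

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/events"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Subscribe to the local daemon's event stream and forward each message,
	// so the master reflects lifecycle changes without polling.
	msgs, errs := cli.Events(ctx, events.ListOptions{})
	for {
		select {
		case m := <-msgs:
			forwardToMaster(m)
		case err := <-errs:
			log.Printf("event stream error: %v", err)
			return
		}
	}
}

// Stand-in for the real gRPC ReportEvents stream.
func forwardToMaster(m events.Message) {
	log.Printf("event: %s %s %s", m.Type, m.Action, m.Actor.ID)
}
```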
Metrics Collected
System
- CPU usage percentage
- RAM usage (used/total, percentage)
- Disk usage (used/total, percentage)
- Network I/O (RX/TX bytes)
- Load averages (1m, 5m, 15m)
- Uptime
Per-Container
- CPU percentage
- Memory usage/limit
- Network RX/TX bytes
- Disk read/write bytes
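These values map naturally onto small payload structs. The real wire format is protobuf, so the field names below are purely illustrative:

```go
package metrics // hypothetical package; the real schema is protobuf

// SystemMetrics mirrors the per-node values listed above.
type SystemMetrics struct {
	CPUPercent float64
	RAMUsed    uint64
	RAMTotal   uint64
	DiskUsed   uint64
	DiskTotal  uint64
	NetRxBytes uint64
	NetTxBytes uint64
	Load1      float64
	Load5      float64
	Load15     float64
	UptimeSecs uint64
}

// ContainerMetrics mirrors the per-container values listed above.
type ContainerMetrics struct {
	ContainerID string
	CPUPercent  float64
	MemUsage    uint64
	MemLimit    uint64
	NetRxBytes  uint64
	NetTxBytes  uint64
	BlockRead   uint64
	BlockWrite  uint64
}
```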
Container Operations
All container, image, volume, network, and stack actions resolve the node's PortainerEndpointID and execute through Portainer. There is no gRPC fallback path.
The only exceptions are interactive exec / terminal sessions and log streaming, which are proxied over the worker's gRPC connection for latency reasons.
Networking & Onboarding
Every worker ↔ master link runs over Headscale (a self-hosted Tailscale control plane). The master holds the well-known mesh address <MASTER_MESH_IP> (and fd7a:115c:a1e0::1 on IPv6). Nothing else binds to plaintext gRPC by default.
<MASTER_MESH_IP> is the master's address on the Headscale mesh, in the 100.64.0.0/10 CGNAT range — the master is always assigned 100.64.0.1, so the shape is the same on every install.
Enrollment Flow
Worker                                       Master
  |                                             |
  |  POST /api/enroll (LAN, plain HTTP)         |
  |  { enroll_token, shared_secret }            |
  |-------------------------------------------->|
  |                                             | validate token + secret
  |                                             | mint Headscale pre-auth key
  |                                             | return { master_url, headscale_url, auth_key }
  |<--------------------------------------------|
  |                                             |
  |  tsnet up + join mesh                       |
  |  becomes <MESH_IP>                          |
  |                                             |
  |  gRPC Register / Heartbeat (over mesh)      |
  |-------------------------------------------->|
  |<--- HeartbeatResponse ----------------------|

The Hosts page in the UI surfaces pending enrollments with a two-column layout: the install snippet (Compose / docker-run / systemd tabs, pre-filled with the master URL) on the left, and a live pairing panel on the right showing the token countdown and the recent enroll-attempt log with reason codes (bad token, stale token, IP mismatch, …).
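On the worker side, the "tsnet up + join mesh" step maps onto Tailscale's embeddable tsnet library, pointed at the Headscale control URL from the enroll response. A sketch, with hostname, state directory, and key as illustrative values:

```go
package main

import (
	"context"
	"log"

	"tailscale.com/tsnet"
)

func main() {
	// The control URL and auth key would come from the /api/enroll response.
	srv := &tsnet.Server{
		Hostname:   "worker-01",                 // illustrative
		ControlURL: "https://headscale.example", // headscale_url from enroll
		AuthKey:    "hskey-REDACTED",            // one-time pre-auth key
		Dir:        "/var/lib/sentinel/tsnet",   // persisted node state
	}
	defer srv.Close()

	// Up blocks until the node has joined the mesh and has an address.
	status, err := srv.Up(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("joined mesh as %v", status.TailscaleIPs)
	// The agent would now dial the master's gRPC at 100.64.0.1:7481
	// via srv.Dial(...) over the mesh.
}
```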
Mesh Modes
| Mode | When to use |
|---|---|
| headscale | Default. Required for any multi-node install. cb-headscale is auto-deployed. |
| direct | Single-node dev only. Exposes plaintext gRPC on the LAN. Do not use for multi-node. |
Database Schema
All persistence goes through the Store interface. Migrations run automatically on startup for both SQLite and PostgreSQL.
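A slice of what that interface might look like; the method names and record types are assumptions derived from the entities listed below:

```go
package store // hypothetical package layout

import "context"

// Store sketches the persistence boundary; method names are illustrative.
type Store interface {
	// nodes
	UpsertNode(ctx context.Context, n *Node) error
	ListNodes(ctx context.Context) ([]*Node, error)
	// containers synced from Portainer endpoints
	SyncContainers(ctx context.Context, endpointID int, cs []*Container) error
	// audit trail
	AppendAudit(ctx context.Context, e *AuditEntry) error

	Close() error
}

// Minimal illustrative record types.
type Node struct {
	ID, Hostname, Status string
	LastHeartbeatUnix    int64
}

type Container struct {
	ID, Name, State string
	EndpointID      int
}

type AuditEntry struct {
	Actor, Action, Target, Detail, IP string
	TimestampUnix                     int64
}
```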
Core Entities
- hosts — Pending and enrolled worker hosts with enroll tokens, fuse state, and Portainer endpoint IDs
- nodes — Registered worker machines with hardware info, labels, status, heartbeat timestamps
- enroll_attempts — Recent enroll attempts with reason codes for the UI's rejection log
- containers — Containers synced from every Portainer endpoint with state, ports, labels
- stacks — Compose stacks (own + Portainer-managed) with YAML, target endpoint, status
- users — Platform users with role-based access (admin, operator, viewer)
- api_keys — Scoped API keys stored as SHA-256 hashes
Feature Entities
- pages / page_versions — Knowledge base Markdown vault with bidirectional links and version history
- rag_chunks — Embeddings for RAG search, generated by cb-ollama
- alert_rules / alerts — Rules synced into Prometheus YAML; firing alerts arrive via Alertmanager webhook
- metrics — Time-series buffer with node_id, name, value, labels, timestamp
- backup_jobs / backup_records — Scheduled volume backups with retention
- security_events — Wazuh + scanner events with severity classification
- projects / issues / sprints / dependencies — Project Planner: Kanban, Gantt, sprints, velocity
- ai_tasks / ai_sessions / mcp_servers — Tiered AI router state, Claude Code sessions, registered MCP servers
- audit_entries — Complete audit trail of every state-changing action
AI: Tiered Router
Two providers, one router. Cheap classification, summarisation, and embedding work goes to local Ollama (cb-ollama). Multi-step reasoning, code authoring, and tool-using sessions go to Claude via the Claude Agent SDK service (Anthropic API key required). The router picks per request based on task type and configured budget caps.
- RAG search uses Ollama embeddings (default nomic-embed-text, auto-pulled on boot)
- Claude sessions surface in the UI with cost-per-task and permission caps (acceptEdits, bypass, …)
- MCP servers are first-class: the registry exposes core, n8n, Brain, and Planner tools to Claude
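A sketch of the routing decision itself, with the task taxonomy and budget check reduced to essentials; the names are assumptions, not the real router:

```go
package router // hypothetical package

import "errors"

type TaskType int

const (
	Classify TaskType = iota
	Summarise
	Embed
	Reason // multi-step reasoning / code authoring / tool use
)

type Provider string

const (
	Ollama Provider = "cb-ollama"
	Claude Provider = "claude"
)

// Route keeps cheap work local and escalates heavy work to Claude only
// while the configured budget cap has headroom.
func Route(t TaskType, spentUSD, budgetUSD float64) (Provider, error) {
	switch t {
	case Classify, Summarise, Embed:
		return Ollama, nil
	case Reason:
		if spentUSD >= budgetUSD {
			return "", errors.New("budget cap reached; task queued")
		}
		return Claude, nil
	}
	return "", errors.New("unknown task type")
}
```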
Security Model
Authentication
- Initial setup — first request to /api/auth/setup creates the admin user (one-time, then disabled)
- Login — username + bcrypt password returns a JWT (HS256, 24h expiry) and a session cookie
- API keys — prefixed with cb_, stored as SHA-256 hashes, scoped per role
- Worker enroll — one-time token + shared secret over plaintext /api/enroll (rate-limited 10/min/IP). After enrollment all traffic is mesh-encrypted.
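Since only the SHA-256 digest of an API key is persisted, verification is hash-and-compare. A minimal sketch using the standard library, with a constant-time comparison so lookups do not leak timing:

```go
package auth // hypothetical package

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"strings"
)

// VerifyAPIKey sketches validation of a presented cb_-prefixed key against
// the stored SHA-256 hex digest. Only the digest is ever persisted.
func VerifyAPIKey(presented, storedHexDigest string) bool {
	if !strings.HasPrefix(presented, "cb_") {
		return false
	}
	sum := sha256.Sum256([]byte(presented))
	got := hex.EncodeToString(sum[:])
	// Constant-time compare avoids leaking digest prefixes via timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHexDigest)) == 1
}
```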
Authorization (RBAC)
| Role | Permissions |
|---|---|
| Admin | Full access. User management, settings, terminal, sub-container deploys, import/export. |
| Operator | CRUD resources. Container actions (via Portainer), workflow execution, backups. |
| Viewer | Read-only access to all resources. |
Open Endpoints (no auth)
- GET /metrics — Prometheus convention. cb-prometheus scrapes the master here.
- POST /api/enroll — Worker enrollment. Rate-limited 10/min/IP, requires shared_secret.
- GET /api/health, GET /api/modules — health probes.
- POST /api/auth/login, POST /api/auth/setup — bootstrap.
Audit Trail
Every state-changing operation is recorded with actor, action, target resource, detail message, IP address, and timestamp. The audit log is admin-only and exportable.