
Architecture Deep Dive

Scion is a container-based orchestration platform for managing concurrent LLM-based code agents. It operates in two distinct modes:

  • Solo Mode — A local-only, zero-config experience where the CLI manages agents directly via a local container runtime.
  • Hosted Mode — A distributed architecture where a centralized Hub coordinates state and dispatches work to one or more Runtime Brokers that execute agents on remote or local compute.

Both modes share the same core abstractions (Groves, Agents, Templates, Harnesses, Runtimes) but differ in where state is persisted and how lifecycle operations are routed.


(diagram omitted)

A Grove is the top-level grouping construct for agents. In Solo mode it is represented by a .scion directory on the filesystem; in Hosted mode it is a database record identified by its git remote URL.

Resolution order (Solo):

  1. Explicit --grove flag
  2. Project-level .scion directory (walking up from cwd)
  3. Global ~/.scion directory
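The resolution order above can be sketched as a pure function. Names here are illustrative; the real implementation (config.GetResolvedProjectDir) checks the filesystem directly, so the existence check is abstracted out for clarity:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveGrove mirrors the Solo-mode resolution order: explicit flag,
// then a walk up from cwd, then the global ~/.scion fallback.
func resolveGrove(explicit, cwd, home string, exists func(string) bool) string {
	// 1. Explicit --grove flag wins outright.
	if explicit != "" {
		return explicit
	}
	// 2. Walk up from cwd looking for a project-level .scion directory.
	for dir := cwd; ; dir = filepath.Dir(dir) {
		candidate := filepath.Join(dir, ".scion")
		if exists(candidate) {
			return candidate
		}
		if dir == filepath.Dir(dir) { // reached the filesystem root
			break
		}
	}
	// 3. Fall back to the global ~/.scion directory.
	return filepath.Join(home, ".scion")
}

func main() {
	onDisk := map[string]bool{"/repo/.scion": true}
	exists := func(p string) bool { return onDisk[p] }
	fmt.Println(resolveGrove("", "/repo/sub/pkg", "/home/u", exists)) // project-level hit
	fmt.Println(resolveGrove("", "/tmp", "/home/u", exists))          // global fallback
}
```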

Key properties:

  • Name: Slugified from the parent directory containing .scion.
  • Git remote (Hosted): Normalized remote URL used as a unique identifier for cross-broker grove identity.
  • Default Runtime Broker (Hosted): The broker used when creating agents without an explicit target.

Groves contain an agents/ subdirectory (gitignored) that holds per-agent state, and a templates/ directory for grove-scoped template definitions.

An Agent is an isolated container running an LLM harness. Each agent has:

Component       Description
Home directory  Mounted at /home/<user> inside the container. Contains harness config, credentials, and a per-agent agent-info.json.
Workspace       Mounted at /workspace. Typically a dedicated git worktree to prevent merge conflicts between concurrent agents.
Template        The blueprint that seeded the agent’s home directory and configuration.
Harness         The LLM-specific adapter (Claude, Gemini, OpenCode, Codex, or Generic).

Agent identity varies by mode:

Field        Solo Mode                          Hosted Mode
Name         User-provided or auto-generated    User-provided or auto-generated
ContainerID  Assigned by the container runtime  Assigned by the container runtime
ID           Not used                           UUID primary key in the Hub database
Slug         Not used                           URL-safe identifier (unique per grove)

Templates are configuration blueprints for agents. They define:

  • A home/ directory tree to copy into the agent’s home.
  • A scion-agent.json (or .yaml) file specifying harness type, environment variables, volumes, command arguments, model overrides, container image, and resource requirements.

Template chain: Templates support inheritance via a base field. When resolving a template, Scion walks the chain and merges configurations bottom-up (base first, then overrides).
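A minimal sketch of that chain merge, assuming a flat key/value config (the real scion-agent.json merge covers nested structures such as volumes and environment variables):

```go
package main

import "fmt"

// Template is a simplified stand-in for a scion-agent.json definition;
// only the base pointer and a flat config map are modeled here.
type Template struct {
	Base   string
	Config map[string]string
}

// resolve walks the base chain and merges bottom-up: the base is applied
// first, then each descendant's keys override it.
func resolve(name string, all map[string]Template) map[string]string {
	var chain []Template
	for n := name; n != ""; n = all[n].Base { // leaf → root
		chain = append(chain, all[n])
	}
	merged := map[string]string{}
	for i := len(chain) - 1; i >= 0; i-- { // apply root first, leaf last
		for k, v := range chain[i].Config {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	all := map[string]Template{
		"base":   {Config: map[string]string{"image": "ubuntu", "user": "scion"}},
		"claude": {Base: "base", Config: map[string]string{"image": "claude-code-sandbox:latest"}},
	}
	m := resolve("claude", all)
	fmt.Println(m["image"], m["user"]) // leaf override wins; base fills the gap
}
```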

Scopes (Hosted): Templates can be scoped as global, grove, or user, with visibility controls (private, grove, public).

A Harness encapsulates LLM-specific behavior behind a common interface (api.Harness):

type Harness interface {
    Name() string
    DiscoverAuth(agentHome string) AuthConfig
    GetEnv(agentName, agentHome, unixUsername string, auth AuthConfig) map[string]string
    GetCommand(task string, resume bool, baseArgs []string) []string
    PropagateFiles(homeDir, unixUsername string, auth AuthConfig) error
    GetVolumes(unixUsername string, auth AuthConfig) []VolumeMount
    Provision(ctx context.Context, agentName, agentHome, agentWorkspace string) error
    GetInterruptKey() string
    // ... additional methods
}

Supported harnesses:

Harness   Target Tool   Notes
claude    Claude Code   Anthropic API key auth
gemini    Gemini CLI    Google API key / OAuth / Vertex auth
opencode  OpenCode      OpenCode auth file
codex     Codex         Codex auth file
generic   Any CLI tool  Fallback adapter

The harness factory (harness.New(name)) returns the appropriate implementation. Each harness handles:

  • Auth discovery: Locating credentials on the host.
  • Environment injection: Mapping credentials to container environment variables.
  • Command construction: Building the correct CLI invocation (e.g., claude --no-chrome --dangerously-skip-permissions <task>).
  • Provisioning hooks: Harness-specific setup during agent creation (e.g., writing config files).
  • Template seeding: Populating default template directories from embedded files (pkg/config/embeds/).
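The factory shape can be sketched as follows. The interface is trimmed to two methods, the --resume flag name and the generic adapter's argument layout are assumptions, and the claude flags are taken from the example above:

```go
package main

import "fmt"

// Harness is trimmed to the two methods needed for this sketch; the real
// api.Harness interface is larger (auth discovery, volumes, provisioning).
type Harness interface {
	Name() string
	GetCommand(task string, resume bool, baseArgs []string) []string
}

type claudeHarness struct{}

func (claudeHarness) Name() string { return "claude" }
func (claudeHarness) GetCommand(task string, resume bool, baseArgs []string) []string {
	args := append([]string{"claude"}, baseArgs...)
	args = append(args, "--no-chrome", "--dangerously-skip-permissions")
	if resume {
		args = append(args, "--resume") // assumed flag name
	}
	return append(args, task)
}

type genericHarness struct{ bin string }

func (g genericHarness) Name() string { return "generic" }
func (g genericHarness) GetCommand(task string, _ bool, baseArgs []string) []string {
	return append(append([]string{g.bin}, baseArgs...), task)
}

// New mirrors the factory shape of harness.New: known names map to
// dedicated adapters, anything else falls through to the generic one.
func New(name string) Harness {
	switch name {
	case "claude":
		return claudeHarness{}
	default:
		return genericHarness{bin: name}
	}
}

func main() {
	h := New("claude")
	fmt.Println(h.Name(), h.GetCommand("fix the build", false, nil))
}
```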

The Runtime interface abstracts container lifecycle operations:

type Runtime interface {
    Name() string
    Run(ctx context.Context, config RunConfig) (string, error)
    Stop(ctx context.Context, id string) error
    Delete(ctx context.Context, id string) error
    List(ctx context.Context, labelFilter map[string]string) ([]api.AgentInfo, error)
    GetLogs(ctx context.Context, id string) (string, error)
    Attach(ctx context.Context, id string) error
    ImageExists(ctx context.Context, image string) (bool, error)
    PullImage(ctx context.Context, image string) error
    Sync(ctx context.Context, id string, direction SyncDirection) error
    Exec(ctx context.Context, id string, cmd []string) (string, error)
    GetWorkspacePath(ctx context.Context, id string) (string, error)
}

Implementations:

Runtime                Platform                 Selection
AppleContainerRuntime  macOS                    Auto-detected when the Apple container CLI is present
DockerRuntime          Linux / macOS / Windows  Default fallback; supports remote Docker hosts via Host config
PodmanRuntime          Linux / macOS            Daemonless/rootless alternative; supports remote/machine execution
KubernetesRuntime      Any (via kubeconfig)     Runs agents as Kubernetes Pods; supports namespace isolation, resource specs, and workspace sync via tar snapshots

Runtime selection is handled by the GetRuntime factory function, which resolves the runtime based on:

  1. The active profile’s runtime field in settings.yaml.
  2. OS-level auto-detection (macOS with container CLI → Apple; Linux → Podman if available, else Docker).
  3. Explicit override via CLI flags.
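The auto-detection step (2) in isolation might look like the following; the function and its inputs are illustrative, and the profile/CLI-flag layers of the real GetRuntime resolution are omitted:

```go
package main

import "fmt"

// pickRuntime sketches OS-level auto-detection: macOS prefers the Apple
// container CLI when present, Linux prefers Podman when available, and
// Docker is the default fallback everywhere.
func pickRuntime(goos string, hasAppleCLI, hasPodman bool) string {
	switch goos {
	case "darwin":
		if hasAppleCLI {
			return "apple"
		}
	case "linux":
		if hasPodman {
			return "podman"
		}
	}
	return "docker"
}

func main() {
	fmt.Println(pickRuntime("darwin", true, false))
	fmt.Println(pickRuntime("linux", false, true))
	fmt.Println(pickRuntime("windows", false, false))
}
```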

The Go codebase is organized into the following packages:

pkg/
├── api/               # Shared types: AgentInfo, ScionConfig, Harness, AuthConfig, etc.
├── agent/             # Agent lifecycle: Manager interface, provisioning, run, delete
├── agentcache/        # In-memory agent state caching
├── config/            # Settings, template resolution, path management, embeds
│   └── embeds/        # Embedded template files (go:embed)
├── harness/           # LLM-specific adapters (claude, gemini, opencode, codex, generic)
├── runtime/           # Container runtime abstraction (Docker, Apple, K8s)
├── hub/               # Hub API server: handlers, auth, control channel, metrics
├── hubclient/         # Go client library for the Hub REST API
├── hubsync/           # CLI-to-Hub synchronization logic
├── runtimebroker/     # Runtime Broker API server: handlers, heartbeat, auth
├── brokercredentials/ # Broker HMAC credential management
├── store/             # Persistence interface and models
│   └── sqlite/        # SQLite implementation of the Store interface
├── storage/           # Cloud object storage abstraction (GCS)
├── templatecache/     # Template download and caching for Runtime Brokers
├── transfer/          # Workspace transfer utilities (upload/download via GCS)
├── wsprotocol/        # WebSocket message types for Hub ↔ Broker communication
├── wsclient/          # WebSocket client utilities
├── k8s/               # Kubernetes client wrapper
├── credentials/       # Host credential discovery
├── daemon/            # Background daemon support
├── gcp/               # GCP-specific utilities
├── sciontool/         # Internal CLI status tool (used by agents)
├── util/              # Shared utilities (git, env expansion, file ops)
└── version/           # Build version info

The dependency graph flows strictly downward:

(diagram omitted)

Hub and Runtime Broker servers have their own entry points but reuse the same agent, runtime, and config packages.


scion start <name> --task "..." [--template claude] [--profile docker-local]
  1. Grove resolution: config.GetResolvedProjectDir() locates the .scion directory.
  2. Settings loading: config.LoadSettings() reads settings.yaml from the grove, merging with environment variable overrides.
  3. Provisioning (agent.ProvisionAgent):
     a. Creates agents/<name>/home/ and agents/<name>/workspace/ directories.
     b. Resolves the template chain and copies home directory contents.
     c. Merges configuration: template base → template → settings (harness/profile) → agent overrides.
     d. Creates a git worktree at agents/<name>/workspace/ on a new branch (slugified agent name).
     e. Runs harness-specific provisioning (harness.Provision()).
     f. Writes scion-agent.json and agent-info.json.
  4. Image resolution: Resolves the container image from settings/template/CLI override. Pulls if not present.
  5. Container launch (runtime.Run):
     a. Builds container run arguments (volumes, env vars, labels, resource limits).
     b. Mounts the agent home at /home/<user> and the workspace at /workspace.
     c. If tmux is enabled, wraps the harness command in a tmux session named scion.
     d. Launches the container in detached mode.
  6. Status update: Writes agent-info.json with status running.
scion start <name> --task "..." --hub
  1. Hub sync: The CLI registers/syncs the grove with the Hub if not already registered.
  2. API call: The CLI sends a POST /api/v1/groves/{groveId}/agents request to the Hub.
  3. Broker selection: The Hub selects a Runtime Broker (explicit or grove default).
  4. Environment resolution: The Hub merges environment variables and secrets from all applicable scopes (user → grove → broker).
  5. Template hydration: The Hub resolves the template, attaches its content hash for broker-side caching.
  6. Dispatch: The Hub dispatches the creation request to the selected Runtime Broker via:
    • Direct HTTP if the broker has a reachable endpoint.
    • Control Channel (WebSocket tunnel) if the broker is behind a NAT/firewall.
  7. Broker execution: The Runtime Broker provisions and starts the agent using the same agent.Manager and runtime.Runtime code path as Solo mode.
  8. Status reporting: The broker reports status back to the Hub via heartbeats and agent status updates.

The Hub (pkg/hub) is a stateful API server that provides centralized management for the distributed architecture.

Component              Responsibility
Server                 HTTP server, route registration, middleware stack
Store                  Persistence interface (currently backed by SQLite)
ControlChannelManager  Manages WebSocket connections from Runtime Brokers
HTTPDispatcher         Forwards agent lifecycle requests to brokers via HTTP
Metrics                Runtime metrics collection (agent counts, broker health)
AuthMiddleware         JWT-based user auth, dev auth, broker HMAC auth

The Hub exposes a RESTful API under /api/v1/:

Resource            Endpoints
Agents              GET/POST /agents, GET/PUT/DELETE /agents/{id}, POST /agents/{id}/{action}
Groves              GET/POST /groves, POST /groves/register, GET/PUT/DELETE /groves/{id}, nested agent/env/secret routes
Runtime Brokers     GET/POST /runtime-brokers, GET/PUT/DELETE /runtime-brokers/{id}, heartbeat, control channel
Templates           GET/POST /templates, GET/PUT/DELETE /templates/{id}
Users               GET/POST /users, GET/PUT/DELETE /users/{id}
Auth                Login, token, refresh, validate, logout, CLI OAuth, API keys
Env Vars / Secrets  CRUD for scoped environment variables and encrypted secrets
Groups / Policies   RBAC: groups with nested membership, policies with conditional bindings

The Hub supports multiple authentication methods:

Method       Use Case
OAuth        Production user authentication via external identity providers
Dev Auth     Development shortcut using a static token
JWT (User)   Issued after login; used for API calls
JWT (Agent)  Scoped tokens issued to agents for Hub API access from within containers
API Keys     Programmatic access with sk_live_ prefixed keys
HMAC         Runtime Broker authentication using shared secrets

The Store interface (pkg/store/store.go) defines a comprehensive persistence contract composed of sub-interfaces:

  • AgentStore — CRUD + status updates with optimistic locking (StateVersion)
  • GroveStore — CRUD + lookup by slug, git remote
  • RuntimeBrokerStore — CRUD + heartbeat updates
  • TemplateStore — CRUD with scope and harness filtering
  • UserStore — CRUD with role and status filtering
  • GroveProviderStore — Grove-to-broker relationship management
  • EnvVarStore / SecretStore — Scoped key-value storage (encrypted for secrets)
  • GroupStore / PolicyStore — RBAC with nested group support and policy bindings
  • APIKeyStore / BrokerSecretStore — Authentication credential management

The current implementation uses SQLite (pkg/store/sqlite/). The interface is designed to support alternative backends (PostgreSQL, Firestore, etc.).


The Runtime Broker (pkg/runtimebroker) is a compute node that executes agents on behalf of the Hub.

  • Exposes a REST API for agent lifecycle operations (create, start, stop, delete, list, message, exec).
  • Manages a local agent.Manager backed by a runtime.Runtime.
  • Reports health via periodic heartbeats to the Hub.
  • Maintains a WebSocket control channel for NAT traversal (the Hub tunnels HTTP requests through the WebSocket when direct connectivity is unavailable).
  • Caches templates locally via templatecache to avoid repeated downloads.
  • Authenticates Hub requests using HMAC shared secrets.
  • Supports dynamic credential reload (watches for credential file changes).
(diagram omitted)

The control channel uses a custom WebSocket protocol (pkg/wsprotocol) with the following message types:

Type               Direction      Purpose
connect            Broker → Hub   Initiate connection with broker metadata
connected          Hub → Broker   Confirm connection
request            Hub → Broker   Tunnel an HTTP request through the WebSocket
response           Broker → Hub   Return the HTTP response
stream_open/close  Bidirectional  Open/close streams (PTY, logs, events)
event              Broker → Hub   Async events (heartbeat, agent status)
ping/pong          Bidirectional  Keepalive
Broker registration follows a join-token flow:

  1. Admin creates a broker record in the Hub: POST /api/v1/runtime-brokers.
  2. Hub generates a short-lived join token: POST /api/v1/brokers/join.
  3. Broker uses the join token to obtain HMAC credentials: POST /api/v1/brokers/join (with token).
  4. Broker stores credentials locally (~/.scion/broker-credentials.json).
  5. Broker authenticates subsequent requests using HMAC-SHA256 signatures.
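Step 5's request signing might look like this. The canonical string (method, path, timestamp, body) is an assumption; the broker protocol defines its own canonicalization:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes an HMAC-SHA256 signature over a canonical string built
// from the request method, path, timestamp, and body.
func sign(secret []byte, method, path, timestamp string, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s\n%s\n%s\n", method, path, timestamp)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares in constant time.
func verify(secret []byte, method, path, timestamp string, body []byte, sig string) bool {
	want, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s\n%s\n%s\n", method, path, timestamp)
	mac.Write(body)
	return hmac.Equal(want, mac.Sum(nil))
}

func main() {
	secret := []byte("broker-shared-secret")
	sig := sign(secret, "POST", "/api/v1/agents", "1700000000", []byte(`{}`))
	fmt.Println(verify(secret, "POST", "/api/v1/agents", "1700000000", []byte(`{}`), sig)) // true
	fmt.Println(verify(secret, "POST", "/api/v1/agents", "1700000001", []byte(`{}`), sig)) // false: timestamp changed
}
```

Including a timestamp in the signed string is a common defense against replaying captured requests.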

Configuration is managed by pkg/config and uses a layered resolution strategy.

Located at .scion/settings.yaml (YAML preferred) or .scion/settings.json (JSONC). Key sections:

active_profile: docker-local
default_template: claude

hub:
  enabled: true
  endpoint: https://hub.example.com
  groveId: "uuid__slug"

runtimes:
  docker:
    host: ""            # Remote Docker host (optional)
    tmux: true
  kubernetes:
    namespace: scion-agents
    sync: tar

harnesses:
  claude:
    image: claude-code-sandbox:latest
    user: scion
  gemini:
    image: gemini-cli-sandbox:latest
    user: gemini

profiles:
  docker-local:
    runtime: docker
    resources:
      requests: { cpu: "500m", memory: "512Mi" }
    harness_overrides:
      claude:
        image: claude-code-sandbox:tmux

Configuration values are resolved in this order (highest priority wins):

  1. CLI flags (e.g., --image, --profile)
  2. Agent-level scion-agent.json (per-agent overrides)
  3. Template chain (merged bottom-up from base to leaf)
  4. Settings file (profile → harness → runtime)
  5. Environment variables (SCION_ prefix for settings overrides; $VAR substitution in values)
  6. Embedded defaults (pkg/config/embeds/default_settings.yaml)
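A reduced sketch of first-match-wins resolution over flat keys (the real merge is structural, combining nested settings rather than picking whole values, and the layer names here are illustrative):

```go
package main

import "fmt"

// resolve returns the first layer that defines key; layers are ordered
// highest priority first, matching the list above.
func resolve(key string, layers ...map[string]string) (string, bool) {
	for _, l := range layers {
		if v, ok := l[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	cliFlags := map[string]string{"image": "claude-code-sandbox:tmux"}
	agentFile := map[string]string{}
	template := map[string]string{"image": "claude-code-sandbox:latest", "user": "scion"}
	embedded := map[string]string{"image": "ubuntu:24.04", "user": "agent"}

	img, _ := resolve("image", cliFlags, agentFile, template, embedded)
	user, _ := resolve("user", cliFlags, agentFile, template, embedded)
	fmt.Println(img, user) // CLI flag wins for image; template wins for user
}
```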

Scion uses git worktrees for workspace isolation:

  1. When an agent starts in a git repository, a worktree is created at agents/<name>/workspace/ on a new branch.
  2. The branch name defaults to the slugified agent name.
  3. If a branch already exists and has a worktree, Scion reuses it (with a warning).
  4. On deletion, the worktree and optionally the branch are cleaned up.
  5. For non-git projects or explicit --workspace paths, the directory is bind-mounted directly.
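Steps 1-2 reduce to a branch-name slug plus a git worktree invocation. The slug rules and argument layout here are approximations of the real behavior:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify approximates the branch naming described above: lower-case,
// runs of non-alphanumerics collapsed to single hyphens.
func slugify(name string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(name), "-"), "-")
}

// worktreeArgs builds the git invocation; the real code also handles
// branch/worktree reuse and cleanup on deletion.
func worktreeArgs(agent, grovePath string) []string {
	return []string{"worktree", "add", "-b", slugify(agent),
		fmt.Sprintf("%s/agents/%s/workspace", grovePath, agent)}
}

func main() {
	fmt.Println(slugify("Fix Build #42"))
	fmt.Println("git", strings.Join(worktreeArgs("Fix Build #42", ".scion"), " "))
}
```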

The web dashboard (web/) provides a visual interface for Hosted mode operations.

Layer       Technology                               Location
Client SPA  Lit + TypeScript + Vite                  web/src/client/
Server      Go (consolidated into the scion binary)  pkg/hub/web.go

The server layer (enabled via --enable-web) provides:

  • Static asset serving and SPA shell rendering.
  • OAuth authentication and session management.
  • SSE real-time event streaming via pkg/hub/events.go.
  • API routing to the Hub.

Each agent runs in its own container with:

  • A dedicated home directory (no shared state between agents).
  • An isolated git worktree (no merge conflicts).
  • Environment variables injected at container creation time (credentials are not written to disk in the grove).
(diagram omitted)

In Hosted mode, credentials can also be:

  • Stored as encrypted secrets in the Hub (scoped to user, grove, or broker).
  • Resolved and injected by the Hub at dispatch time (ResolvedEnv in the create request).

When a grove lives inside a git repository, Scion requires that agents/ is listed in .gitignore to prevent accidental credential or state leakage:

security error: '<path>/agents/' must be in .gitignore when using a project-local grove
(diagram omitted)

The following sequence traces a complete agent creation through the Hosted architecture:

(sequence diagram omitted)

Agents write status to a file inside the container (e.g., /home/<user>/.gemini-status.json for Gemini) which is read by the CLI:

Status             Meaning
STARTING           Container is initializing
THINKING           LLM is processing
EXECUTING          Agent is running a tool/command
WAITING_FOR_INPUT  Human-in-the-loop required (scion attach)
COMPLETED          Task finished
ERROR              Unrecoverable failure
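Reading and acting on that status file can be sketched as follows (the JSON field name is an assumption):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentStatus models the status file an agent writes (e.g.
// /home/<user>/.gemini-status.json); the field name is assumed.
type agentStatus struct {
	Status string `json:"status"`
}

// needsHuman reports whether the CLI should prompt the user to attach.
func needsHuman(raw []byte) (bool, error) {
	var s agentStatus
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return s.Status == "WAITING_FOR_INPUT", nil
}

func main() {
	ok, _ := needsHuman([]byte(`{"status":"WAITING_FOR_INPUT"}`))
	fmt.Println(ok)
	ok, _ = needsHuman([]byte(`{"status":"THINKING"}`))
	fmt.Println(ok)
}
```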

The Hub exposes a /metrics endpoint with runtime statistics:

  • Connected broker count
  • Active agent count
  • Grove count

Both the Hub and Runtime Broker use structured logging via Go’s slog package, with support for trace ID propagation (X-Cloud-Trace-Context).


Solo and Hosted modes share the same agent.Manager, runtime.Runtime, and harness.Harness implementations. The Runtime Broker does not contain a separate agent management stack; it wraps the same AgentManager that the CLI uses locally.

The control channel enables the Hub to dispatch operations to brokers behind NATs or firewalls without requiring inbound connectivity. The broker initiates the WebSocket connection and the Hub tunnels HTTP requests through it.

Templates use a chain-based merge strategy rather than flat overrides. This allows organizations to define a base template (e.g., common .bashrc, shared tooling) and layer harness-specific or project-specific customizations on top.

Agent records in the Hub use a StateVersion field for optimistic concurrency control. Updates that don’t match the expected version are rejected with ErrVersionConflict, preventing lost updates from concurrent broker status reports.

In Solo mode, the filesystem (agents/<name>/scion-agent.json, agent-info.json) is the only source of truth. There is no local database. This keeps Solo mode truly zero-config.