Observability
Scion provides comprehensive observability for agent containers and system components through the sciontool telemetry pipeline and OpenTelemetry log bridging. This guide covers how to monitor agent activity, collect logs, and integrate with cloud-native observability platforms like Google Cloud Logging and Trace.
Architecture Overview
Scion’s observability architecture follows a “forwarder” pattern: sciontool acts as a local collector inside each agent container, while system components (Hub and Broker) bridge their logs directly to a central backend.

    ┌─────────────────────────────────────────┐
    │  Agent Container                        │
    │                                         │
    │  ┌─────────────┐                        │
    │  │   Agent     │  OTLP (localhost:4317) │
    │  │  (Claude/   │───────┐                │
    │  │   Gemini)   │       │                │
    │  └─────────────┘       │                │
    │                        ▼                │
    │               ┌─────────────────┐       │
    │               │   sciontool     │       │
    │               │   forwarder     │       │
    │               └────────┬────────┘       │
    │                        │                │
    │                        │ OTLP (Cloud)   │
    └────────────────────────┼────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Cloud Backend  │
                    │ (Logging/Trace) │
                    └─────────────────┘
                             ▲
                             │ OTLP (Cloud)
                   ┌─────────┴─────────┐
                   │   System Logs     │
                   │  (Hub & Broker)   │
                   └───────────────────┘

Administrator Setup: Cloud Logging
To centralize logs and traces from all Scion components in Google Cloud, you must configure the OTLP endpoints and project identifiers.
Connecting Hub and Broker Logs
The Scion Hub and Runtime Broker use structured logging (slog) with an OpenTelemetry bridge. To enable log forwarding to Google Cloud:
- Configure Environment Variables: Set the following on your Hub and Broker server processes:

      # Enable OTel log forwarding
      export SCION_OTEL_LOG_ENABLED=true
      # Set the GCP OTLP endpoint (standard for Cloud Trace/Logging)
      export SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
      # Specify your GCP Project ID
      export SCION_GCP_PROJECT_ID="your-project-id"

- Authentication: Ensure the service account running the Hub/Broker has the following IAM roles:
  - roles/logging.logWriter
  - roles/cloudtrace.agent
  - roles/monitoring.metricWriter
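Before starting the Hub or Broker, it can help to confirm that the variables listed above are actually present in the process environment. The following is a minimal sketch of such a preflight check. The variable names come from this guide; the `check_otel_env` helper and its host:port heuristic are hypothetical and not part of Scion.

```python
import os

# Variable names are taken from the setup steps in this guide.
REQUIRED = ("SCION_OTEL_LOG_ENABLED", "SCION_OTEL_ENDPOINT", "SCION_GCP_PROJECT_ID")


def check_otel_env(env):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    endpoint = env.get("SCION_OTEL_ENDPOINT", "")
    # Assumption: the endpoint is written as host:port, as in the example above.
    if endpoint and ":" not in endpoint:
        problems.append("SCION_OTEL_ENDPOINT should look like host:port")
    return problems


if __name__ == "__main__":
    for problem in check_otel_env(os.environ):
        print("config problem:", problem)
```

This only checks that values exist and are plausibly shaped; it does not verify IAM roles or that the endpoint is reachable.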
Direct Cloud Logging (Alternative)
As an alternative to the OTel pipeline, the Hub and Broker can send logs directly to Google Cloud Logging using the cloud.google.com/go/logging client library. This is simpler to set up when you only need log forwarding without traces or metrics:

    # Enable direct Cloud Logging
    export SCION_CLOUD_LOGGING=true
    export SCION_GCP_PROJECT_ID="your-project-id"

    # Optional: customize the log name (default: "scion")
    export SCION_CLOUD_LOGGING_LOG_ID="scion-hub"

    scion server start --enable-hub

Both approaches can be used simultaneously: OTel for the full telemetry pipeline and Cloud Logging for direct log delivery.
Configuring Agent Telemetry
Agents use sciontool as their init process, which includes an embedded OTLP forwarder. This forwarder must be configured to point to your cloud backend.
Via Settings File (Recommended)
The preferred approach is to configure telemetry in settings.yaml. Settings at the global level apply to all agents; grove-level settings apply to a specific project. Templates and individual agents can further override these via their scion-agent.yaml.

    # In ~/.scion/settings.yaml (global) or .scion/settings.yaml (grove)
    telemetry:
      enabled: true
      cloud:
        enabled: true
        endpoint: "monitoring.googleapis.com:443"
        protocol: grpc
      filter:
        events:
          exclude:
            - "agent.user.prompt"

See the Orchestrator Settings Reference for the full field reference, and Metrics & OpenTelemetry for how settings merge across scopes.
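To build intuition for how a narrower scope can override a broader one, here is a small sketch of a recursive "override wins" merge. It is an illustration only: the `merge_settings` function is hypothetical, and Scion's actual merge rules (for example, how lists such as `exclude` are combined) are the ones described in the Metrics & OpenTelemetry guide and may differ.

```python
import copy


def merge_settings(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Example values mirror the settings.yaml snippet in this guide.
global_settings = {
    "telemetry": {
        "enabled": True,
        "cloud": {
            "enabled": True,
            "endpoint": "monitoring.googleapis.com:443",
            "protocol": "grpc",
        },
    }
}
# A grove that turns off cloud export but inherits everything else.
grove_settings = {"telemetry": {"cloud": {"enabled": False}}}

effective = merge_settings(global_settings, grove_settings)
print(effective["telemetry"]["cloud"])
```

In this sketch the grove inherits the endpoint and protocol from the global scope and changes only `cloud.enabled`.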
Via Hub Environment Variables
For hosted deployments, environment variables can be set at the Grove or Broker level on the Hub. These are automatically injected into every agent container.

    SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
    SCION_GCP_PROJECT_ID="your-project-id"
    SCION_TELEMETRY_ENABLED="true"

Harness-Specific Configuration
If you are using agents that natively support OpenTelemetry (like opencode), you may need to explicitly tell the agent where to find the sciontool forwarder (which is localhost from the agent’s perspective):
- gRPC (Default): OTEL_EXPORTER_OTLP_ENDPOINT="localhost:4317"
- HTTP: OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
These harness-specific env vars are injected at agent start time via the harness config’s env map and are separate from the Scion telemetry settings.
Agent Logs
Agent logs are written to /home/scion/agent.log inside the container. The sciontool logging system writes to both stderr and this file.
Cloud Log Viewer & Hub API
Scion provides a built-in Cloud Log Viewer in the Web UI to stream agent logs in real time. It is backed by the Hub API, which retrieves logs directly from the active runtime broker or from the persisted agent.log file, so logs remain visible regardless of the agent’s current state.
Log Ownership and Permissions
The sciontool utility ensures that agent.log is owned by the scion user during initialization, even if sciontool is initially run as root. The log file is created with permissive 0666 permissions to ensure multiple processes can contribute to the log stream.
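One detail worth knowing when reproducing this behavior: the mode passed when a file is created is filtered by the process umask, so a file requested as 0666 often ends up as 0644 unless the mode is set again explicitly. The sketch below illustrates that with a hypothetical `create_shared_log` helper. It is not sciontool's implementation, and it omits the ownership change, which requires root.

```python
import os
import stat
import tempfile


def create_shared_log(path):
    """Create (or open) an append-only log file and force its mode to 0666."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
    try:
        # The mode given to os.open is masked by the umask, so set it explicitly.
        os.fchmod(fd, 0o666)
    finally:
        os.close(fd)
    # Return only the permission bits of the resulting file.
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mode = create_shared_log(os.path.join(tmp, "agent.log"))
        print(oct(mode))
```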
Log Levels
- INFO: Normal operational events
- ERROR: Critical failures
- DEBUG: Detailed information (enabled with SCION_DEBUG=true or SCION_LOG_LEVEL=debug)
Telemetry Collection
The telemetry pipeline in sciontool collects and forwards OpenTelemetry (OTLP) data from agents. See the Metrics & OpenTelemetry guide for deep configuration details.
What’s Collected
| Data Type | Source | Description |
|---|---|---|
| Traces | Agent OTLP | Span data for tool calls, API requests |
| Metrics | sciontool | Counters and histograms for tokens, tools, and latency |
| Correlated Logs | sciontool | Log records linked to traces for every hook event |
| Hook Events | Harness hooks | Tool calls, prompts, model invocations converted to spans |
| Session Metrics | Gemini session files | Token counts, turn counts, tool statistics |
Privacy Controls
By default, user prompts (agent.user.prompt) are excluded from telemetry to protect privacy. Additionally, sensitive attributes are automatically redacted or hashed.
- Redacted: prompt, user.email, tool_output, tool_input
- Hashed: session_id
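The redact-or-hash rule above can be pictured as a simple pass over a span's or log record's attributes. The following sketch uses the attribute names from this guide. The `scrub_attributes` helper, the `[REDACTED]` placeholder, and the choice of SHA-256 are assumptions made for illustration; sciontool's actual placeholder and hash function are not specified here.

```python
import hashlib

# Attribute names as listed in this guide.
REDACTED_KEYS = {"prompt", "user.email", "tool_output", "tool_input"}
HASHED_KEYS = {"session_id"}


def scrub_attributes(attrs):
    """Return a copy of attrs with sensitive values redacted or hashed."""
    scrubbed = {}
    for key, value in attrs.items():
        if key in REDACTED_KEYS:
            scrubbed[key] = "[REDACTED]"  # hypothetical placeholder text
        elif key in HASHED_KEYS:
            # Hashing keeps values comparable across records without exposing them.
            scrubbed[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            scrubbed[key] = value
    return scrubbed


print(scrub_attributes({"prompt": "hello", "session_id": "abc", "tool_name": "grep"}))
```

Hashing rather than dropping `session_id` means records from the same session can still be grouped together.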
HTTP Request Logs
HTTP requests to Hub, Broker, and Web servers are logged as a dedicated structured stream, separate from application logs. Request logs use the google.logging.type.HttpRequest format and include grove/agent IDs, a generated request ID, and trace context from incoming headers.
Enabling Request Log Output
| Method | Configuration |
|---|---|
| File | Set SCION_SERVER_REQUEST_LOG_PATH=/path/to/requests.log |
| Cloud Logging | Automatic when SCION_CLOUD_LOGGING=true — uses log name scion_request_log |
| Stdout | Default when running in background mode (suppressed in --foreground mode) |
Trace Context Propagation
The middleware generates a UUID request_id for every request and captures trace headers (X-Cloud-Trace-Context, traceparent, X-Trace-ID). These IDs are automatically attached to all application logs emitted during the request via logging.Logger(ctx), enabling end-to-end correlation between the request log entry and any downstream application log entries.
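To make the header handling concrete, here is a rough sketch of extracting a trace ID from the three headers named above and pairing it with a freshly generated request ID. The header formats are the public ones (W3C `traceparent` is `version-traceid-parentid-flags`; `X-Cloud-Trace-Context` is `TRACE_ID/SPAN_ID;o=OPTIONS`). The `extract_request_context` function and the order in which headers are preferred are assumptions, not Scion's middleware.

```python
import uuid


def extract_request_context(headers):
    """Return a new request_id plus any trace ID found in the incoming headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_id = None

    # W3C Trace Context: version-traceid-parentid-flags
    parts = lowered.get("traceparent", "").split("-")
    if len(parts) == 4:
        trace_id = parts[1]

    # Google Cloud: TRACE_ID/SPAN_ID;o=OPTIONS
    if trace_id is None and "x-cloud-trace-context" in lowered:
        trace_id = lowered["x-cloud-trace-context"].split("/", 1)[0] or None

    # Plain trace ID header, used as-is.
    if trace_id is None:
        trace_id = lowered.get("x-trace-id")

    return {"request_id": str(uuid.uuid4()), "trace_id": trace_id}


ctx = extract_request_context(
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
)
print(ctx["trace_id"])
```

Every log line emitted while handling the request would then carry both values, which is what makes the `jsonPayload.request_id` correlation query possible.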
Cloud Logging Queries
Request logs appear under a separate log name from application logs:

    -- All HTTP request logs
    logName="projects/YOUR_PROJECT/logs/scion_request_log"

    -- Slow requests (latency > 1s)
    logName="projects/YOUR_PROJECT/logs/scion_request_log"
    httpRequest.latency > "1s"

    -- Failed requests for a specific grove
    logName="projects/YOUR_PROJECT/logs/scion_request_log"
    httpRequest.status >= 400
    labels.grove_id = "my-grove"

    -- Correlate a request with its application logs
    logName="projects/YOUR_PROJECT/logs/scion" OR logName="projects/YOUR_PROJECT/logs/scion_request_log"
    jsonPayload.request_id = "YOUR_REQUEST_ID"

See the Local Development Logging guide for the full field reference and file output format.
Querying Logs by Subsystem
Hub and Broker logs include a subsystem attribute that identifies the internal subsystem that produced each log entry. This is separate from the top-level component field (which reflects the server mode: scion-hub, scion-broker, or scion-server) and provides finer-grained filtering.
Available Subsystems
| Subsystem | Description |
|---|---|
| hub.agent-lifecycle | Agent create, start, stop, delete, and state transitions |
| hub.auth | Authentication and authorization decisions |
| hub.control-channel | WebSocket lifecycle for broker connections |
| hub.messages | Message routing from scion message to brokers |
| hub.notifications | Event-driven notification dispatch and subscription matching |
| hub.scheduler | Background recurring and one-shot scheduled tasks |
| hub.env-secrets | Environment variable and secret management |
| hub.templates | Template CRUD, hydration, and bootstrap |
| hub.workspace | Git worktree sync operations |
| hub.dispatcher | HTTP agent dispatch to brokers |
| broker.agent-lifecycle | Container provisioning, environment resolution, template hydration |
| broker.control-channel | Broker-side WebSocket connection to the hub |
| broker.messages | Message injection into agent tmux sessions |
| broker.heartbeat | Periodic broker status reports to hub |
| broker.env-secrets | Broker-side environment gathering and finalization |
In combo server mode (scion-server), both hub.* and broker.* subsystem logs appear in the same stream. The dotted prefix distinguishes them without requiring separate processes.
Cloud Logging Query Examples
All examples assume your logs are in the scion log name. Adjust the logName filter to match your configuration.
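If you run these filters from scripts rather than the console, it can be convenient to assemble them programmatically, in particular to compute a relative time window such as "the last hour" instead of hard-coding a timestamp. The sketch below only builds the filter text. `build_filter` and its parameters are hypothetical names, and the field paths (`jsonPayload.subsystem`, `severity`, `timestamp`) follow the examples in this guide.

```python
from datetime import datetime, timedelta, timezone


def build_filter(project, log_id="scion", subsystem_regex=None, min_severity=None, since=None):
    """Assemble a Cloud Logging filter; separate lines are implicitly ANDed."""
    clauses = [f'logName="projects/{project}/logs/{log_id}"']
    if subsystem_regex:
        clauses.append(f'jsonPayload.subsystem =~ "{subsystem_regex}"')
    if min_severity:
        clauses.append(f"severity >= {min_severity}")
    if since:
        # Timestamps are written in UTC, RFC 3339 style.
        stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        clauses.append(f'timestamp >= "{stamp}"')
    return "\n".join(clauses)


one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)
print(build_filter("YOUR_PROJECT", subsystem_regex=r"^broker\.", min_severity="ERROR", since=one_hour_ago))
```

The resulting string can be pasted into the Logs Explorer or passed to whichever client you use to read log entries.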
Filter by Server Component
    -- All hub logs (hub-only or combo mode)
    logName="projects/YOUR_PROJECT/logs/scion"
    labels.component="scion-hub"

    -- All logs from combo server mode
    logName="projects/YOUR_PROJECT/logs/scion"
    labels.component="scion-server"

Filter by Subsystem

    -- All hub subsystem logs
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "^hub\."

    -- All broker subsystem logs
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "^broker\."

    -- A specific subsystem
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem = "hub.notifications"

Agent Lifecycle Debugging

    -- All agent lifecycle events across hub and broker
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.agent-lifecycle$"

    -- Agent lifecycle for a specific agent
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.agent-lifecycle$"
    jsonPayload.agent_id = "my-agent-id"

    -- Only errors in agent lifecycle
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.agent-lifecycle$"
    severity >= ERROR

Message Tracing

    -- All message-related logs (hub routing + broker injection)
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.messages$"

    -- Messages from a specific sender
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.messages$"
    jsonPayload.sender = "agent-slug"

    -- Messages to a specific recipient in a grove
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.messages$"
    jsonPayload.recipient = "target-agent"
    jsonPayload.grove_id = "my-grove-id"

Auth and Security Auditing

    -- All authentication and authorization events
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem = "hub.auth"

    -- Auth failures only
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem = "hub.auth"
    severity >= WARNING

Control Channel Monitoring

    -- All control channel activity (hub + broker sides)
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.control-channel$"

    -- Control channel errors (connectivity issues)
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "\.control-channel$"
    severity >= ERROR

Operational Noise Reduction
    -- All logs EXCEPT broker heartbeat noise
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem != "broker.heartbeat"

    -- Only high-priority subsystems (notifications, auth, agent lifecycle)
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem = "hub.notifications" OR
    jsonPayload.subsystem = "hub.auth" OR
    jsonPayload.subsystem =~ "\.agent-lifecycle$"

Combining with Time and Severity

    -- Errors across all subsystems since a given time
    -- (set the timestamp to one hour ago to cover the last hour)
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem != ""
    severity >= ERROR
    timestamp >= "2026-03-03T00:00:00Z"

    -- Debug-level broker logs for troubleshooting
    logName="projects/YOUR_PROJECT/logs/scion"
    jsonPayload.subsystem =~ "^broker\."
    severity = DEBUG

Structured Messaging Pipeline
Scion includes a structured messaging pipeline that provides reliable delivery of messages to and from agents. This pipeline is fully observable:
- Hub API Integration: Messages can be sent and retrieved via the Hub API, allowing external systems to programmatically interact with agents.
- Web UI “Messages” Tab: An interactive interface in the dashboard allows administrators and users to trace message flows in real time.
- Multi-Stage Broker Adapter: Ensures robust delivery of messages to the agent containers, including external notifications. You can monitor message flow health in logs using the hub.messages and broker.messages subsystems.
Stalled Agent Detection
The Hub includes an automated monitoring system to detect “zombie” or stalled agents. This system tracks the heartbeat signals emitted by runtime brokers.
- Heartbeat Timeout: If an agent stops responding and fails to emit a heartbeat within the configured StalledThreshold, it is automatically transitioned to an offline activity status.
- Common Causes: Currently, this may happen when an agent cannot refresh its auth token, which prevents it from sending heartbeats and other updates. Such agents can be stopped and restarted so that they are provisioned with a new auth token. Agents should be able to refresh the token as long as they maintain a connection to the Hub.
- Notifications: Stalled events can trigger automated browser push notifications (the stalled and error states are included in the default notification triggers), proactively alerting administrators to health issues.
- Visibility: The Web UI clearly flags offline agents with specialized status badges, ensuring they are not lost among active workloads.
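The heartbeat timeout described above amounts to comparing each agent's last heartbeat time against a threshold. The following is a minimal sketch of that comparison, not the Hub's implementation: `find_stalled` and its data shapes are hypothetical, and `stalled_threshold` simply stands in for the configured StalledThreshold.

```python
from datetime import datetime, timedelta, timezone


def find_stalled(last_heartbeats, stalled_threshold, now=None):
    """Return the IDs of agents whose last heartbeat is older than the threshold."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        agent_id
        for agent_id, last_seen in last_heartbeats.items()
        if now - last_seen > stalled_threshold
    )


now = datetime(2026, 3, 3, 12, 0, 0, tzinfo=timezone.utc)
heartbeats = {
    "agent-a": now - timedelta(seconds=20),  # recent heartbeat
    "agent-b": now - timedelta(minutes=10),  # silent for too long
}
print(find_stalled(heartbeats, timedelta(minutes=5), now=now))
```

In a real system the agents returned here would be the ones moved to the offline status and, if configured, reported through notifications.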
Troubleshooting for Admins
Logs Not Appearing in GCP
- Verify Endpoints: Ensure SCION_OTEL_ENDPOINT is set to monitoring.googleapis.com:443.
- Check Permissions: Verify the Workload Identity or Service Account has roles/logging.logWriter.
- Inspect Agent Init: View the agent container logs (stderr) to see if sciontool reported a telemetry startup failure: [sciontool] ERROR: Failed to start telemetry: connection refused
- Network Policy: If running in Kubernetes, ensure Egress is allowed to GCP APIs.
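When a network policy or firewall is suspected, a quick TCP-level check from inside the affected environment can separate "cannot reach the endpoint at all" from authentication or configuration problems. The sketch below is a generic reachability probe, not a Scion tool. `split_endpoint` and `can_connect` are hypothetical helpers that expect a host:port value as used for SCION_OTEL_ENDPOINT, and a successful connection says nothing about TLS or IAM.

```python
import socket


def split_endpoint(endpoint, default_port=443):
    """Split 'host:port' into (host, port); fall back to default_port if no port is given."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, default_port
    return host, int(port)


def can_connect(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint can be opened within the timeout."""
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print(can_connect("monitoring.googleapis.com:443"))
```

A result of False from inside a pod, combined with True from your workstation, points toward egress restrictions rather than Scion configuration.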
Missing Trace Correlation
If you see logs but they aren’t linked to traces in the Cloud Trace waterfall:
- Ensure the agent is using the sciontool gRPC port (4317).
- Verify SCION_OTEL_LOG_ENABLED=true is set on the system components.
Related Guides
- Metrics & OpenTelemetry - Detailed telemetry configuration
- Hub Server - Hub integration for hosted mode
- Runtime Broker - Broker setup and configuration