Skip to content

Observability

Scion provides comprehensive observability for agent containers and system components through the sciontool telemetry pipeline and OpenTelemetry log bridging. This guide covers how to monitor agent activity, collect logs, and integrate with cloud-native observability platforms like Google Cloud Logging and Trace.

Scion’s observability architecture follows a “forwarder” pattern where sciontool acts as a local collector inside each agent container, and system components (Hub and Broker) bridge their logs directly to a central backend.

┌─────────────────────────────────────────┐
│ Agent Container │
│ │
│ ┌─────────────┐ │
│ │ Agent │ OTLP (localhost:4317) │
│ │ (Claude/ │───────┐ │
│ │ Gemini) │ │ │
│ └─────────────┘ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ sciontool │ │
│ │ forwarder │ │
│ └────────┬────────┘ │
│ │ │
│ │ OTLP (Cloud) │
└───────────────────────┼────────────────┘
┌─────────────────┐
│ Cloud Backend │
│ (Logging/Trace) │
└─────────────────┘
│ OTLP (Cloud)
┌─────────┴─────────┐
│ System Logs │
│ (Hub & Broker) │
└───────────────────┘

To centralize logs and traces from all Scion components in Google Cloud, you must configure the OTLP endpoints and project identifiers.

The Scion Hub and Runtime Broker use structured logging (slog) with an OpenTelemetry bridge. To enable log forwarding to Google Cloud:

  1. Configure Environment Variables: Set the following on your Hub and Broker server processes:

    Terminal window
    # Enable OTel log forwarding
    export SCION_OTEL_LOG_ENABLED=true
    # Set the GCP OTLP endpoint (standard for Cloud Trace/Logging)
    export SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
    # Specify your GCP Project ID
    export SCION_GCP_PROJECT_ID="your-project-id"
  2. Authentication: Ensure the service account running the Hub/Broker has the following IAM roles:

    • roles/logging.logWriter
    • roles/cloudtrace.agent
    • roles/monitoring.metricWriter

As an alternative to the OTel pipeline, the Hub and Broker can send logs directly to Google Cloud Logging using the cloud.google.com/go/logging client library. This is simpler to set up when you only need log forwarding without traces or metrics:

Terminal window
# Enable direct Cloud Logging
export SCION_CLOUD_LOGGING=true
export SCION_GCP_PROJECT_ID="your-project-id"
# Optional: customize the log name (default: "scion")
export SCION_CLOUD_LOGGING_LOG_ID="scion-hub"
scion server start --enable-hub

Both approaches can be used simultaneously — OTel for the full telemetry pipeline and Cloud Logging for direct log delivery.

Agents use sciontool as their init process, which includes an embedded OTLP forwarder. This forwarder must be configured to point to your cloud backend.

The preferred approach is to configure telemetry in settings.yaml. Settings at the global level apply to all agents; grove-level settings apply to a specific project. Templates and individual agents can further override these via their scion-agent.yaml.

# In ~/.scion/settings.yaml (global) or .scion/settings.yaml (grove)
telemetry:
enabled: true
cloud:
enabled: true
endpoint: "monitoring.googleapis.com:443"
protocol: grpc
filter:
events:
exclude:
- "agent.user.prompt"

See the Orchestrator Settings Reference for the full field reference, and Metrics & OpenTelemetry for how settings merge across scopes.

For hosted deployments, environment variables can be set at the Grove or Broker level on the Hub. These are automatically injected into every agent container.

Terminal window
SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
SCION_GCP_PROJECT_ID="your-project-id"
SCION_TELEMETRY_ENABLED="true"

If you are using agents that natively support OpenTelemetry (like opencode), you may need to explicitly tell the agent where to find the sciontool forwarder (which is localhost from the agent’s perspective):

  • gRPC (Default): OTEL_EXPORTER_OTLP_ENDPOINT="localhost:4317"
  • HTTP: OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"

These harness-specific env vars are injected at agent start time via the harness config’s env map and are separate from the Scion telemetry settings.

Agent logs are written to /home/scion/agent.log inside the container. The sciontool logging system writes to both stderr and this file.

Scion provides a built-in Cloud Log Viewer in the Web UI to stream agent logs in real-time. This is backed by the Hub API, which retrieves logs directly from the active runtime broker or from the persisted agent.log file, ensuring comprehensive visibility into agent execution regardless of its current state.

The sciontool utility ensures that agent.log is owned by the scion user during initialization, even if sciontool is initially run as root. The log file is created with permissive 0666 permissions to ensure multiple processes can contribute to the log stream.

  • INFO: Normal operational events
  • ERROR: Critical failures
  • DEBUG: Detailed information (enabled with SCION_DEBUG=true or SCION_LOG_LEVEL=debug)

The telemetry pipeline in sciontool collects and forwards OpenTelemetry (OTLP) data from agents. See the Metrics & OpenTelemetry guide for deep configuration details.

Data TypeSourceDescription
TracesAgent OTLPSpan data for tool calls, API requests
MetricssciontoolCounters and histograms for tokens, tools, and latency
Correlated LogssciontoolLog records linked to traces for every hook event
Hook EventsHarness hooksTool calls, prompts, model invocations converted to spans
Session MetricsGemini session filesToken counts, turn counts, tool statistics

By default, user prompts (agent.user.prompt) are excluded from telemetry to protect privacy. Additionally, sensitive attributes are automatically redacted or hashed.

  • Redacted: prompt, user.email, tool_output, tool_input
  • Hashed: session_id

HTTP requests to Hub, Broker, and Web servers are logged as a dedicated structured stream, separate from application logs. Request logs use the google.logging.type.HttpRequest format and include grove/agent IDs, a generated request ID, and trace context from incoming headers.

MethodConfiguration
FileSet SCION_SERVER_REQUEST_LOG_PATH=/path/to/requests.log
Cloud LoggingAutomatic when SCION_CLOUD_LOGGING=true — uses log name scion_request_log
StdoutDefault when running in background mode (suppressed in --foreground mode)

The middleware generates a UUID request_id for every request and captures trace headers (X-Cloud-Trace-Context, traceparent, X-Trace-ID). These IDs are automatically attached to all application logs emitted during the request via logging.Logger(ctx), enabling end-to-end correlation between the request log entry and any downstream application log entries.

Request logs appear under a separate log name from application logs:

-- All HTTP request logs
logName="projects/YOUR_PROJECT/logs/scion_request_log"
-- Slow requests (latency > 1s)
logName="projects/YOUR_PROJECT/logs/scion_request_log"
httpRequest.latency > "1s"
-- Failed requests for a specific grove
logName="projects/YOUR_PROJECT/logs/scion_request_log"
httpRequest.status >= 400
labels.grove_id = "my-grove"
-- Correlate a request with its application logs
logName="projects/YOUR_PROJECT/logs/scion" OR logName="projects/YOUR_PROJECT/logs/scion_request_log"
jsonPayload.request_id = "YOUR_REQUEST_ID"

See the Local Development Logging guide for the full field reference and file output format.

Hub and Broker logs include a subsystem attribute that identifies the internal subsystem that produced each log entry. This is separate from the top-level component field (which reflects the server mode: scion-hub, scion-broker, or scion-server) and provides finer-grained filtering.

SubsystemDescription
hub.agent-lifecycleAgent create, start, stop, delete, and state transitions
hub.authAuthentication and authorization decisions
hub.control-channelWebSocket lifecycle for broker connections
hub.messagesMessage routing from scion message to brokers
hub.notificationsEvent-driven notification dispatch and subscription matching
hub.schedulerBackground recurring and one-shot scheduled tasks
hub.env-secretsEnvironment variable and secret management
hub.templatesTemplate CRUD, hydration, and bootstrap
hub.workspaceGit worktree sync operations
hub.dispatcherHTTP agent dispatch to brokers
broker.agent-lifecycleContainer provisioning, environment resolution, template hydration
broker.control-channelBroker-side WebSocket connection to the hub
broker.messagesMessage injection into agent tmux sessions
broker.heartbeatPeriodic broker status reports to hub
broker.env-secretsBroker-side environment gathering and finalization

In combo server mode (scion-server), both hub.* and broker.* subsystem logs appear in the same stream. The dotted prefix distinguishes them without requiring separate processes.

All examples assume your logs are in the scion log name. Adjust the logName filter to match your configuration.

-- All hub logs (hub-only or combo mode)
logName="projects/YOUR_PROJECT/logs/scion"
labels.component="scion-hub"
-- All logs from combo server mode
logName="projects/YOUR_PROJECT/logs/scion"
labels.component="scion-server"
-- All hub subsystem logs
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "^hub\."
-- All broker subsystem logs
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "^broker\."
-- A specific subsystem
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem = "hub.notifications"
-- All agent lifecycle events across hub and broker
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.agent-lifecycle$"
-- Agent lifecycle for a specific agent
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.agent-lifecycle$"
jsonPayload.agent_id = "my-agent-id"
-- Only errors in agent lifecycle
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.agent-lifecycle$"
severity >= ERROR
-- All message-related logs (hub routing + broker injection)
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.messages$"
-- Messages from a specific sender
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.messages$"
jsonPayload.sender = "agent-slug"
-- Messages to a specific recipient in a grove
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.messages$"
jsonPayload.recipient = "target-agent"
jsonPayload.grove_id = "my-grove-id"
-- All authentication and authorization events
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem = "hub.auth"
-- Auth failures only
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem = "hub.auth"
severity >= WARNING
-- All control channel activity (hub + broker sides)
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.control-channel$"
-- Control channel errors (connectivity issues)
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "\.control-channel$"
severity >= ERROR
-- All hub logs EXCEPT heartbeat noise
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "^hub\."
jsonPayload.subsystem != "broker.heartbeat"
-- Only high-priority subsystems (notifications, auth, agent lifecycle)
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem = "hub.notifications" OR
jsonPayload.subsystem = "hub.auth" OR
jsonPayload.subsystem =~ "\.agent-lifecycle$"
-- Errors across all subsystems in the last hour
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem != ""
severity >= ERROR
timestamp >= "2026-03-03T00:00:00Z"
-- Debug-level broker logs for troubleshooting
logName="projects/YOUR_PROJECT/logs/scion"
jsonPayload.subsystem =~ "^broker\."
severity = DEBUG

Scion includes a comprehensive structured messaging pipeline that provides reliable delivery of messages to and from agents. This pipeline is fully observable:

  • Hub API Integration: Messages can be sent and retrieved via the new Hub API, allowing external systems to programmatically interact with agents.
  • Web UI “Messages” Tab: An interactive interface in the dashboard allows administrators and users to trace message flows in real-time.
  • Multi-Stage Broker Adapter: Ensures robust delivery of messages to the agent containers, including external notifications. You can monitor message flow health in logs using the hub.messages and broker.messages subsystems.

The Hub includes an automated monitoring system to detect “zombie” or stalled agents. This system tracks the heartbeat signals emitted by runtime brokers.

  • Heartbeat Timeout: If an agent stops responding and fails to emit a heartbeat within the configured StalledThreshold, it is automatically transitioned to an offline activity status.
  • Common Causes: Currently, this may be due to an agent being unable to refresh its auth token, which disconnects it from sending its heartbeat and other updates. These agents can be stopped and restarted to be provisioned with a new auth token. They should be able to refresh this token as long as they can maintain a connection to the Hub.
  • Notifications: Stalled events can trigger automated browser push notifications (by default, stalled and error states are included in the default notification triggers), proactively alerting administrators to health issues.
  • Visibility: The Web UI clearly flags offline agents with specialized status badges, ensuring they are not lost among active workloads.
  1. Verify Endpoints: Ensure SCION_OTEL_ENDPOINT is set to monitoring.googleapis.com:443.
  2. Check Permissions: Verify the Workload Identity or Service Account has roles/logging.logWriter.
  3. Inspect Agent Init: View the agent container logs (stderr) to see if sciontool reported a telemetry startup failure:
    [sciontool] ERROR: Failed to start telemetry: connection refused
  4. Network Policy: If running in Kubernetes, ensure Egress is allowed to GCP APIs.

If you see logs but they aren’t linked to traces in the Cloud Trace waterfall:

  1. Ensure the agent is using the sciontool gRPC port (4317).
  2. Verify SCION_OTEL_LOG_ENABLED=true is set on the system components.