
Metrics & OpenTelemetry

Scion provides built-in telemetry collection via sciontool, which runs as the init process in agent containers. The telemetry pipeline acts as an OTLP Forwarder: it receives data from agents locally and forwards it to a central cloud observability backend.

  1. Agent (The Source): Emits OTLP data (traces/metrics) or harness hook events.
  2. sciontool (The Forwarder):
    • Receives OTLP via gRPC (port 4317) or HTTP (port 4318).
    • Normalizes harness hooks into standard OTLP spans.
    • Applies privacy filters (redaction/hashing).
  3. Cloud Backend (The Destination): Receives the processed telemetry from sciontool.
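The forwarder's flow can be pictured as a chain of stages. Each stage either passes a span along or drops it, and whatever survives reaches an exporter. The following Go sketch is illustrative only. The Span, Stage, and Forwarder types are hypothetical and far simpler than the OTLP data model that sciontool actually handles.

```go
package main

import "fmt"

// Span is a simplified stand-in for an OTLP span (hypothetical type).
type Span struct {
	Name  string
	Attrs map[string]string
}

// Stage transforms a span and reports whether it should continue downstream.
type Stage func(Span) (Span, bool)

// Forwarder chains stages (normalize, filter, ...) and hands surviving
// spans to an exporter (hypothetical type).
type Forwarder struct {
	Stages []Stage
	Export func(Span)
}

// Forward runs one span through every stage; a stage returning false drops it.
func (f Forwarder) Forward(s Span) bool {
	for _, stage := range f.Stages {
		var keep bool
		s, keep = stage(s)
		if !keep {
			return false
		}
	}
	f.Export(s)
	return true
}

func main() {
	var exported []Span
	fwd := Forwarder{
		Stages: []Stage{
			// Privacy filter stage: drop prompt events entirely.
			func(s Span) (Span, bool) { return s, s.Name != "agent.user.prompt" },
		},
		Export: func(s Span) { exported = append(exported, s) },
	}
	fwd.Forward(Span{Name: "agent.session.start"})
	fwd.Forward(Span{Name: "agent.user.prompt"})
	fmt.Println(len(exported)) // 1
}
```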

Telemetry is configured through settings.yaml (for global and grove-level defaults) and scion-agent.yaml (for per-template and per-agent overrides). Environment variables provide the highest-priority override.

Telemetry settings resolve across scopes using last-write-wins semantics. Scopes are applied in the following order, so each later scope overrides the ones before it:

  1. Global settings (~/.scion/settings.yaml) — Organization-wide defaults.
  2. Grove settings (.scion/settings.yaml) — Project-level overrides.
  3. Template config (scion-agent.yaml in template) — Role-specific overrides.
  4. Agent config (scion-agent.yaml in agent home) — Per-agent overrides.
  5. Environment variables (SCION_TELEMETRY_*, SCION_OTEL_*) — Highest priority.

At each scope, only the fields you specify are overridden; unset fields inherit from the previous scope.
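Below is a minimal Go sketch of field-level, last-write-wins resolution. It assumes each scope is decoded into a struct whose pointer fields are nil when unset. The CloudConfig type and the merge and resolve functions are hypothetical, and they cover only three telemetry.cloud fields.

```go
package main

import "fmt"

// CloudConfig mirrors a small subset of telemetry.cloud. A nil pointer
// means "not set at this scope" (hypothetical type).
type CloudConfig struct {
	Enabled  *bool
	Endpoint *string
	Protocol *string
}

// merge overlays src onto dst: only fields set in src replace dst.
func merge(dst, src CloudConfig) CloudConfig {
	if src.Enabled != nil {
		dst.Enabled = src.Enabled
	}
	if src.Endpoint != nil {
		dst.Endpoint = src.Endpoint
	}
	if src.Protocol != nil {
		dst.Protocol = src.Protocol
	}
	return dst
}

// resolve applies scopes in order, lowest priority first.
func resolve(scopes ...CloudConfig) CloudConfig {
	var out CloudConfig
	for _, s := range scopes {
		out = merge(out, s)
	}
	return out
}

func ptr[T any](v T) *T { return &v }

func main() {
	global := CloudConfig{Enabled: ptr(true), Protocol: ptr("grpc")}
	grove := CloudConfig{Endpoint: ptr("monitoring.googleapis.com:443")}
	agent := CloudConfig{Protocol: ptr("http")}
	got := resolve(global, grove, agent)
	fmt.Println(*got.Enabled, *got.Endpoint, *got.Protocol)
	// true monitoring.googleapis.com:443 http
}
```

Pointer fields are what make "unset" distinguishable from an explicit false or empty string. Without them, a grove file that never mentions enabled could not be told apart from one that sets enabled: false.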

The telemetry block can appear in any settings.yaml (global or grove) or scion-agent.yaml (template or agent):

```yaml
# In settings.yaml or scion-agent.yaml
telemetry:
  enabled: true
  cloud:
    enabled: true
    endpoint: "monitoring.googleapis.com:443"
    protocol: grpc
    headers:
      Authorization: "Bearer ${OTEL_API_KEY}"
    tls:
      enabled: true
      insecure_skip_verify: false
    batch:
      max_size: 512
      timeout: "5s"
  hub:
    enabled: true
    report_interval: "30s"
  local:
    enabled: false
    file: ""
    console: false
  filter:
    enabled: true
    respect_debug_mode: true
    events:
      include: []
      exclude:
        - "agent.user.prompt"
    attributes:
      redact:
        - "prompt"
        - "user.email"
        - "tool_output"
      hash:
        - "session_id"
    sampling:
      default: 1.0
      rates: {}
  resource:
    service.name: "scion-agent"
```

See the Orchestrator Settings Reference for the full field reference.

Environment variables override any settings file value and are the most convenient option for CI or hosted deployments.

| Variable | Settings Path | Default | Description |
|---|---|---|---|
| SCION_TELEMETRY_ENABLED | telemetry.enabled | true | Enable/disable collection entirely |
| SCION_TELEMETRY_CLOUD_ENABLED | telemetry.cloud.enabled | true | Enable forwarding to cloud backend |
| SCION_OTEL_ENDPOINT | telemetry.cloud.endpoint | (required) | Cloud OTLP endpoint URL |
| SCION_OTEL_PROTOCOL | telemetry.cloud.protocol | grpc | Protocol: grpc or http |
| SCION_OTEL_INSECURE | telemetry.cloud.tls.insecure_skip_verify | false | Skip TLS verification (dev only) |
| SCION_TELEMETRY_HUB_ENABLED | telemetry.hub.enabled | true | Enable Hub reporting |
| SCION_TELEMETRY_DEBUG | telemetry.local.enabled | false | Enable local debug output |
| SCION_GCP_PROJECT_ID | | (auto) | GCP project ID for Google Cloud backends |
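The environment layer can be sketched as a final overlay on the already-resolved settings. In this Go sketch, the Settings type and applyEnv function are hypothetical and show only the override order. The sketch also assumes boolean values are parsed with strconv.ParseBool, which accepts 1, t, true, TRUE, and similar forms.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Settings holds a few already-resolved telemetry fields (hypothetical type).
type Settings struct {
	Enabled      bool
	CloudEnabled bool
	Endpoint     string
	Protocol     string
	Insecure     bool
}

// applyEnv overrides each field whose environment variable is set.
// Unparseable booleans are ignored here; real code might reject them.
func applyEnv(s Settings, getenv func(string) (string, bool)) Settings {
	boolVar := func(name string, dst *bool) {
		if v, ok := getenv(name); ok {
			if b, err := strconv.ParseBool(v); err == nil {
				*dst = b
			}
		}
	}
	strVar := func(name string, dst *string) {
		if v, ok := getenv(name); ok && v != "" {
			*dst = v
		}
	}
	boolVar("SCION_TELEMETRY_ENABLED", &s.Enabled)
	boolVar("SCION_TELEMETRY_CLOUD_ENABLED", &s.CloudEnabled)
	strVar("SCION_OTEL_ENDPOINT", &s.Endpoint)
	strVar("SCION_OTEL_PROTOCOL", &s.Protocol)
	boolVar("SCION_OTEL_INSECURE", &s.Insecure)
	return s
}

func main() {
	// Defaults from the table; settings files would be merged before this step.
	defaults := Settings{Enabled: true, CloudEnabled: true, Protocol: "grpc"}
	s := applyEnv(defaults, os.LookupEnv)
	fmt.Printf("%+v\n", s)
}
```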

These settings control the ports where sciontool listens for data from the agent processes inside the container.

| Variable | Default | Description |
|---|---|---|
| SCION_OTEL_GRPC_PORT | 4317 | Local gRPC receiver port |
| SCION_OTEL_HTTP_PORT | 4318 | Local HTTP receiver port |
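The following sketch shows one way a receiver could derive its listen addresses from these variables, falling back to the defaults. The listenAddr helper is hypothetical. Binding to localhost is an assumption, based on agents sending to sciontool inside the same container.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// listenAddr returns the receiver address for a port variable, falling
// back to the default when the variable is unset or not a valid port.
func listenAddr(getenv func(string) string, name string, def int) string {
	port := def
	if v := getenv(name); v != "" {
		if p, err := strconv.Atoi(v); err == nil && p > 0 && p < 65536 {
			port = p
		}
	}
	return fmt.Sprintf("localhost:%d", port)
}

func main() {
	fmt.Println(listenAddr(os.Getenv, "SCION_OTEL_GRPC_PORT", 4317))
	fmt.Println(listenAddr(os.Getenv, "SCION_OTEL_HTTP_PORT", 4318))
}
```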

When deploying on Google Cloud, sciontool can forward directly to Cloud Trace and Cloud Logging using the standard OTLP endpoint.

Set these environment variables in your Hub settings (Grove or Broker level):

```shell
# Direct OTLP ingestion for Google Cloud
export SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
export SCION_OTEL_PROTOCOL="grpc"
export SCION_GCP_PROJECT_ID="your-project-id"
```

If your agent harness supports native OpenTelemetry (e.g., opencode), configure it to point to the sciontool forwarder running on localhost:

```shell
# Tell the agent to send to sciontool (OTLP endpoint values are URLs with a scheme)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
```

Note: OpenTelemetry SDKs default to http://localhost:4317 for OTLP/gRPC and http://localhost:4318 for OTLP/HTTP, so explicit configuration may not be required.

Ensure the environment where the agent container runs (GKE Pod, Cloud Run, etc.) has a service account with:

  • roles/logging.logWriter
  • roles/cloudtrace.agent
  • roles/monitoring.metricWriter

Scion includes a native OTel metrics pipeline that captures operational data from agent sessions. This data is recorded as counters and histograms, providing a time-series view of agent performance.

To enable harness-aware telemetry, Scion automatically injects SCION_HARNESS and SCION_MODEL environment variables into all agent containers.

All metrics and traces emitted by Scion are enriched with context-aware OpenTelemetry resource attributes to allow for precise filtering and aggregation in your cloud backend:

  • scion.harness: The type of harness running the agent (e.g., gemini, claude, codex).
  • scion.model: The specific LLM model being used.
  • scion.broker: The ID of the Runtime Broker executing the agent.
  • grove_id: The ID of the agent’s parent grove.
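The enrichment step amounts to building a small attribute map. This Go sketch reads the injected SCION_HARNESS and SCION_MODEL variables. The broker and grove IDs are passed in as arguments because this page does not say how sciontool obtains them. The resourceAttrs function and the sample IDs are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// resourceAttrs builds the Scion resource attributes, omitting empty values.
// brokerID and groveID are parameters because their source is not covered here.
func resourceAttrs(getenv func(string) string, brokerID, groveID string) map[string]string {
	attrs := map[string]string{}
	set := func(k, v string) {
		if v != "" {
			attrs[k] = v
		}
	}
	set("scion.harness", getenv("SCION_HARNESS"))
	set("scion.model", getenv("SCION_MODEL"))
	set("scion.broker", brokerID)
	set("grove_id", groveID)
	return attrs
}

func main() {
	attrs := resourceAttrs(os.Getenv, "broker-1", "grove-42") // hypothetical IDs
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, attrs[k])
	}
}
```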

When harness events occur (via hooks), sciontool automatically records the following metrics:

| Metric | Type | Unit | Description |
|---|---|---|---|
| gen_ai.tokens.input | Counter | tokens | Number of input tokens processed |
| gen_ai.tokens.output | Counter | tokens | Number of output tokens generated |
| gen_ai.tokens.cached | Counter | tokens | Number of tokens retrieved from cache |
| agent.tool.calls | Counter | calls | Total number of tool executions |
| agent.tool.duration | Histogram | ms | Latency of tool executions |
| agent.session.count | Counter | sessions | Total number of agent sessions |
| gen_ai.api.calls | Counter | calls | Total number of LLM API requests |
| gen_ai.api.duration | Histogram | ms | Latency of LLM API requests |

(Note: The Codex harness captures comprehensive telemetry, including tool usage, detailed tool input and output, and separate input, output, and cached token counts.)
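The following Go sketch shows how hook events could feed the counters and histograms listed above. Recorder is a hypothetical in-memory stand-in for the OpenTelemetry metrics SDK. The mapping of hook events to metrics is also an assumption made for illustration.

```go
package main

import "fmt"

// Recorder is a minimal in-memory stand-in for OTel counters and histograms.
type Recorder struct {
	Counters   map[string]int64
	Histograms map[string][]float64
}

func NewRecorder() *Recorder {
	return &Recorder{Counters: map[string]int64{}, Histograms: map[string][]float64{}}
}

func (r *Recorder) Add(name string, n int64) { r.Counters[name] += n }

func (r *Recorder) Record(name string, v float64) {
	r.Histograms[name] = append(r.Histograms[name], v)
}

// Event is a simplified hook event (hypothetical fields).
type Event struct {
	Type                                    string
	DurationMs                              float64
	InputTokens, OutputTokens, CachedTokens int64
}

// OnEvent updates metrics for one hook event (event-to-metric mapping assumed).
func (r *Recorder) OnEvent(ev Event) {
	switch ev.Type {
	case "session-start":
		r.Add("agent.session.count", 1)
	case "tool-end":
		r.Add("agent.tool.calls", 1)
		r.Record("agent.tool.duration", ev.DurationMs)
	case "model-end":
		r.Add("gen_ai.api.calls", 1)
		r.Record("gen_ai.api.duration", ev.DurationMs)
		r.Add("gen_ai.tokens.input", ev.InputTokens)
		r.Add("gen_ai.tokens.output", ev.OutputTokens)
		r.Add("gen_ai.tokens.cached", ev.CachedTokens)
	}
}

func main() {
	r := NewRecorder()
	r.OnEvent(Event{Type: "session-start"})
	r.OnEvent(Event{Type: "tool-end", DurationMs: 120})
	r.OnEvent(Event{Type: "model-end", DurationMs: 850, InputTokens: 1200, OutputTokens: 300, CachedTokens: 800})
	fmt.Println(r.Counters["agent.tool.calls"], r.Counters["gen_ai.tokens.input"]) // 1 1200
}
```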

For every significant lifecycle event (session start/end, tool use, model call), sciontool emits an OTel log record that is automatically correlated with the active trace. This means when viewing a trace waterfall in your observability backend (like Google Cloud Trace), you can click directly through to the specific logs associated with each span.

The Scion Hub maintains internal operational metrics for infrastructure monitoring. These are available via the /api/v1/admin/metrics endpoint (requires hub:admin role) and can be exported to standard monitoring tools.

For GCP Identity emulation, the Hub tracks the health and performance of the token-brokering pipeline:

| Metric | Description |
|---|---|
| accessTokenRequests | Total number of GCP Access Token requests from agents. |
| accessTokenSuccesses | Number of successfully brokered access tokens. |
| accessTokenFailures | Number of failed access token requests (e.g., IAM permission errors). |
| idTokenRequests | Total number of GCP Identity Token requests. |
| rateLimitRejections | Number of token requests rejected due to per-agent rate limiting. |
| iamLatencyP50Ms | Median latency of IAM API calls to Google Cloud. |
| iamLatencyP95Ms | 95th percentile latency of IAM API calls. |
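The iamLatencyP50Ms and iamLatencyP95Ms metrics are percentiles over observed IAM call latencies. This Go sketch uses the nearest-rank method on a list of samples. The Hub may use a different estimator, such as histogram buckets or a sliding window, so treat the percentile function and the sample latencies as illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of the
// samples without modifying the input slice. It returns 0 for no samples.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	latencies := []float64{40, 35, 52, 48, 300, 45, 38, 41, 60, 44} // ms, made up
	fmt.Println(percentile(latencies, 50), percentile(latencies, 95)) // 44 300
}
```

The example shows why both percentiles are tracked. A single slow IAM call barely moves the median, but it dominates the 95th percentile.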

Broker metrics track the security and connectivity of Runtime Brokers:

  • authAttempts: Total broker authentication attempts.
  • connectedBrokers: Current number of active Runtime Brokers connected to the Hub.
  • dispatchFailures: Number of failed agent dispatch commands to brokers.

By default, sciontool excludes agent.user.prompt events to protect user privacy. Filtering is configured via the telemetry.filter block in settings.yaml or scion-agent.yaml, or via environment variables.

```yaml
telemetry:
  filter:
    events:
      exclude:
        - "agent.user.prompt"
        - "agent.tool.result"
    attributes:
      redact:
        - "prompt"
        - "user.email"
        - "tool_output"
      hash:
        - "session_id"
    sampling:
      default: 1.0
      rates:
        "agent.tool.call": 0.5
```
```shell
# Exclude multiple event types
export SCION_TELEMETRY_FILTER_EXCLUDE="agent.user.prompt,agent.tool.result"
# Only forward specific event types
export SCION_TELEMETRY_FILTER_INCLUDE="agent.session.start,agent.session.end,agent.tool.call"
```
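Below is a Go sketch of event filtering combined with per-event sampling. It makes three assumptions: an exclude match always drops an event, an empty include list allows every event, and sampling keeps an event when a uniform random draw falls below its rate. The Filter type and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Filter mirrors telemetry.filter.events and telemetry.filter.sampling.
type Filter struct {
	Include     []string
	Exclude     []string
	DefaultRate float64
	Rates       map[string]float64
	Rand        func() float64 // must return a value in [0, 1)
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// Keep reports whether an event of the given type should be forwarded.
func (f Filter) Keep(event string) bool {
	if contains(f.Exclude, event) {
		return false // exclude wins (assumed precedence)
	}
	if len(f.Include) > 0 && !contains(f.Include, event) {
		return false // non-empty include acts as an allowlist
	}
	rate, ok := f.Rates[event]
	if !ok {
		rate = f.DefaultRate
	}
	return f.Rand() < rate
}

func main() {
	f := Filter{
		Exclude:     []string{"agent.user.prompt", "agent.tool.result"},
		DefaultRate: 1.0,
		Rates:       map[string]float64{"agent.tool.call": 0.5},
		Rand:        rand.Float64,
	}
	fmt.Println(f.Keep("agent.session.start")) // true
	fmt.Println(f.Keep("agent.user.prompt"))   // false
}
```

With a rate of 0.5 for agent.tool.call, roughly half of tool-call events would be forwarded under this scheme. A rate of 1.0 always keeps an event because the draw is strictly below 1.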

Beyond event filtering, sciontool provides field-level attribute redaction for sensitive data. This allows telemetry to flow while protecting specific values.

Redacted fields have their values replaced with [REDACTED]:

```shell
# Default redacted fields
export SCION_TELEMETRY_REDACT="prompt,user.email,tool_output,tool_input"
```
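Redaction can be sketched as a copy-and-replace over a span's attribute map. This Go sketch assumes exact key matches, and the redact function is hypothetical.

```go
package main

import "fmt"

// redact returns a copy of attrs in which every listed key that is present
// has its value replaced with "[REDACTED]". Missing keys are not added.
func redact(attrs map[string]string, keys []string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[k] = v
	}
	for _, k := range keys {
		if _, ok := out[k]; ok {
			out[k] = "[REDACTED]"
		}
	}
	return out
}

func main() {
	attrs := map[string]string{"tool_name": "shell", "tool_input": "cat ~/.netrc"}
	got := redact(attrs, []string{"prompt", "user.email", "tool_output", "tool_input"})
	fmt.Println(got["tool_name"], got["tool_input"]) // shell [REDACTED]
}
```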

Hashed fields are replaced with their SHA256 hash, allowing correlation without exposing the original value:

```shell
# Default hashed fields
export SCION_TELEMETRY_HASH="session_id"
```
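Below is a Go sketch of attribute hashing using the standard library's crypto/sha256. Hex encoding and the absence of a salt are assumptions, and the hashAttrs function is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashAttrs returns a copy of attrs in which each listed key that is present
// is replaced with the hex-encoded SHA-256 digest of its value.
func hashAttrs(attrs map[string]string, keys []string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[k] = v
	}
	for _, k := range keys {
		if v, ok := out[k]; ok {
			sum := sha256.Sum256([]byte(v))
			out[k] = hex.EncodeToString(sum[:])
		}
	}
	return out
}

func main() {
	a := hashAttrs(map[string]string{"session_id": "abc"}, []string{"session_id"})
	b := hashAttrs(map[string]string{"session_id": "abc"}, []string{"session_id"})
	fmt.Println(a["session_id"] == b["session_id"], len(a["session_id"])) // true 64
}
```

Because equal inputs always yield equal digests, spans from one session remain correlatable. However, an unsalted hash of a short or guessable value can be reversed by brute force. Hashing therefore suits identifiers such as session IDs better than free text.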

Harness hook events are automatically converted to OTLP spans:

| Hook Event | Span Name | Attributes |
|---|---|---|
| session-start | agent.session.start | session_id, source |
| session-end | agent.session.end | session_id, reason, tokens_*, duration_ms |
| tool-start | agent.tool.call | tool_name, tool_input |
| tool-end | agent.tool.result | tool_name, success, duration_ms |
| prompt-submit | agent.user.prompt | prompt |
| model-start | gen_ai.api.request | model |
| model-end | gen_ai.api.response | success |
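The conversion is essentially a lookup from hook type to span name. This Go sketch encodes the table as a map and omits attribute copying. The hookSpanNames variable and spanName function are hypothetical.

```go
package main

import "fmt"

// hookSpanNames mirrors the hook event to span name table.
var hookSpanNames = map[string]string{
	"session-start": "agent.session.start",
	"session-end":   "agent.session.end",
	"tool-start":    "agent.tool.call",
	"tool-end":      "agent.tool.result",
	"prompt-submit": "agent.user.prompt",
	"model-start":   "gen_ai.api.request",
	"model-end":     "gen_ai.api.response",
}

// spanName returns the span name for a hook event, or false if the hook
// type is unknown.
func spanName(hook string) (string, bool) {
	name, ok := hookSpanNames[hook]
	return name, ok
}

func main() {
	name, _ := spanName("tool-end")
	fmt.Println(name) // agent.tool.result
}
```

Event filters match on these span names, so the default exclusion of agent.user.prompt suppresses every prompt-submit hook.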

For Gemini CLI agents, session-end events include aggregated metrics from the session file:

  • Token counts: tokens_input, tokens_output, tokens_cached
  • Session info: turn_count, duration_ms, model
  • Per-tool statistics: tool.<name>.calls, tool.<name>.success, tool.<name>.errors

Session files are automatically parsed from ~/.gemini/sessions/.
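The per-tool statistics become flat attribute keys. In this Go sketch, ToolStats and toolAttrs are hypothetical and the tool names are made up. The session-file format itself is not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// ToolStats is a hypothetical per-tool summary extracted from a session file.
type ToolStats struct {
	Calls, Success, Errors int
}

// toolAttrs flattens per-tool statistics into tool.<name>.* attributes.
func toolAttrs(stats map[string]ToolStats) map[string]int {
	attrs := make(map[string]int, 3*len(stats))
	for name, s := range stats {
		attrs["tool."+name+".calls"] = s.Calls
		attrs["tool."+name+".success"] = s.Success
		attrs["tool."+name+".errors"] = s.Errors
	}
	return attrs
}

func main() {
	attrs := toolAttrs(map[string]ToolStats{
		"read_file": {Calls: 4, Success: 4},
		"run_shell": {Calls: 3, Success: 2, Errors: 1},
	})
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%d\n", k, attrs[k])
	}
}
```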

The telemetry pipeline is implemented in pkg/sciontool/telemetry/:

  • config.go - Configuration loading from environment variables
  • filter.go - Event type filtering (include/exclude) and attribute redaction
  • exporter.go - Cloud OTLP exporter (gRPC and HTTP)
  • receiver.go - OTLP gRPC/HTTP receiver
  • pipeline.go - Main orchestration (Start/Stop lifecycle)

Hook-to-span conversion is in pkg/sciontool/hooks/handlers/:

  • telemetry.go - TelemetryHandler for converting hooks to spans
  • Session parsing in pkg/sciontool/hooks/session/parser.go

The pipeline is integrated into the init command (cmd/sciontool/commands/init.go) and starts after user setup, before lifecycle hooks.