Metrics & OpenTelemetry
Scion provides built-in telemetry collection via sciontool, which runs as the init process in agent containers. The telemetry pipeline acts as an OTLP Forwarder: it receives data from agents locally and forwards it to a central cloud observability backend.
Telemetry Flow
- Agent (The Source): Emits OTLP data (traces/metrics) or harness hook events.
- sciontool (The Forwarder):
  - Receives OTLP via gRPC (port 4317) or HTTP (port 4318).
  - Normalizes harness hooks into standard OTLP spans.
  - Applies privacy filters (redaction/hashing).
- Cloud Backend (The Destination): Receives the processed telemetry from sciontool.
Configuration
Telemetry is configured through settings.yaml (for global and grove-level defaults) and scion-agent.yaml (for per-template and per-agent overrides). Environment variables provide the highest-priority override.
Configuration Hierarchy
Telemetry settings resolve across scopes using last-write-wins semantics. The scopes are listed here from lowest to highest priority:
- Global settings (~/.scion/settings.yaml): Organization-wide defaults.
- Grove settings (.scion/settings.yaml): Project-level overrides.
- Template config (scion-agent.yaml in template): Role-specific overrides.
- Agent config (scion-agent.yaml in agent home): Per-agent overrides.
- Environment variables (SCION_TELEMETRY_*, SCION_OTEL_*): Highest priority.
At each scope, only the fields you specify are overridden; unset fields inherit from the previous scope.
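The field-by-field inheritance described above can be pictured as a recursive merge. The following is a minimal sketch, not sciontool's actual implementation: the function names `merge_scope` and `resolve` are hypothetical, and replacing lists wholesale (rather than concatenating them) is an assumption.

```python
from collections.abc import Mapping
from copy import deepcopy


def merge_scope(base, override):
    """Return a copy of base with override applied; nested mappings merge key by key."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both scopes define a nested block: merge it field by field.
            merged[key] = merge_scope(merged[key], value)
        else:
            # Assumption: scalars and lists are replaced wholesale by the later scope.
            merged[key] = deepcopy(value)
    return merged


def resolve(*scopes):
    """Fold scopes from lowest to highest priority (last write wins)."""
    result = {}
    for scope in scopes:
        result = merge_scope(result, scope)
    return result


# Illustrative scopes: global settings, grove settings, agent config.
global_settings = {
    "telemetry": {
        "enabled": True,
        "cloud": {"protocol": "grpc", "endpoint": "monitoring.googleapis.com:443"},
    }
}
grove_settings = {"telemetry": {"cloud": {"protocol": "http"}}}
agent_config = {"telemetry": {"hub": {"enabled": False}}}

effective = resolve(global_settings, grove_settings, agent_config)
```

Here the grove scope changes only `cloud.protocol`, so `cloud.endpoint` and `enabled` still come from the global scope, and the agent scope adds a `hub` block without touching anything else.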
Settings File Configuration
The telemetry block can appear in any settings.yaml (global or grove) or scion-agent.yaml (template or agent):

    # In settings.yaml or scion-agent.yaml
    telemetry:
      enabled: true

      cloud:
        enabled: true
        endpoint: "monitoring.googleapis.com:443"
        protocol: grpc
        headers:
          Authorization: "Bearer ${OTEL_API_KEY}"
        tls:
          enabled: true
          insecure_skip_verify: false
        batch:
          max_size: 512
          timeout: "5s"

      hub:
        enabled: true
        report_interval: "30s"

      local:
        enabled: false
        file: ""
        console: false

      filter:
        enabled: true
        respect_debug_mode: true
        events:
          include: []
          exclude:
            - "agent.user.prompt"
        attributes:
          redact:
            - "prompt"
            - "user.email"
            - "tool_output"
          hash:
            - "session_id"
        sampling:
          default: 1.0
          rates: {}

      resource:
        service.name: "scion-agent"

See the Orchestrator Settings Reference for the full field reference.
Environment Variable Overrides
Environment variables override any settings file value and are the most convenient option for CI or hosted deployments.
| Variable | Settings Path | Default | Description |
|---|---|---|---|
| SCION_TELEMETRY_ENABLED | telemetry.enabled | true | Enable/disable collection entirely |
| SCION_TELEMETRY_CLOUD_ENABLED | telemetry.cloud.enabled | true | Enable forwarding to cloud backend |
| SCION_OTEL_ENDPOINT | telemetry.cloud.endpoint | (required) | Cloud OTLP endpoint URL |
| SCION_OTEL_PROTOCOL | telemetry.cloud.protocol | grpc | Protocol: grpc or http |
| SCION_OTEL_INSECURE | telemetry.cloud.tls.insecure_skip_verify | false | Skip TLS verification (dev only) |
| SCION_TELEMETRY_HUB_ENABLED | telemetry.hub.enabled | true | Enable Hub reporting |
| SCION_TELEMETRY_DEBUG | telemetry.local.enabled | false | Enable local debug output |
| SCION_GCP_PROJECT_ID | (none) | (auto) | GCP project ID for Google Cloud backends |
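To illustrate how the variables in this table relate to the settings paths, here is a minimal sketch that overlays an environment mapping onto a resolved settings dictionary. It is not sciontool's loader: `apply_env_overrides` and `parse_bool` are hypothetical names, and the set of strings accepted as "true" is an assumption.

```python
from copy import deepcopy

# Variable -> settings path, copied from the table above.
ENV_TO_PATH = {
    "SCION_TELEMETRY_ENABLED": ("telemetry", "enabled"),
    "SCION_TELEMETRY_CLOUD_ENABLED": ("telemetry", "cloud", "enabled"),
    "SCION_OTEL_ENDPOINT": ("telemetry", "cloud", "endpoint"),
    "SCION_OTEL_PROTOCOL": ("telemetry", "cloud", "protocol"),
    "SCION_OTEL_INSECURE": ("telemetry", "cloud", "tls", "insecure_skip_verify"),
    "SCION_TELEMETRY_HUB_ENABLED": ("telemetry", "hub", "enabled"),
    "SCION_TELEMETRY_DEBUG": ("telemetry", "local", "enabled"),
}

# Every mapped variable except the endpoint and protocol is a boolean switch.
STRING_VARS = {"SCION_OTEL_ENDPOINT", "SCION_OTEL_PROTOCOL"}


def parse_bool(raw):
    """Assumption: these spellings count as true; anything else is false."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def apply_env_overrides(settings, env):
    """Return a copy of settings with any recognised variables from env applied."""
    result = deepcopy(settings)
    for var, path in ENV_TO_PATH.items():
        if var not in env:
            continue  # Unset variables leave the settings file value in place.
        value = env[var] if var in STRING_VARS else parse_bool(env[var])
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return result


# Usage: pass os.environ in practice; a literal dict keeps the example deterministic.
overridden = apply_env_overrides(
    {"telemetry": {"cloud": {"protocol": "grpc"}}},
    {"SCION_OTEL_PROTOCOL": "http", "SCION_OTEL_INSECURE": "true"},
)
```

Applying this step after the settings files have been merged is what gives environment variables the highest priority.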
Local Receiver Settings (For Agents)
These settings control the ports where sciontool listens for data from the agent processes inside the container.
| Variable | Default | Description |
|---|---|---|
| SCION_OTEL_GRPC_PORT | 4317 | Local gRPC receiver port |
| SCION_OTEL_HTTP_PORT | 4318 | Local HTTP receiver port |
Google Cloud Setup (Recommended)
When deploying on Google Cloud, sciontool can forward directly to Cloud Trace and Cloud Logging using the standard OTLP endpoint.
1. Configure the Forwarder
Set these environment variables in your Hub settings (Grove or Broker level):

    # Direct OTLP ingestion for Google Cloud
    export SCION_OTEL_ENDPOINT="monitoring.googleapis.com:443"
    export SCION_OTEL_PROTOCOL="grpc"
    export SCION_GCP_PROJECT_ID="your-project-id"

2. Configure the Agent (Native OTel)
If your agent harness supports native OpenTelemetry (e.g., opencode), configure it to point to the sciontool forwarder running on localhost:

    # Tell the agent to send to sciontool
    export OTEL_EXPORTER_OTLP_ENDPOINT="localhost:4317"

Note: Most standard OTel SDKs default to localhost:4317, so explicit configuration may not be required.
3. IAM Permissions
Ensure the environment where the agent container runs (GKE Pod, Cloud Run, etc.) has a service account with:
- roles/logging.logWriter
- roles/cloudtrace.agent
- roles/monitoring.metricWriter
Native Metrics Pipeline
Scion includes a native OTel metrics pipeline that captures operational data from agent sessions. This data is recorded as counters and histograms, providing a time-series view of agent performance.
To enable harness-aware telemetry, Scion automatically injects SCION_HARNESS and SCION_MODEL environment variables into all agent containers.
Enriched Resource Attributes
All metrics and traces emitted by Scion are enriched with context-aware OpenTelemetry resource attributes to allow for precise filtering and aggregation in your cloud backend:
- scion.harness: The type of harness running the agent (e.g., gemini, claude, codex).
- scion.model: The specific LLM model being used.
- scion.broker: The ID of the Runtime Broker executing the agent.
- grove_id: The ID of the agent’s parent grove.
Automated Metrics Collection
When harness events occur (via hooks), sciontool automatically records the following metrics:
| Metric | Type | Unit | Description |
|---|---|---|---|
| gen_ai.tokens.input | Counter | tokens | Number of input tokens processed |
| gen_ai.tokens.output | Counter | tokens | Number of output tokens generated |
| gen_ai.tokens.cached | Counter | tokens | Number of tokens retrieved from cache |
| agent.tool.calls | Counter | calls | Total number of tool executions |
| agent.tool.duration | Histogram | ms | Latency of tool executions |
| agent.session.count | Counter | sessions | Total number of agent sessions |
| gen_ai.api.calls | Counter | calls | Total number of LLM API requests |
| gen_ai.api.duration | Histogram | ms | Latency of LLM API requests |
Note: The Codex harness also captures comprehensive telemetry, including tool usage, detailed tool input/output, and granular counts for input, output, and cached tokens.
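As a rough picture of how hook events could feed these counters and histograms, the sketch below uses an in-memory recorder in place of real OTel instruments. The `MetricsRecorder` class and `on_hook` function are hypothetical, and the choice of which hook updates which metric is an assumption based on the metric names and the hook attributes documented on this page, not a description of sciontool's code.

```python
from collections import defaultdict


class MetricsRecorder:
    """In-memory stand-in for OTel counters and histograms (illustrative only)."""

    def __init__(self):
        self.counters = defaultdict(int)      # metric name -> running total
        self.histograms = defaultdict(list)   # metric name -> recorded samples

    def add(self, name, amount=1):
        self.counters[name] += amount

    def record(self, name, value):
        self.histograms[name].append(value)


def on_hook(recorder, event, payload):
    """Update metrics for one harness hook event (assumed mapping)."""
    if event == "session-start":
        recorder.add("agent.session.count")
    elif event == "tool-end":
        recorder.add("agent.tool.calls")
        recorder.record("agent.tool.duration", payload["duration_ms"])
    elif event == "session-end":
        # Token totals arrive as tokens_input / tokens_output / tokens_cached.
        for kind in ("input", "output", "cached"):
            recorder.add(f"gen_ai.tokens.{kind}", payload.get(f"tokens_{kind}", 0))


recorder = MetricsRecorder()
on_hook(recorder, "session-start", {})
on_hook(recorder, "tool-end", {"tool_name": "read_file", "success": True, "duration_ms": 42})
on_hook(recorder, "session-end", {"tokens_input": 1200, "tokens_output": 300, "tokens_cached": 50})
```

Counters only ever accumulate, while the histogram keeps individual latency samples so that a backend can derive percentiles.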
Correlated Logs
For every significant lifecycle event (session start/end, tool use, model call), sciontool emits an OTel log record that is automatically correlated with the active trace. This means when viewing a trace waterfall in your observability backend (like Google Cloud Trace), you can click directly through to the specific logs associated with each span.
Hub Infrastructure Metrics
The Scion Hub maintains internal operational metrics for infrastructure monitoring. These are available via the /api/v1/admin/metrics endpoint (requires hub:admin role) and can be exported to standard monitoring tools.
GCP Token Metrics
With the introduction of GCP Identity emulation, the Hub tracks the health and performance of the token brokering pipeline:
| Metric | Description |
|---|---|
| accessTokenRequests | Total number of GCP Access Token requests from agents. |
| accessTokenSuccesses | Number of successfully brokered access tokens. |
| accessTokenFailures | Number of failed access token requests (e.g., IAM permission errors). |
| idTokenRequests | Total number of GCP Identity Token requests. |
| rateLimitRejections | Number of token requests rejected due to per-agent rate limiting. |
| iamLatencyP50Ms | Median latency of IAM API calls to Google Cloud. |
| iamLatencyP95Ms | 95th percentile latency of IAM API calls. |
Broker Authentication Metrics
These metrics monitor the security and connectivity of Runtime Brokers:
- authAttempts: Total broker authentication attempts.
- connectedBrokers: Current number of active Runtime Brokers connected to the Hub.
- dispatchFailures: Number of failed agent dispatch commands to brokers.
Privacy Filtering
By default, sciontool excludes agent.user.prompt events to protect user privacy. Filtering is configured via the telemetry.filter block in settings.yaml or scion-agent.yaml, or via environment variables.
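To make the event-level filtering concrete, here is a minimal sketch of an include/exclude filter with per-event sampling. It is not the code in filter.go: the `EventFilter` class is hypothetical, and three behaviours are assumptions, namely that an empty include list allows every event, that exclude takes precedence over include, and that sampling is applied last.

```python
import random


class EventFilter:
    """Decide whether a named telemetry event should be forwarded."""

    def __init__(self, include=(), exclude=("agent.user.prompt",),
                 default_rate=1.0, rates=None, rng=None):
        self.include = set(include)
        self.exclude = set(exclude)
        self.default_rate = default_rate
        self.rates = dict(rates or {})
        self.rng = rng or random.Random()

    def allows(self, event_name):
        # Assumption: exclusion wins over inclusion.
        if event_name in self.exclude:
            return False
        # Assumption: an empty include list means "all event types".
        if self.include and event_name not in self.include:
            return False
        # Sampling: keep the event with probability equal to its rate.
        rate = self.rates.get(event_name, self.default_rate)
        return self.rng.random() < rate


default_filter = EventFilter()
allow_listed = EventFilter(include=["agent.session.start", "agent.session.end"], exclude=[])
```

With the defaults, `default_filter.allows("agent.user.prompt")` is false and every other event passes, because `random()` always returns a value below 1.0.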
Via Settings File

    telemetry:
      filter:
        events:
          exclude:
            - "agent.user.prompt"
            - "agent.tool.result"
        attributes:
          redact:
            - "prompt"
            - "user.email"
            - "tool_output"
          hash:
            - "session_id"
        sampling:
          default: 1.0
          rates:
            "agent.tool.call": 0.5

Via Environment Variables

    # Exclude multiple event types
    export SCION_TELEMETRY_FILTER_EXCLUDE="agent.user.prompt,agent.tool.result"

    # Only forward specific event types
    export SCION_TELEMETRY_FILTER_INCLUDE="agent.session.start,agent.session.end,agent.tool.call"

Attribute Redaction
Beyond event filtering, sciontool provides field-level attribute redaction for sensitive data. This allows telemetry to flow while protecting specific values.
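The two field-level treatments covered in this section, replacement with [REDACTED] and SHA-256 hashing, can be sketched as follows. This is an illustration rather than sciontool's implementation: `scrub_attributes` is a hypothetical name, and hashing the UTF-8 string form of the value without a salt and emitting a hex digest are assumptions.

```python
import hashlib

REDACTED = "[REDACTED]"


def scrub_attributes(attributes,
                     redact=("prompt", "user.email", "tool_output"),
                     hash_fields=("session_id",)):
    """Return a copy of attributes with sensitive values redacted or hashed."""
    redact_set = set(redact)
    hash_set = set(hash_fields)
    scrubbed = {}
    for key, value in attributes.items():
        if key in redact_set:
            # The value is dropped entirely; only the key survives.
            scrubbed[key] = REDACTED
        elif key in hash_set:
            # Assumption: unsalted SHA-256 over the UTF-8 string, hex encoded.
            scrubbed[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            scrubbed[key] = value
    return scrubbed


span_attributes = {"session_id": "abc", "prompt": "summarise this file", "tool_name": "read_file"}
clean = scrub_attributes(span_attributes)
```

Because the same input always produces the same digest, spans from one session still share a `session_id` value after scrubbing, which is what keeps them correlatable.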
Redacted Fields
Redacted fields have their values replaced with [REDACTED]:

    # Default redacted fields
    export SCION_TELEMETRY_REDACT="prompt,user.email,tool_output,tool_input"

Hashed Fields
Hashed fields are replaced with their SHA256 hash, allowing correlation without exposing the original value:

    # Default hashed fields
    export SCION_TELEMETRY_HASH="session_id"

Hook-to-Span Conversion
Harness hook events are automatically converted to OTLP spans:
| Hook Event | Span Name | Attributes |
|---|---|---|
| session-start | agent.session.start | session_id, source |
| session-end | agent.session.end | session_id, reason, tokens_*, duration_ms |
| tool-start | agent.tool.call | tool_name, tool_input |
| tool-end | agent.tool.result | tool_name, success, duration_ms |
| prompt-submit | agent.user.prompt | prompt |
| model-start | gen_ai.api.request | model |
| model-end | gen_ai.api.response | success |
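The mapping in this table can be expressed as a simple lookup. The sketch below is illustrative only: `hook_to_span` is a hypothetical function, the span is represented as a plain dictionary rather than an OTLP object, and dropping payload keys that are not listed in the table is an assumption.

```python
# Hook event -> (span name, attribute keys), copied from the table above.
HOOK_TO_SPAN = {
    "session-start": ("agent.session.start", ("session_id", "source")),
    "session-end": ("agent.session.end", ("session_id", "reason", "duration_ms")),
    "tool-start": ("agent.tool.call", ("tool_name", "tool_input")),
    "tool-end": ("agent.tool.result", ("tool_name", "success", "duration_ms")),
    "prompt-submit": ("agent.user.prompt", ("prompt",)),
    "model-start": ("gen_ai.api.request", ("model",)),
    "model-end": ("gen_ai.api.response", ("success",)),
}


def hook_to_span(hook_event, payload):
    """Build a span-like dict for one hook event; raises KeyError for unknown hooks."""
    span_name, keys = HOOK_TO_SPAN[hook_event]
    attributes = {key: payload[key] for key in keys if key in payload}
    if hook_event == "session-end":
        # tokens_* in the table is a prefix, so copy every matching key.
        attributes.update(
            {key: value for key, value in payload.items() if key.startswith("tokens_")}
        )
    return {"name": span_name, "attributes": attributes}


span = hook_to_span(
    "session-end",
    {"session_id": "s1", "reason": "exit", "tokens_input": 10,
     "tokens_output": 4, "duration_ms": 900, "cwd": "/tmp"},
)
```

In this example the hypothetical `cwd` key is not part of the table, so it does not appear on the resulting span.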
Session Metrics (Gemini)
For Gemini CLI agents, session-end events include aggregated metrics from the session file:
- Token counts: tokens_input, tokens_output, tokens_cached
- Session info: turn_count, duration_ms, model
- Per-tool statistics: tool.<name>.calls, tool.<name>.success, tool.<name>.errors
Session files are automatically parsed from ~/.gemini/sessions/.
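The per-tool attribute naming can be illustrated with a small aggregation. The sketch below does not reflect the real Gemini session file format: the input shape (a list of records with `name` and `ok` keys) and the function name `per_tool_attributes` are hypothetical, and only the output keys follow the `tool.<name>.*` pattern listed above.

```python
def per_tool_attributes(tool_calls):
    """Aggregate tool call records into tool.<name>.calls/success/errors counts."""
    attributes = {}
    for call in tool_calls:
        prefix = f"tool.{call['name']}"
        # Make sure all three counters exist, even if one stays at zero.
        for suffix in ("calls", "success", "errors"):
            attributes.setdefault(f"{prefix}.{suffix}", 0)
        attributes[f"{prefix}.calls"] += 1
        outcome = "success" if call["ok"] else "errors"
        attributes[f"{prefix}.{outcome}"] += 1
    return attributes


stats = per_tool_attributes([
    {"name": "read_file", "ok": True},
    {"name": "read_file", "ok": False},
    {"name": "shell", "ok": True},
])
```

Flattening the statistics into dotted keys keeps them usable as ordinary span attributes, which are key-value pairs rather than nested structures.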
Implementation Details
The telemetry pipeline is implemented in pkg/sciontool/telemetry/:
- config.go - Configuration loading from environment variables
- filter.go - Event type filtering (include/exclude) and attribute redaction
- exporter.go - Cloud OTLP exporter (gRPC and HTTP)
- receiver.go - OTLP gRPC/HTTP receiver
- pipeline.go - Main orchestration (Start/Stop lifecycle)
Hook-to-span conversion is in pkg/sciontool/hooks/handlers/:
- telemetry.go - TelemetryHandler for converting hooks to spans
- Session parsing in pkg/sciontool/hooks/session/parser.go
The pipeline is integrated into the init command (cmd/sciontool/commands/init.go) and starts after user setup, before lifecycle hooks.