Turn Evals

Turn Evals are the most lightweight evaluation type in SCRAPI. They send a single message (or a short scripted sequence of messages) to your agent and assert on specific properties of the response — whether the agent called a particular tool, what it said, or whether it transferred control to a sub-agent.

Use Turn Evals when you need quick, targeted assertions on specific agent behaviors without the overhead of scripting a full conversation.


When to use Turn Evals

Turn Evals are a good fit when:

  • You want to verify that a specific input always triggers a specific tool call
  • You're checking that the agent doesn't call tools in certain situations
  • You want to verify agent transfer happens for the right inputs
  • You need a fast check during development without waiting for platform goldens to run

They're less appropriate for multi-turn conversations or for checking the quality of natural language responses — use Local Simulations for those.


The TurnEvals class

Instantiate TurnEvals with the full resource path of the agent app under test:

from cxas_scrapi.evals.turn_evals import TurnEvals

turn_evals = TurnEvals(
    app_name="projects/my-project/locations/us/apps/my-app",
)

The TurnOperator enum

TurnOperator defines the assertion types available for each turn:

Operator         Description
CONTAINS         Agent response contains the expected string
EQUALS           Agent response exactly equals the expected string
TOOL_CALLED      The specified tool was called during this turn
TOOL_INPUT       The specified tool was called with the expected input argument
TOOL_OUTPUT      The specified tool returned the expected output
NO_TOOLS_CALLED  No tools were called during this turn
AGENT_TRANSFER   The agent transferred to the specified sub-agent

from cxas_scrapi.evals.turn_evals import TurnOperator

# Available values
TurnOperator.CONTAINS
TurnOperator.EQUALS
TurnOperator.TOOL_CALLED
TurnOperator.TOOL_INPUT
TurnOperator.TOOL_OUTPUT
TurnOperator.NO_TOOLS_CALLED
TurnOperator.AGENT_TRANSFER
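
In the YAML examples below, each expectation's type field appears to be the lowercased name of the corresponding operator. A minimal sketch of that correspondence, assuming TurnOperator is a standard Python Enum (this page doesn't confirm its implementation):

from cxas_scrapi.evals.turn_evals import TurnOperator

# Assumption: every YAML `type` string is a lowercased member name.
yaml_types = {op.name.lower(): op for op in TurnOperator}

print(yaml_types["tool_called"])      # TurnOperator.TOOL_CALLED
print(yaml_types["agent_transfer"])   # TurnOperator.AGENT_TRANSFER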

Writing turn eval tests

Defining tests in YAML

Turn eval tests are defined in YAML files. Each test specifies a user input (or a triggering event) and one or more expectations:

conversations:
  - conversation: order_lookup_triggers_tool
    user: "What's the status of order ORD-12345?"
    variables:
      order_12345_status: "shipped"
    expectations:
      - type: tool_called
        value: "lookup_order"

  - conversation: welcome_message_check
    event: "welcome"
    expectations:
      - type: contains
        value: "Welcome"
      - type: no_tools_called

  - conversation: tool_receives_correct_order_id
    user: "Check order ORD-12345 please"
    expectations:
      - type: tool_input
        value:
          order_id: "ORD-12345"

  - conversation: billing_transfers_to_billing_agent
    user: "I want to dispute a charge on my bill"
    expectations:
      - type: agent_transfer
        value: "billing-agent"

Multi-turn test cases

Turn Evals support short scripted sequences using the turns field:

conversations:
  - conversation: order_id_collection_flow
    turns:
      - turn: ask_about_order
        user: "I want to check my order"
        expectations:
          - type: no_tools_called
          - type: contains
            value: "order ID"
      - turn: provide_order_id
        user: "It's ORD-12345"
        expectations:
          - type: tool_called
            value: "lookup_order"

Running tests programmatically

from cxas_scrapi.evals.turn_evals import TurnEvals

turn_evals = TurnEvals(app_name="projects/my-project/locations/us/apps/my-app")

# Load test cases from YAML
test_cases = turn_evals.load_turn_test_cases_from_file("evals/turn_evals/core_assertions.yaml")

# Run all tests — returns a pandas DataFrame
results_df = turn_evals.run_turn_tests(test_cases)
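
Because the return value is a plain pandas DataFrame, standard pandas operations apply. For instance, a quick pass-rate summary built only on the documented status column:

# Count passing assertions (status is SUCCESS or FAILURE per row).
total = len(results_df)
passed = int((results_df["status"] == "SUCCESS").sum())
print(f"{passed}/{total} assertions passed")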

Running from a YAML file

Turn eval files also support a test_cases schema, in which session variables are set with session_parameters and each turn declares its kind (user or event):

test_cases:
  - name: "order_lookup_triggers_tool"
    session_parameters:
      order_12345_status: "shipped"
    turns:
      - turn: user
        user: "What's the status of order ORD-12345?"
        expectations:
          - type: tool_called
            value: "lookup_order"

  - name: "welcome_has_no_tools"
    turns:
      - turn: event
        event: welcome
        expectations:
          - type: no_tools_called

You can also load from a directory to run all YAML files at once:

test_cases = turn_evals.load_turn_tests_from_dir("evals/turn_evals/")
results_df = turn_evals.run_turn_tests(test_cases)
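
To gate a CI run on these results, one option is a single test that fails if any assertion failed. A minimal sketch using pytest conventions (pytest itself is an assumption here, not something this page prescribes):

from cxas_scrapi.evals.turn_evals import TurnEvals

def test_turn_evals_all_pass():
    turn_evals = TurnEvals(app_name="projects/my-project/locations/us/apps/my-app")
    test_cases = turn_evals.load_turn_tests_from_dir("evals/turn_evals/")
    results_df = turn_evals.run_turn_tests(test_cases)

    # Fail the test if any individual assertion failed.
    failed = results_df[results_df["status"] == "FAILURE"]
    assert failed.empty, f"{len(failed)} turn eval assertions failed:\n{failed}"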

Interpreting results

run_turn_tests returns a DataFrame with one row per expectation:

Column     Description
test_name  Name of the test case
turn       Which turn in a multi-turn test
user       The user input that was sent
status     SUCCESS or FAILURE
errors     Error details if the assertion failed
expected   What was expected
actual     What was observed

results_df = turn_evals.run_turn_tests(test_cases)

failed = results_df[results_df["status"] == "FAILURE"]
if not failed.empty:
    print("Failed assertions:")
    for _, row in failed.iterrows():
        print(f"  {row['test_name']}: {row['errors']}")

Integration with the skills system

The Run skill includes Turn Evals as part of its combined reporting. When you run all four eval types, Turn Eval results appear in the combined report alongside tool tests, goldens, and simulations, giving you a single view of agent behavior across every eval type.