SimulationEvals
SimulationEvals runs AI-driven, end-to-end conversation simulations against your CXAS agent. Instead of scripting exact utterances, you describe goals and success criteria, and a Gemini model decides what to say at each turn to try to achieve them. This makes it a good way to test how your agent handles realistic, messy, unpredictable conversations.
Here are the key concepts:
- `Step` (Pydantic model): a single goal within a simulation, with a `goal`, `success_criteria`, an optional `response_guide`, and a `max_turns` limit. Steps can also include a `static_utterance` for when you want a fixed first message, and `inject_variables` for seeding session state.
- `StepStatus` enum: tracks whether each step is `NOT_STARTED`, `IN_PROGRESS`, or `COMPLETED`.
- `simulate_conversation()`: drives the full multi-turn loop, returning an `LLMUserConversation` object that contains the transcript, step progress, and expectation results.
- `generate_report()`: produces a `SimulationReport` with two DataFrames (goal progress and expectation results). It renders as styled HTML in a Jupyter notebook.
Quick Example
from cxas_scrapi import SimulationEvals
from cxas_scrapi.utils.rate_limiter import RateLimiter

app_name = "projects/my-project/locations/us/apps/my-app-id"

# Optional: configure a rate limiter to pace simulation turns and prevent quota exhaustion
limiter = RateLimiter(requests_per_minute=30.0)
sim = SimulationEvals(app_name=app_name, rate_limiter=limiter)

test_case = {
    "steps": [
        {
            "goal": "User wants to check their account balance",
            "success_criteria": "Agent provides a numeric balance and account status",
            "max_turns": 5,
        },
        {
            "goal": "User asks to dispute a charge",
            "success_criteria": "Agent acknowledges the dispute and provides a reference number",
            "max_turns": 8,
        },
    ],
    "expectations": [
        "The agent should never ask for the full credit card number",
        "The agent should offer to escalate if it cannot resolve the dispute",
    ],
}

# Run the simulation
conversation = sim.simulate_conversation(
    test_case=test_case,
    console_logging=True,
)

# View the report
report = conversation.generate_report()
print(report)  # Colorized in terminal, styled HTML in Jupyter
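Because a test case is a plain dictionary, a typo in a key only shows up once the simulation is already running. The sketch below is a small pre-flight check that could be run before `simulate_conversation()`. `check_test_case` is a hypothetical helper, not part of `cxas_scrapi`, and the rules it enforces (non-empty `goal` and `success_criteria`, positive `max_turns`) are assumptions based on the example above rather than the library's own validation.

```python
from typing import Any


def check_test_case(test_case: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the dict looks usable.

    Hypothetical helper: the rules here are assumptions, not cxas_scrapi's own checks.
    """
    problems: list[str] = []
    steps = test_case.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["'steps' must be a non-empty list"]

    for i, step in enumerate(steps):
        # Each step is assumed to describe a goal and how to recognise success.
        for key in ("goal", "success_criteria"):
            value = step.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"steps[{i}].{key} must be a non-empty string")
        # max_turns is only checked when it is present.
        max_turns = step.get("max_turns")
        if max_turns is not None and (not isinstance(max_turns, int) or max_turns < 1):
            problems.append(f"steps[{i}].max_turns must be a positive integer")

    expectations = test_case.get("expectations", [])
    if not isinstance(expectations, list) or not all(isinstance(e, str) for e in expectations):
        problems.append("'expectations' must be a list of strings")
    return problems
```

Calling `check_test_case(test_case)` on the dictionary from the example returns an empty list; any returned strings can be printed or raised before a simulation turn is spent.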
Reference
SimulationEvals
Bases: Apps
Wrapper class to simulate entire multi-turn conversations with a CXAS Agent.
Source code in src/cxas_scrapi/evals/simulation_evals.py
simulate_conversation
simulate_conversation(test_case, sim_user_model=_DEFAULT_GEMINI_MODEL, eval_model=_DEFAULT_GEMINI_MODEL, session_id=None, console_logging=True, modality='text', background_noise_file=None, burst_noise_files=None, use_tool_fakes=False)
Runs the simulated conversation loop.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `test_case` | `dict[str, Any]` | The test case dictionary defining evaluation steps. | required |
| `sim_user_model` | `str \| None` | The Gemini model used for the simulated user. | `_DEFAULT_GEMINI_MODEL` |
| `eval_model` | `str \| None` | The Gemini model used for evaluating expectations. | `_DEFAULT_GEMINI_MODEL` |
| `console_logging` | `bool` | Whether to print interaction transcript to the console. | `True` |
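As a rough illustration of the parameters above, the sketch below wraps `simulate_conversation()` so that console output is switched off and a model is overridden only when one is supplied. `simulate_quietly` is a hypothetical helper; `sim` is assumed to be an already constructed `SimulationEvals` instance, and any model name passed in is the caller's choice, not a value taken from this page.

```python
from typing import Any, Optional


def simulate_quietly(
    sim: Any,
    test_case: dict[str, Any],
    sim_user_model: Optional[str] = None,
    eval_model: Optional[str] = None,
) -> Any:
    """Run one simulation without console output (hypothetical wrapper)."""
    kwargs: dict[str, Any] = {"test_case": test_case, "console_logging": False}
    # Omit a model entirely when none is given so the library default applies.
    if sim_user_model is not None:
        kwargs["sim_user_model"] = sim_user_model
    if eval_model is not None:
        kwargs["eval_model"] = eval_model
    return sim.simulate_conversation(**kwargs)
```

Omitting the keyword, rather than passing `None`, avoids relying on how the library treats a `None` model, which this page does not spell out.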
run_simulations
run_simulations(test_cases, runs=1, parallel=1, sim_user_model=_DEFAULT_GEMINI_MODEL, eval_model=_DEFAULT_GEMINI_MODEL, modality='text', verbose=False, background_noise_file=None, burst_noise_files=None, use_tool_fakes=False)
Runs multiple simulations, optionally in parallel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `test_cases` | `list[dict[str, Any]]` | List of test case dictionaries. | required |
| `runs` | `int` | Number of runs per test case. | `1` |
| `parallel` | `int` | Number of parallel workers (capped at 25). | `1` |
| `sim_user_model` | `str \| None` | Gemini model to use for simulated user. | `_DEFAULT_GEMINI_MODEL` |
| `eval_model` | `str \| None` | Gemini model to use for evaluating expectations. | `_DEFAULT_GEMINI_MODEL` |
| `modality` | `str` | `'text'` or `'audio'`. | `'text'` |
| `verbose` | `bool` | Whether to log to console (only active if `parallel=1`). | `False` |
| `use_tool_fakes` | `bool` | Use fake tools for the session if available. | `False` |
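The interaction between `runs`, `parallel`, and `verbose` can be sketched as follows. `run_batch` is a hypothetical wrapper, and `sim` is assumed to be an existing `SimulationEvals` instance. The clamp to 25 mirrors the cap stated in the table; whether the library raises or silently clamps larger values is not stated here, so the wrapper clamps up front.

```python
from typing import Any

MAX_WORKERS = 25  # cap stated in the parameter table above


def run_batch(
    sim: Any,
    test_cases: list[dict[str, Any]],
    runs: int = 3,
    parallel: int = 4,
) -> Any:
    """Run every test case `runs` times with a bounded worker count (hypothetical wrapper)."""
    workers = max(1, min(parallel, MAX_WORKERS))
    return sim.run_simulations(
        test_cases=test_cases,
        runs=runs,
        parallel=workers,
        # verbose is documented as only active when parallel=1,
        # so it is requested only in the sequential case.
        verbose=(workers == 1),
    )
```

Running each case several times is useful because the simulated user is model-driven, so two runs of the same test case need not produce the same transcript.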
export_results_to_golden
Exports simulation results to a Golden Evaluation YAML file.
Fetches the full conversation trace for each simulation from the platform to ensure accuracy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `results` | `list[dict[str, Any]]` | The list of results returned by `run_simulations`. | required |
| `output_path` | `str \| None` | Optional local path to save the generated YAML. | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | The generated YAML string. |
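A minimal sketch of the export step, assuming `sim` is a `SimulationEvals` instance and `results` came from `run_simulations`. `export_golden` is a hypothetical wrapper; the keyword names `results` and `output_path` are taken from the parameter table, since the full signature is not shown on this page.

```python
from typing import Any, Optional


def export_golden(
    sim: Any,
    results: list[dict[str, Any]],
    output_path: Optional[str] = None,
) -> str:
    """Export simulation results as Golden Evaluation YAML (hypothetical wrapper)."""
    # Fail early: exporting fetches conversation traces from the platform,
    # so there is nothing useful to do with an empty result list.
    if not results:
        raise ValueError("no simulation results to export")
    return sim.export_results_to_golden(results=results, output_path=output_path)
```

The returned string is the generated YAML, so it can be inspected or diffed even when no `output_path` is given.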
Step
Bases: BaseModel
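The fields described in the key concepts can be pictured with the stand-in below. It is a plain dataclass named `StepSketch`, not the library's `Step`: the real class is a Pydantic model, and its field types, defaults, and validation are not shown on this page, so everything beyond the field names is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepSketch:
    """Illustrative stand-in for Step; types and defaults are assumed."""

    goal: str
    success_criteria: str
    max_turns: int
    response_guide: Optional[str] = None
    static_utterance: Optional[str] = None  # fixed first message, if any
    inject_variables: Optional[dict[str, Any]] = None  # seeds session state


# A step dictionary, as it would appear under "steps" in a test case.
step_dict = {
    "goal": "User wants to check their account balance",
    "success_criteria": "Agent provides a numeric balance and account status",
    "max_turns": 5,
    "static_utterance": "Hi, I need my balance",
    "inject_variables": {"customer_tier": "gold"},  # hypothetical variable name
}
step = StepSketch(**step_dict)
```

Unpacking the dictionary into the class surfaces misspelled keys immediately as a `TypeError`, which is the same kind of early feedback a Pydantic model provides.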
StepStatus
Bases: str, Enum
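Because the enum mixes in `str`, its members behave like ordinary strings when compared or serialized. The stand-in below, `StepStatusSketch`, illustrates that behaviour. The member names come from the key concepts above, but the member values are assumed to equal the names, which this page does not confirm.

```python
import json
from enum import Enum


class StepStatusSketch(str, Enum):
    """Illustrative stand-in for StepStatus; member values are assumed."""

    NOT_STARTED = "NOT_STARTED"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


# Look a member up by value, as when reading a status back from stored data.
status = StepStatusSketch("IN_PROGRESS")

# The str mixin makes members compare equal to plain strings.
is_match = status == "IN_PROGRESS"

# It also lets json.dumps serialize members without a custom encoder.
encoded = json.dumps({"status": status})
```

This is why step progress can be written straight into a DataFrame or JSON log without converting each status by hand.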