CallbackEvals

CallbackEvals brings proper unit testing to your CXAS agent callbacks. It runs pytest-based tests against your callback Python code, either by reading it from a local app directory or by fetching it live from the CXAS API.

The two main methods are:

test_all_callbacks_in_app_dir() — scans your local app directory, discovers every test.py alongside each callback's python_code.py, and runs them all. Returns a DataFrame with results for each test.
test_single_callback_for_agent() — fetches a specific agent's callback from the CXAS API and runs the test file you point it to. Useful for CI jobs where you want to verify a specific agent hasn't regressed.

Both methods return a pandas DataFrame with columns: agent_name, callback_type, test_name, status, and error_message.
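
As a rough sketch of how these columns might be consumed, the snippet below tallies results per status and collects the failing tests. It works on plain row dictionaries, which a pandas DataFrame can produce via `to_dict("records")`. The sample rows, the `summarize` helper, and the status strings "passed" and "failed" are assumptions for illustration; the actual values CallbackEvals writes to the status column are not specified here.

```python
from collections import Counter

# Hypothetical rows shaped like the documented columns. With a real result,
# rows = results_df.to_dict("records") would yield the same structure.
rows = [
    {
        "agent_name": "root_agent",
        "callback_type": "before_model_callbacks",
        "test_name": "test_callback_returns_correct_format",
        "status": "passed",  # assumed status string
        "error_message": None,
    },
    {
        "agent_name": "root_agent",
        "callback_type": "after_tool_callbacks",
        "test_name": "test_tool_output_is_dict",
        "status": "failed",  # assumed status string
        "error_message": "AssertionError",
    },
]


def summarize(rows, passing_status="passed"):
    """Count rows per status and list every row whose status is not passing."""
    counts = Counter(row["status"] for row in rows)
    failures = [row for row in rows if row["status"] != passing_status]
    return counts, failures


counts, failures = summarize(rows)
for row in failures:
    print(
        f'{row["agent_name"]}/{row["callback_type"]}::'
        f'{row["test_name"]}: {row["error_message"]}'
    )
```

In a CI job, a non-empty `failures` list could be used to exit with a non-zero status code.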

Quick Example

from cxas_scrapi import CallbackEvals

ce = CallbackEvals()

# Run all callback tests discovered in your local app directory
results_df = ce.test_all_callbacks_in_app_dir(
    app_dir="./cxas_app/My_Agent_App",
)
print(results_df)

# Run tests for a specific agent and callback type (fetched from the API)
results_df = ce.test_single_callback_for_agent(
    app_name="projects/my-project/locations/us/apps/my-app-id",
    agent_name="root_agent",
    callback_type="before_model_callback",
    test_file_path="evals/callback_tests/tests/root_agent/before_model_callbacks/before_model/test.py",
)
print(results_df[["test_name", "status", "error_message"]])

Your test.py files look just like standard pytest:

import python_code  # auto-injected by CallbackEvals

def test_callback_returns_correct_format():
    result = python_code.before_model_callback(handler=None, request={"text": "hi"})
    assert result is not None
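
To try a test like this outside CallbackEvals, one option is to register a stand-in `python_code` module yourself before the test imports it. The sketch below is only an illustration: how CallbackEvals injects the module is not shown in this page, and the callback body, its return value, and the `load_callback_module` helper are all hypothetical.

```python
import sys
import types

# Hypothetical callback source, standing in for a real python_code.py.
CALLBACK_SOURCE = '''
def before_model_callback(handler, request):
    # Hypothetical behaviour: echo the request text in upper case.
    return {"text": request["text"].upper()}
'''


def load_callback_module(source, module_name="python_code"):
    """Compile callback source into a module and register it for import."""
    module = types.ModuleType(module_name)
    exec(compile(source, f"{module_name}.py", "exec"), module.__dict__)
    sys.modules[module_name] = module
    return module


load_callback_module(CALLBACK_SOURCE)

import python_code  # resolves to the module registered above


def test_callback_returns_correct_format():
    result = python_code.before_model_callback(
        handler=None, request={"text": "hi"}
    )
    assert result == {"text": "HI"}
```

Asserting on the exact return value, rather than only `is not None`, makes a regression in the callback's output format easier to catch.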

Reference

CallbackEvals

Provides methods for orchestrating and executing agent callback tests.

test_single_callback_for_agent

test_single_callback_for_agent(app_name, agent_name, callback_type, test_file_path, log_file=None, pytest_args=None)

Runs a test file against a single callback fetched from the agent proto.

Parameters:

Name	Type	Description	Default
`app_name`	`str`	The CXAS App name.	required
`agent_name`	`str`	The name or display name of the agent.	required
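`callback_type`	`str`	The callback type key, e.g. 'before_model_callback' or 'after_tool_callback'. The agent-level keys are plural: 'before_agent_callbacks' and 'after_agent_callbacks'.	required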
`test_file_path`	`str`	Path to the test.py file to run.	required
`log_file`	`str`	Optional. Path to a file to log pytest output to.	`None`
`pytest_args`	`list[str]`	Optional. Additional arguments to pass to pytest.	`None`

Returns:

Type	Description
`DataFrame`	A pandas DataFrame containing test execution results, with columns agent_name, callback_type, test_name, status, and error_message.

Raises:

Type	Description
`ValueError`	If the agent cannot be fetched, if callback_type is not a recognized key, or if the agent does not have exactly one callback of that type.
`FileNotFoundError`	If test_file_path does not exist.

Source code in src/cxas_scrapi/evals/callback_evals.py

def test_single_callback_for_agent(
    self,
    app_name: str,
    agent_name: str,
    callback_type: str,
    test_file_path: str,
    log_file: str = None,
    pytest_args: list[str] = None,
) -> pd.DataFrame:
    """Runs test against a single callback fetched from the agent proto.

    Args:
        app_name: The CXAS App name.
        agent_name: The name or display name of the agent.
        callback_type: The type of callback (e.g., 'before_model',
            'after_tool').
        test_file_path: Path to the test.py file to run.
        log_file: Optional. Path to a file to log pytest output to.
        pytest_args: Optional. Additional arguments to pass to pytest.
    """

    agents_client = Agents(app_name=app_name)

    # Get agent ID
    try:
        agents_map = agents_client.get_agents_map(reverse=True)
        agent = agents_client.get_agent(
            agents_map.get(agent_name).split("/")[-1]
        )
    except Exception as e:
        logger.error(f"Failed to get agent {agent_name}: {e}")
        raise ValueError(
            f"Failed to get agent {agent_name} from application to "
            f"run callback test."
        ) from e

    # Fetch callback
    all_callbacks = {
        "before_model_callback": agent.before_model_callbacks,
        "after_model_callback": agent.after_model_callbacks,
        "before_tool_callback": agent.before_tool_callbacks,
        "after_tool_callback": agent.after_tool_callbacks,
        "before_agent_callbacks": agent.before_agent_callbacks,
        "after_agent_callbacks": agent.after_agent_callbacks,
    }
    callback = all_callbacks.get(callback_type)
    if not callback:
        raise ValueError(
            f"No callback found of type {callback_type} for agent "
            f"{agent_name}"
        )
    if len(callback) > 1:
        raise ValueError(
            f"Multiple callbacks found of type {callback_type} for "
            f"agent {agent_name}"
        )

    code_content = callback[0].python_code

    if not os.path.exists(test_file_path):
        raise FileNotFoundError(f"Test file not found: {test_file_path}")

    with open(test_file_path, "r", encoding="utf-8") as f:
        test_content = f.read()

        results = self._run_test(
            code_content,
            test_content,
            test_file_path,
            agent_name,
            callback_type,
            log_file,
            pytest_args,
        )
        return pd.DataFrame(
            results,
            columns=[
                "agent_name",
                "callback_type",
                "test_name",
                "status",
                "error_message",
            ],
        )
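
Because the lookup in the listing above accepts only the exact keys of the `all_callbacks` mapping, a caller may want to validate `callback_type` before making the API request. The following is a minimal sketch of such a check; `check_callback_type` and `SUPPORTED_CALLBACK_TYPES` are hypothetical names and are not part of cxas_scrapi.

```python
import difflib

# Keys copied from the all_callbacks mapping in the source listing above.
SUPPORTED_CALLBACK_TYPES = (
    "before_model_callback",
    "after_model_callback",
    "before_tool_callback",
    "after_tool_callback",
    "before_agent_callbacks",
    "after_agent_callbacks",
)


def check_callback_type(callback_type):
    """Return callback_type if supported, else raise ValueError with a hint."""
    if callback_type in SUPPORTED_CALLBACK_TYPES:
        return callback_type
    # Suggest the closest supported key, if any is similar enough.
    close = difflib.get_close_matches(
        callback_type, SUPPORTED_CALLBACK_TYPES, n=1
    )
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Unsupported callback_type {callback_type!r}.{hint}")
```

A short key such as 'before_model' would be rejected here with a suggestion, instead of surfacing later as a "No callback found" error.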

test_all_callbacks_in_app_dir

test_all_callbacks_in_app_dir(app_dir, agent_name='*', callback_type='*_callbacks', callback_name='*', log_file=None, pytest_args=None)

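Runs pytest against every callback test discovered under the given app directory. Each test.py must sit next to a python_code.py; a test.py without one is skipped with a warning, and an empty DataFrame is returned when no tests match.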

Parameters:

Name	Type	Description	Default
`app_dir`	`str`	The path to the CES app root directory.	required
`agent_name`	`str`	Optional. The name of the agent to run tests for. If not provided, all agents will be tested.	`'*'`
`callback_type`	`str`	Optional. The type of callback to run tests for. If not provided, all callback types will be tested.	`'*_callbacks'`
`callback_name`	`str`	Optional. The name of the callback to run tests for. If not provided, all callbacks will be tested.	`'*'`
`log_file`	`str`	Optional. Path to a file to log pytest output to. If not provided, output will be logged to the console.	`None`
`pytest_args`	`list[str]`	Optional. Additional arguments to pass to pytest. Defaults to None.	`None`

Returns:

Type	Description
`DataFrame`	A pandas DataFrame containing test execution results, with columns agent_name, callback_type, test_name, status, and error_message.
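
The three filter parameters are glob patterns that are joined into a single search path of the form agents/<agent_name>/<callback_type>/<callback_name>/test.py. The sketch below reproduces that pattern with the standard library against a throwaway directory tree, to show how narrowing one parameter narrows the match. The `build_search_pattern` helper, the agent names, and the directory layout are hypothetical.

```python
import glob
import os
import tempfile


def build_search_pattern(
    app_dir, agent_name="*", callback_type="*_callbacks", callback_name="*"
):
    """Build agents/<agent>/<type>_callbacks/<callback>/test.py under app_dir."""
    return os.path.join(
        app_dir, "agents", agent_name, callback_type, callback_name, "test.py"
    )


# Hypothetical app layout created in a temporary directory.
layout = [
    ("root_agent", "before_model_callbacks", "before_model"),
    ("root_agent", "after_tool_callbacks", "after_tool"),
    ("billing_agent", "before_model_callbacks", "before_model"),
]

with tempfile.TemporaryDirectory() as app_dir:
    for agent, cb_type, cb_name in layout:
        cb_dir = os.path.join(app_dir, "agents", agent, cb_type, cb_name)
        os.makedirs(cb_dir)
        with open(os.path.join(cb_dir, "test.py"), "w", encoding="utf-8") as f:
            f.write("def test_placeholder():\n    assert True\n")

    # Defaults match every agent, callback type, and callback name.
    all_tests = glob.glob(build_search_pattern(app_dir))
    # Narrow to one agent, or to one callback type across all agents.
    root_only = glob.glob(build_search_pattern(app_dir, agent_name="root_agent"))
    before_model = glob.glob(
        build_search_pattern(app_dir, callback_type="before_model_callbacks")
    )

print(len(all_tests), len(root_only), len(before_model))
```

Note that the callback_type filter here is a directory name such as before_model_callbacks, which differs from the keys accepted by test_single_callback_for_agent.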

Source code in src/cxas_scrapi/evals/callback_evals.py

def test_all_callbacks_in_app_dir(
    self,
    app_dir: str,
    agent_name: str = "*",
    callback_type: str = "*_callbacks",
    callback_name: str = "*",
    log_file: str = None,
    pytest_args: list[str] = None,
) -> pd.DataFrame:
    """Runs pytest against all callback tests in the given agent directory.

    Args:
        app_dir: The path to the CES app root directory.
        agent_name: Optional. The name of the agent to run tests for.
            If not provided, all agents will be tested.
        callback_type: Optional. The type of callback to run tests for.
            If not provided, all callback types will be tested.
        callback_name: Optional. The name of the callback to run tests for.
            If not provided, all callbacks will be tested.
        log_file: Optional. Path to a file to log pytest output to.
            If not provided, output will be logged to the console.
        pytest_args: Optional. Additional arguments to pass to pytest.
            Defaults to None.

    Returns:
        A pandas DataFrame containing test execution results.
    """

    agent_name = agent_name or "*"
    callback_type = callback_type or "*_callbacks"
    callback_name = callback_name or "*"

    # Discover all test.py files within the agent directory
    # Expected: agents/<agent_name>/<type>_callbacks/<callback_name>/test.py
    search_pattern = os.path.join(
        app_dir,
        "agents",
        agent_name,
        callback_type,
        callback_name,
        "test.py",
    )
    test_files = glob.glob(search_pattern, recursive=True)

    if not test_files:
        logger.warning(f"No callback tests found in {app_dir}")
        return pd.DataFrame(
            columns=[
                "agent_name",
                "callback_type",
                "test_name",
                "status",
                "error_message",
            ]
        )

    logger.info(f"Found {len(test_files)} callback tests.")

    all_results = []

    for test_file in test_files:
        test_dir = os.path.dirname(os.path.abspath(test_file))
        python_code_path = os.path.join(test_dir, "python_code.py")

        if not os.path.exists(python_code_path):
            logger.warning(
                f"Warning: {test_file} found, but no "
                "python_code.py exists alongside it. Skipping."
            )
            continue

        logger.debug(f"Running test for: {python_code_path}")

        with open(python_code_path, "r", encoding="utf-8") as f:
            code_content = f.read()

        with open(test_file, "r", encoding="utf-8") as f:
            test_content = f.read()

        cur_agent_name = self._get_agent_name(test_file)
        cur_callback_type = self._get_callback_type(test_file)
        all_results.extend(
            self._run_test(
                code_content,
                test_content,
                test_file,
                cur_agent_name,
                cur_callback_type,
                log_file,
                pytest_args,
            )
        )

    return pd.DataFrame(
        all_results,
        columns=[
            "agent_name",
            "callback_type",
            "test_name",
            "status",
            "error_message",
        ],
    )
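
Since a test.py without a neighbouring python_code.py is skipped with only a warning, it can be useful to list such files up front. The following sketch does that with the same directory convention as the listing above; `find_tests_without_code` is a hypothetical helper, not part of cxas_scrapi, and the directory names are made up.

```python
import glob
import os
import tempfile


def find_tests_without_code(app_dir):
    """Return test.py paths that have no python_code.py in the same directory."""
    pattern = os.path.join(app_dir, "agents", "*", "*_callbacks", "*", "test.py")
    return sorted(
        test_file
        for test_file in glob.glob(pattern)
        if not os.path.exists(
            os.path.join(os.path.dirname(test_file), "python_code.py")
        )
    )


with tempfile.TemporaryDirectory() as app_dir:
    agent_dir = os.path.join(app_dir, "agents", "root_agent")
    complete = os.path.join(agent_dir, "before_model_callbacks", "before_model")
    orphan = os.path.join(agent_dir, "after_tool_callbacks", "after_tool")
    for cb_dir in (complete, orphan):
        os.makedirs(cb_dir)
        open(os.path.join(cb_dir, "test.py"), "w").close()
    # Only the first callback directory gets its python_code.py.
    open(os.path.join(complete, "python_code.py"), "w").close()

    missing = find_tests_without_code(app_dir)
    missing_relative = [os.path.relpath(path, app_dir) for path in missing]

print(missing_relative)
```

Running a check like this before test_all_callbacks_in_app_dir() would make silently skipped tests visible.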