Slot Filling Pattern¶
Collecting multiple structured inputs from a user in a conversational flow sounds simple. In practice, it's one of the hardest things to get right on CX Agent Studio. This page documents the Slot Filling Pattern — the approach developed from production experience with real agents — including the architecture, the three control surfaces that govern behavior, the critical callback preemption mechanism, and a full set of stabilization gotchas distilled from shipping agents to production.
The problem¶
CXAS has no native slot-filling primitive. The platform's XML <taskflow> in agent instructions relies on LLM state to track what has been collected and what to ask next. This creates four failure modes in production:
| Failure mode | What happens |
|---|---|
| State fragility | The LLM "forgets" a slot value mid-conversation, especially in long sessions. |
| Premature task firing | The LLM decides to call a backend API before all required inputs are collected. |
| Progressive disclosure failure | The LLM previews future steps ("After you give me your date, I'll also need your credit card") instead of asking one question at a time. |
| Validation bypass | The LLM accepts obviously invalid input ("party of 50") without calling the validation tool, generating its own error message instead. |
All four failures share a root cause: the LLM is being asked to manage state, control flow, and validation simultaneously. It's good at none of those things.
The solution¶
Split the problem across two components:
- Python (the callback) owns state, control flow, task firing, validation, and retry logic.
- The LLM owns language: parsing user intent, calling the right setter tool, and generating warm responses.
The LLM never decides "should I call the booking API now?" or "have I collected enough information?" — it calls setter tools as the user provides information, and follows _system_message to know what to ask next. Python handles everything else.
This is the Slot Filling Pattern (sometimes called the slot filling DAG framework).
Architecture¶
Setter tools write each collected value into context.state. The before_model_callback evaluates the DAG and either preempts the LLM or sets _system_message to guide the response.
User ──► LLM ──► Setter Tool ──► context.state['sm']
│
▼
before_model_callback
(DAG evaluation)
│
┌─────────────┴──────────────┐
│ │
Inputs ready? Not yet ready
│ │
▼ ▼
Fire task → Set _system_message
Preempt LLM (next question)
│ │
▼ ▼
Auto-fire result LLM relays naturally
(zero extra turns)
The sm state variable¶
All slot filling state lives in a single session-scoped dict named sm. You declare it as a STRING type variable in your app's variableDeclarations.
{
"filled": {},
"pending": {},
"task_results": {},
"_retries": {},
"_slot_errors": [],
"_system_message": "",
"status": "in_progress"
}
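Because sm is declared as a STRING, it can arrive as raw JSON text (or be empty on the first turn). A defensive loader — a sketch only; `load_sm` and `default_sm` are hypothetical helper names, and some runtimes parse JSON-shaped state for you, in which case this is unnecessary — might look like:

```python
import json

def default_sm() -> dict:
    # Fresh copy each call so callers never share or mutate nested dicts.
    return {
        "filled": {}, "pending": {}, "task_results": {},
        "_retries": {}, "_slot_errors": [], "_system_message": "",
        "status": "in_progress",
    }

def load_sm(raw: str) -> dict:
    """Parse the STRING-typed sm variable, falling back to the default
    shape on the first turn or on corrupt state."""
    try:
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        parsed = {}
    return {**default_sm(), **parsed}
```

Merging over the default shape means a missing key can never raise a KeyError later in the callback.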
| Key | Purpose |
|---|---|
| filled | Confirmed slot values: {"party_size": 4, "preferred_date": "2026-06-17"} |
| pending | Values awaiting user confirmation (readback) before moving to filled |
| task_results | Successful task outputs: {"FindAvailableTimes": {"times": "6 PM, 7:30 PM"}} |
| _retries | Failure counts — task retries ("CheckAvailability": 1) and slot validation retries ("slot:party_size": 2) |
| _slot_errors | Validation errors from setter tools: [{"slot": "party_size", "code": "out_of_range"}] |
| _system_message | The next message for the LLM to relay — written by the callback, read via {{system_message}} in the instruction |
| status | "in_progress" \| "complete" \| "escalated" |
The sm variable is the source of truth
The LLM does not track what has been collected. If you need to know whether party_size has been collected, check sm['filled']['party_size'] in Python — never ask the LLM.
The three control surfaces¶
The Slot Filling Pattern has three places where you configure behavior. Getting all three right is the key to a stable agent.
1. Agent instruction (most critical)¶
The instruction tells the LLM its role in the pattern: call setter tools for every piece of information the user provides, relay _system_message naturally, and never preview future steps.
The single most important detail is including a concrete multi-slot example in the batching rule. Abstract instructions alone ("call ALL setter tools") are insufficient — the LLM needs to see a specific example to generalize the batching behavior reliably.
<slot_filling_protocol>
You are operating in SLOT FILLING mode. Follow these rules strictly:
1. TOOL-DRIVEN CONVERSATION: After each user message, identify EVERY piece
of information the user provided and call ALL corresponding setter tools
in the SAME response. For example, if the user says "table for 2 on
June 20th under the name Johnson", call set_party_size, set_preferred_date,
AND set_guest_name — all in one turn.
2. PROGRESSIVE DISCLOSURE: Only ask ONE question at a time.
Never preview future steps.
3. RELAY SYSTEM MESSAGES: When _system_message is set in the system
directive, incorporate it naturally into your response. When it contains
specific times, names, or confirmation numbers, you MUST include those
exact values.
4. ALWAYS CALL TOOLS: Call the setter tool for every piece of information
the user provides, even if the value seems out of range. The system
validates all inputs and handles errors automatically.
5. NATURAL CONVERSATION: If the user asks questions unrelated to the flow,
answer helpfully but return to the reservation.
</slot_filling_protocol>
<system_directive>
{{system_message}}
</system_directive>
2. Tool docstrings (keep short)¶
Docstrings should describe what the tool does and what format to prepare the argument in — nothing else.
Verbose docstrings break batching
This is the most counterintuitive failure mode. Verbose docstrings (with caveats like "only call after party size is confirmed" or "valid range: 1-8") cause the LLM to focus on each tool individually rather than batching multiple calls. In one production test, adding two extra sentences to a docstring dropped multi-slot batching from 3/3 to 0/3. Keep docstrings to 1-2 sentences: format + trigger phrase only.
Good — format conversion and trigger phrase:
def set_preferred_date(date: str) -> dict:
"""Record the reservation date in YYYY-MM-DD format.
Convert natural language ('next Friday', 'June 20th') to YYYY-MM-DD.
Call immediately when any date is mentioned.
"""
Bad — duplicates what the framework already enforces:
def set_preferred_date(date: str) -> dict:
"""Record the reservation date in YYYY-MM-DD format.
Convert natural language to YYYY-MM-DD. Only call after party size
has been confirmed. Date must be in the future. We are open 5–10 PM.
"""
The second version teaches the LLM things Python already validates, giving it reasons to skip the tool call and improvise its own error message instead.
3. before_model_callback (the DAG engine)¶
The callback is pure Python that runs before every LLM turn. It reads sm state, evaluates the DAG, and either:
- Sets sm['_system_message'] with the next question, then returns OK (the LLM generates the response).
- Fires a task, stores the result, and preempts the LLM (returns the message directly, skipping model generation entirely).
def before_model_callback(callback_context, llm_request):
sm = callback_context.state['sm']
# Terminal state — let the LLM respond freely
if sm.get('status') in ('complete', 'escalated'):
return None
# Evaluate DAG: check if any task inputs are now satisfied
filled = sm.get('filled', {})
if _task_inputs_ready(sm, 'FindAvailableTimes'):
result = _execute_find_times(filled)
sm['task_results']['FindAvailableTimes'] = result
sm['filled']['available_times'] = result['times']
if _task_inputs_ready(sm, 'BookReservation'):
result = _execute_book(filled)
sm['task_results']['BookReservation'] = result
sm['filled']['confirmation_number'] = result['confirmation']
sm['status'] = 'complete'
# Set _system_message for the LLM
next_q, _ = _next_question(sm)
sm['_system_message'] = next_q
# Preempt if a task just fired (more than the initial user message in context)
if _task_just_fired(sm) and llm_request.contents and len(llm_request.contents) > 1:
return LlmResponse.from_parts(
parts=[Part.from_text(text=sm['_system_message'])]
)
return None
Setter tool template¶
Each user-collected slot gets one Python setter tool. The tool validates the input, stores the value to sm, and returns the next question via _system_message.
def set_party_size(size: int) -> dict:
"""Record the number of guests.
Parse natural language: 'just me'=1, 'a couple'=2, 'four of us'=4.
Call immediately when party size is mentioned.
"""
sm = context.state['sm'] # (1)
# Validate — signal errors via _slot_errors, not return messages
if not isinstance(size, int):
try:
size = int(size)
except (ValueError, TypeError):
sm.setdefault('_slot_errors', []).append(
{'slot': 'party_size', 'code': 'parse_error'}
)
return {'error': True}
if not (1 <= size <= 8):
sm.setdefault('_slot_errors', []).append(
{'slot': 'party_size', 'code': 'out_of_range'}
)
return {'error': True}
# Store to pending (framework promotes to filled after readback)
sm.setdefault('pending', {})['party_size'] = size # (2)
return {'stored': True, 'value': size}
1. context is a CXAS built-in — it's available in all Python tools without import.
2. Write to pending, not filled. The callback moves values to filled after the user confirms the readback. Slots without readback are auto-promoted immediately.
Key design principles for setters:
- Setters are thin. No DAG logic, no control flow, no knowledge of other slots.
- Setters signal errors, not messages. Append to _slot_errors with a code; the callback resolves the user-facing message from your config.
- The LLM does the parsing. The setter for preferred_date expects YYYY-MM-DD — the LLM converts "next Thursday" to "2026-07-17" before calling the tool. This plays to the LLM's strength while keeping the setter deterministic.
The _next_question() helper¶
This function walks the ordered slot list and returns the first unfilled slot whose dependencies are satisfied. It's the core of the progressive disclosure mechanism — the LLM only ever sees the current question, never future ones.
def _next_question(sm: dict) -> tuple[str, str | None]:
filled = sm.get('filled', {})
order = [
('party_size', 'How many guests will be joining you?'),
('preferred_date', 'What date were you thinking?'),
]
# available_times is a task output slot — only appears in the order
# after the FindAvailableTimes task has populated it
if 'available_times' in filled:
times = filled['available_times']
order.append(('selected_time', f'We have {times}. Which time works for you?'))
order += [
('guest_name', 'What name should I put the reservation under?'),
('special_requests', 'Any special requests, or shall I note none?'),
]
for slot_name, question in order:
if slot_name not in filled:
return question, slot_name
return 'All information collected!', None
Task output slots in the order list
Notice that available_times (populated by the FindAvailableTimes task) and selected_time (collected after times are known) are added conditionally — only after the task has run. This is the DAG dependency in action: selected_time cannot be asked until available_times is filled.
DAG config concept¶
For more complex agents, you can declare the slot/task dependency graph explicitly rather than encoding it procedurally in _next_question. The declarative config makes the DAG visible as data — easier to validate, modify, and test.
config = {
'slots': [
{'name': 'party_size', 'source': 'user',
'setter': 'set_party_size',
'ask': 'How many guests will be joining you?',
'requires_readback': True,
'validation': {
'errors': {
'out_of_range': 'We accept parties of 1 to 8 guests.',
'parse_error': "I didn't catch the number of guests.",
},
'max_retries': 3,
}},
{'name': 'preferred_date', 'source': 'user',
'setter': 'set_preferred_date',
'ask': 'What date were you thinking?',
'requires_readback': True},
{'name': 'available_times', 'source': 'task:FindAvailableTimes'},
{'name': 'selected_time', 'source': 'user',
'setter': 'set_selected_time',
'requires': ['available_times'],
'ask': 'We have {available_times}. Which time works for you?',
'requires_readback': True},
{'name': 'guest_name', 'source': 'user',
'setter': 'set_guest_name',
'ask': 'What name should I put the reservation under?',
'requires_readback': True},
{'name': 'special_requests', 'source': 'user',
'setter': 'set_special_requests',
'ask': 'Any special requests, or shall I note none?',
'requires_readback': True},
],
'tasks': [
{'name': 'FindAvailableTimes',
'inputs': ['party_size', 'preferred_date'],
'outputs': {'times': 'available_times'},
'success_check': 'success',
'then_say': 'We have {available_times}. Which time works for you?'},
{'name': 'BookReservation',
'inputs': ['party_size', 'preferred_date', 'selected_time',
'guest_name', 'special_requests'],
'outputs': {'confirmation': 'confirmation_number'},
'success_check': 'success',
'terminal': True,
'then_say': "You're confirmed! Your number is {confirmation_number}."},
],
}
The requires field on selected_time enforces the dependency at two levels:
- The question is skipped in _next_question until available_times is filled.
- The setter tool is hidden from the LLM's tool list until the dependency is met — so the LLM structurally cannot call set_selected_time before availability is known.
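The tool-list filtering can be computed directly from the config. A sketch (the function name is illustrative) of which setters the LLM may see on a given turn:

```python
def visible_setters(config: dict, sm: dict) -> list[str]:
    """Return setter tools whose 'requires' dependencies are all
    filled — everything else stays hidden from the LLM this turn."""
    filled = sm.get('filled', {})
    visible = []
    for slot in config['slots']:
        if slot.get('source') != 'user':
            continue  # task-output slots have no setter tool
        if all(dep in filled for dep in slot.get('requires', [])):
            visible.append(slot['setter'])
    return visible
```

Because the dependency is enforced structurally, no prompt engineering is needed to stop the LLM from asking for a time before availability is known.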
The callback preemption mechanism¶
Callback Preemption — the most critical architectural detail
When the callback fires a task and preempts the LLM, the LLM does not get a turn. Any setter tools the LLM had queued in the current response but not yet called are permanently lost. This is why the agent instruction must require all setter tools to be called in the same response — deferring any setter to a later turn creates a race condition with the callback.
Preemption uses LlmResponse.from_parts() to bypass LLM generation entirely:
if task_fired and llm_request.contents and len(llm_request.contents) > 1:
return LlmResponse.from_parts(
parts=[Part.from_text(text=sm['_system_message'])]
)
The len(llm_request.contents) > 1 guard prevents preempting on the first user message — the LLM should always generate a natural greeting rather than a canned "How many guests?" on the opening turn.
The full flow:
User message
→ LLM calls setter tool(s) in one response
→ before_model_callback fires
→ Check DAG: are all task inputs now satisfied?
YES → fire task, store result in sm
→ If multi-content (not first turn): PREEMPT
→ LlmResponse returned directly — LLM skipped
→ Task result delivered verbatim
NO → compute next question
→ Set sm['_system_message']
→ Return OK → LLM generates response using _system_message
Five preemption triggers (not just task firing):
- Task fires — task result message delivered verbatim.
- Slot validation error — error message from config delivered verbatim.
- Readback transition — after confirm_pending, the next question is delivered with a warm prefix.
- Readback stall exhaustion — the stall detector has rejected pending values too many times.
- Progress stall — no forward progress for N turns.
Step-by-step: adding slot filling to your agent¶
1. Map the slots and tasks. List every piece of information you need to collect, in the natural order to ask for it, and every backend task that fires from those inputs. For a restaurant reservation:
    - party_size (user) → enables FindAvailableTimes
    - preferred_date (user) → enables FindAvailableTimes
    - available_times (task output from FindAvailableTimes)
    - selected_time (user, requires available_times)
    - guest_name (user)
    - special_requests (user) → enables BookReservation
2. Declare the sm variable. In your CX Agent Studio app's variable declarations, add an sm variable of type STRING with the initial value shown in "The sm state variable" above.
3. Write one setter tool per user-collected slot. Follow the template: validate → write to sm['pending'] → return {"stored": True}. For invalid input, append to _slot_errors and return {"error": True}. Keep docstrings to 2 sentences: format + trigger phrase.
4. Write _next_question. Build an ordered list of (slot_name, question) pairs. Task output slots (like available_times) are added conditionally after their task has run. Return the first slot not in sm['filled'].
5. Write the before_model_callback. Check terminal state first. Evaluate the DAG (check whether each task's inputs are all in sm['filled']). Fire any ready tasks. Set sm['_system_message']. Preempt if a task fired and this is not the first turn.
6. Write the agent instruction. Include the <slot_filling_protocol> block. Include a concrete multi-slot example in the batching rule. End with a <system_directive>{{system_message}}</system_directive> block so the LLM can see _system_message.
7. Write evaluations. Cover: single slot per turn, multi-slot batching, error recovery, out-of-range input, natural language parsing, conversational detours, and the full happy path end-to-end.
Slot Filling vs. XML Taskflow¶
| Aspect | XML Taskflow | Slot Filling |
|---|---|---|
| State tracking | LLM memory — degrades over long conversations | context.state — deterministic, persists across turns |
| Task firing | LLM decides when (unreliable, premature or late) | Auto-fires when inputs ready — exact, every time |
| Progressive disclosure | Prompt-based — LLM sees all stages and can preview | Tool-driven — LLM only sees the current question |
| Validation | LLM-based — hallucinates error messages | Python code — exact validation, config-driven messages |
| Retry logic | LLM improvises — counts reset each turn | _retries dict — persists across turns, bounded by max_retries |
| Dependency enforcement | Prompt-based (fragile) | Tool visibility — LLM cannot call a hidden setter |
| Testability | Hard — non-deterministic LLM state | Easy — pure Python DAG functions, testable with dicts |
Writing evaluations for slot-filling agents¶
A healthy eval suite for a slot-filling agent covers five categories:
| Category | What to test |
|---|---|
| Happy path | Single slot per turn; multi-slot batching (2+ slots in one message); the full end-to-end flow |
| Error recovery | Invalid input (out of range, wrong format); the agent re-asks after error |
| Guard rails | Out-of-scope requests; off-topic detours that return to the flow |
| NLP parsing | Natural language dates ("next Friday"), quantities ("a couple"), times ("seven thirty PM") |
| Conversational detours | User asks a question mid-flow; agent answers and returns to collection |
Golden eval structure for multi-slot batching:
{
"golden": {
"turns": [
{
"steps": [
{"userInput": {"text": "Table for 2 on June 20th under Johnson"}},
{"expectation": {"toolCall": {"tool": "set_party_size",
"args": {"size": 2.0}}}},
{"expectation": {"toolCall": {"tool": "set_preferred_date",
"args": {"date": "2026-06-20"}}}},
{"expectation": {"toolCall": {"tool": "set_guest_name",
"args": {"name": "Johnson"}}}},
{"expectation": {"agentResponse": {"chunks": [{"text": "Johnson"}]}}}
]
}
]
}
}
agentResponse is required in every turn
Every expectations block must include an agentResponse check, even if it only verifies one word. An evaluation step with only a toolCall expectation is flagged as INVALID by the platform, and INVALID = FAIL. This is the most common cause of unexpected evaluation failures.
Scenario eval for end-to-end flow:
{
"scenario": {
"task": "Make a dinner reservation for 4 people on June 17th. When asked for a time, choose 7 PM. Name is Garcia. No special requests.",
"rubrics": [
"The agent must collect party size, date, time, name, and special requests.",
"The agent must be warm and inviting."
],
"maxTurns": 20
}
}
The 7 stabilization gotchas¶
These are the failure modes encountered shipping slot-filling agents to production, distilled into concrete, actionable items.
1. Missing agentResponse in evaluation expectations
Every turn in a golden eval must have an agentResponse expectation in its steps block. If a step only checks toolCall with no agentResponse, the platform marks it INVALID, which counts as a failure.
Fix: Add even a minimal agentResponse check — a single word that should appear in the response — to every turn.
2. Verbose docstrings break multi-slot batching
Adding validation details ("valid range: 1-8"), prerequisite caveats ("only call after date is confirmed"), or business rules ("we're closed on Mondays") to tool docstrings regresses batching. The LLM reads each tool's docstring separately, which fragments its attention away from the batching rule in the instruction.
Fix: Keep docstrings to 2 sentences. Put all business rules in the agent instruction or enforce them in Python code.
3. Missing concrete example in the batching instruction
The instruction rule "call ALL setter tools in the SAME response" is abstract. The LLM needs a specific, realistic example to generalize the behavior.
Fix: Add a concrete example like: "if the user says 'table for 2 on June 20th under the name Johnson', call set_party_size, set_preferred_date, AND set_guest_name — all in one turn."
4. Callback preemption races with deferred tool calls
If the LLM calls setter A in one tool call but defers setter B to the next response, and the callback fires between them (e.g., because setter A completed all of task X's inputs), setter B is never called. The preempted response overwrites whatever the LLM was about to say.
Fix: The agent instruction must require all setters to be called in one response. The concrete example in gotcha 3 helps drive this behavior.
5. Task output slots missing from _next_question order
If available_times (populated by a task) is not in the _next_question order, selected_time will never appear as the next question — the function will fall through to guest_name instead.
Fix: Add task output slots to the order list conditionally: check that the slot is in sm['filled'] before including the dependent slot's question in the list.
6. _system_message missing from some code paths
Every code path through the callback must set sm['_system_message']. If an error branch or early-return path forgets to set it, the LLM will see a stale message from the previous turn and relay incorrect guidance.
Fix: Set sm['_system_message'] at the end of the main path, and explicitly sm.pop('_system_message', None) in any path that should clear it.
7. Terminal task status not checked before firing tasks
After BookReservation fires and sets sm['status'] = 'complete', the callback must check this status before re-evaluating the DAG. Without the check, the callback may attempt to re-fire BookReservation on every subsequent turn (all inputs are still in filled, and the task's inputs-ready check would pass again).
Fix: Add a terminal state guard at the top of the callback:
Mapping from Sonic YAML¶
If you're migrating a Sonic-style agent definition to the Slot Filling Pattern:
| Sonic YAML | Slot Filling equivalent |
|---|---|
| type: Slot, source: user | Setter tool + entry in _next_question order |
| type: Slot, source: task:X | Task output written to sm['filled'] in the callback |
| type: Slot, requires: [Y] | Conditional entry in _next_question (check Y in filled) |
| Task with inputs: {a: $slot_a} | DAG check in callback: if 'slot_a' in sm['filled'] |
| Task terminal: true | sm['status'] = 'complete' after task fires |
| Slot validation: block | _slot_errors signal + error code lookup in callback |
Reference implementation¶
The Bella Notte restaurant reservation agent is the canonical reference implementation of this pattern. It implements:
- 7 slots: party_size, preferred_date, available_times, selected_time, guest_name, special_requests, confirmation_number
- 2 tasks: FindAvailableTimes (fires after party_size + preferred_date), BookReservation (terminal, fires after all required slots)
- 5 setter tools: set_party_size, set_preferred_date, set_guest_name, set_selected_time, set_special_requests
- 20+ golden evals, 5+ scenario evals