Slot Filling Pattern¶
Collecting multiple structured inputs from a user in a conversational flow sounds simple. In practice, it's one of the hardest things to get right on CX Agent Studio. This page documents the Slot Filling Pattern — the approach developed from production experience with real agents — including the architecture, the three control surfaces that govern behavior, the critical callback preemption mechanism, and a full set of stabilization gotchas distilled from shipping agents to production.
The problem¶
CXAS has no native slot-filling primitive. The platform's XML <taskflow> in agent instructions relies on LLM state to track what has been collected and what to ask next. This creates four failure modes in production:
| Failure mode | What happens |
|---|---|
| State fragility | The LLM "forgets" a slot value mid-conversation, especially in long sessions. |
| Premature task firing | The LLM decides to call a backend API before all required inputs are collected. |
| Progressive disclosure failure | The LLM previews future steps ("After you give me your date, I'll also need your credit card") instead of asking one question at a time. |
| Validation bypass | The LLM accepts obviously invalid input ("party of 50") without calling the validation tool, generating its own error message instead. |
All four failures share a root cause: the LLM is being asked to manage state, control flow, and validation simultaneously. It's good at none of those things.
The solution¶
Split the problem across two components:
- Python (the callback) owns state, control flow, task firing, validation, and retry logic.
- The LLM owns language: parsing user intent, calling the right setter tool, and generating warm responses.
The LLM never decides "should I call the booking API now?" or "have I collected enough information?" — it calls setter tools as the user provides information, and follows _system_message to know what to ask next. Python handles everything else.
This is the Slot Filling Pattern (sometimes called the slot filling DAG framework).
Architecture¶
Setter tools write each collected value into context.state. The before_model_callback evaluates the DAG and either preempts the LLM or sets _system_message to guide the response.
User ──► LLM ──► Setter Tool ──► context.state['sm']
│
▼
before_model_callback
(DAG evaluation)
│
┌─────────────┴──────────────┐
│ │
Inputs ready? Not yet ready
│ │
▼ ▼
Fire task → Set _system_message
Preempt LLM (next question)
│ │
▼ ▼
Auto-fire result LLM relays naturally
(zero extra turns)
The sm state variable¶
All slot filling state lives in a single session-scoped dict named sm. You declare it as a STRING type variable in your app's variableDeclarations.
{
"filled": {},
"pending": {},
"task_results": {},
"_retries": {},
"_slot_errors": [],
"_system_message": "",
"status": "in_progress"
}
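Because sm is declared as a STRING, it can arrive as raw JSON text (or be empty on the first turn). A defensive loader — a sketch only; `load_sm` and `default_sm` are hypothetical helper names, and some runtimes parse JSON-shaped state for you, in which case this is unnecessary — might look like:

```python
import json

def default_sm() -> dict:
    # Fresh copy each call so callers never share or mutate nested dicts.
    return {
        "filled": {}, "pending": {}, "task_results": {},
        "_retries": {}, "_slot_errors": [], "_system_message": "",
        "status": "in_progress",
    }

def load_sm(raw: str) -> dict:
    """Parse the STRING-typed sm variable, falling back to the default
    shape on the first turn or on corrupt state."""
    try:
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        parsed = {}
    return {**default_sm(), **parsed}
```

Merging over the default shape means a missing key can never raise a KeyError later in the callback.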
| Key | Purpose |
|---|---|
| filled | Confirmed slot values: {"party_size": 4, "preferred_date": "2026-06-17"} |
| pending | Values awaiting user confirmation (readback) before moving to filled |
| task_results | Successful task outputs: {"FindAvailableTimes": {"times": "6 PM, 7:30 PM"}} |
| _retries | Failure counts — task retries ("CheckAvailability": 1) and slot validation retries ("slot:party_size": 2) |
| _slot_errors | Validation errors from setter tools: [{"slot": "party_size", "code": "out_of_range"}] |
| _system_message | The next message for the LLM to relay — written by the callback, read via {{system_message}} in the instruction |
| status | "in_progress" \| "complete" \| "escalated" |
The sm variable is the source of truth
The LLM does not track what has been collected. If you need to know whether party_size has been collected, check sm['filled']['party_size'] in Python — never ask the LLM.
The three control surfaces¶
The Slot Filling Pattern has three places where you configure behavior. Getting all three right is the key to a stable agent.
1. Agent instruction (most critical)¶
The instruction tells the LLM its role in the pattern: call setter tools for every piece of information the user provides, relay _system_message naturally, and never preview future steps.
The single most important detail is including a concrete multi-slot example in the batching rule. Abstract instructions alone ("call ALL setter tools") are insufficient — the LLM needs to see a specific example to generalize the batching behavior reliably.
<slot_filling_protocol>
You are operating in SLOT FILLING mode. Follow these rules strictly:
1. TOOL-DRIVEN CONVERSATION: After each user message, identify EVERY piece
of information the user provided and call ALL corresponding setter tools
in the SAME response. For example, if the user says "table for 2 on
June 20th under the name Johnson", call set_party_size, set_preferred_date,
AND set_guest_name — all in one turn.
2. PROGRESSIVE DISCLOSURE: Only ask ONE question at a time.
Never preview future steps.
3. RELAY SYSTEM MESSAGES: When _system_message is set in the system
directive, incorporate it naturally into your response. When it contains
specific times, names, or confirmation numbers, you MUST include those
exact values.
4. ALWAYS CALL TOOLS: Call the setter tool for every piece of information
the user provides, even if the value seems out of range. The system
validates all inputs and handles errors automatically.
5. NATURAL CONVERSATION: If the user asks questions unrelated to the flow,
answer helpfully but return to the reservation.
</slot_filling_protocol>
<system_directive>
{{system_message}}
</system_directive>
2. Tool docstrings (keep short)¶
Docstrings should describe what the tool does and what format to prepare the argument in — nothing else.
Verbose docstrings break batching
This is the most counterintuitive failure mode. Verbose docstrings (with caveats like "only call after party size is confirmed" or "valid range: 1-8") cause the LLM to focus on each tool individually rather than batching multiple calls. In one production test, adding two extra sentences to a docstring dropped multi-slot batching from 3/3 to 0/3. Keep docstrings to 1-2 sentences: format + trigger phrase only.
Good — format conversion and trigger phrase:
def set_preferred_date(date: str) -> dict:
"""Record the reservation date in YYYY-MM-DD format.
Convert natural language ('next Friday', 'June 20th') to YYYY-MM-DD.
Call immediately when any date is mentioned.
"""
Bad — duplicates what the framework already enforces:
def set_preferred_date(date: str) -> dict:
"""Record the reservation date in YYYY-MM-DD format.
Convert natural language to YYYY-MM-DD. Only call after party size
has been confirmed. Date must be in the future. We are open 5–10 PM.
"""
The second version teaches the LLM things Python already validates, giving it reasons to skip the tool call and improvise its own error message instead.
3. before_model_callback (the DAG engine)¶
The callback is pure Python that runs before every LLM turn. It reads sm state, evaluates the DAG, and either:
- Sets sm['_system_message'] with the next question, then returns OK (the LLM generates the response).
- Fires a task, stores the result, and preempts the LLM (returns the message directly, skipping model generation entirely).
def before_model_callback(callback_context, llm_request):
sm = callback_context.state['sm']
# Terminal state — let the LLM respond freely
if sm.get('status') in ('complete', 'escalated'):
return None
# Evaluate DAG: check if any task inputs are now satisfied
filled = sm.get('filled', {})
if _task_inputs_ready(sm, 'FindAvailableTimes'):
result = _execute_find_times(filled)
sm['task_results']['FindAvailableTimes'] = result
sm['filled']['available_times'] = result['times']
if _task_inputs_ready(sm, 'BookReservation'):
result = _execute_book(filled)
sm['task_results']['BookReservation'] = result
sm['filled']['confirmation_number'] = result['confirmation']
sm['status'] = 'complete'
# Set _system_message for the LLM
next_q, _ = _next_question(sm)
sm['_system_message'] = next_q
# Preempt if a task just fired (more than the initial user message in context)
if _task_just_fired(sm) and llm_request.contents and len(llm_request.contents) > 1:
return LlmResponse.from_parts(
parts=[Part.from_text(text=sm['_system_message'])]
)
return None
Setter tool template¶
Each user-collected slot gets one Python setter tool. The tool validates the input, stores the value to sm, and returns the next question via _system_message.
def set_party_size(size: int) -> dict:
"""Record the number of guests.
Parse natural language: 'just me'=1, 'a couple'=2, 'four of us'=4.
Call immediately when party size is mentioned.
"""
sm = context.state['sm'] # (1)
# Validate — signal errors via _slot_errors, not return messages
if not isinstance(size, int):
try:
size = int(size)
except (ValueError, TypeError):
sm.setdefault('_slot_errors', []).append(
{'slot': 'party_size', 'code': 'parse_error'}
)
return {'error': True}
if not (1 <= size <= 8):
sm.setdefault('_slot_errors', []).append(
{'slot': 'party_size', 'code': 'out_of_range'}
)
return {'error': True}
# Store to pending (framework promotes to filled after readback)
sm.setdefault('pending', {})['party_size'] = size # (2)
return {'stored': True, 'value': size}
1. context is a CXAS built-in — it's available in all Python tools without import.
2. Write to pending, not filled. The callback moves values to filled after the user confirms the readback. Slots without readback are auto-promoted immediately.
Key design principles for setters:
- Setters are thin. No DAG logic, no control flow, no knowledge of other slots.
- Setters signal errors, not messages. Append to _slot_errors with a code; the callback resolves the user-facing message from your config.
- The LLM does the parsing. The setter for preferred_date expects YYYY-MM-DD — the LLM converts "next Thursday" to "2026-07-17" before calling the tool. This plays to the LLM's strength while keeping the setter deterministic.
The _next_question() helper¶
This function walks the ordered slot list and returns the first unfilled slot whose dependencies are satisfied. It's the core of the progressive disclosure mechanism — the LLM only ever sees the current question, never future ones.
def _next_question(sm: dict) -> tuple[str, str | None]:
filled = sm.get('filled', {})
order = [
('party_size', 'How many guests will be joining you?'),
('preferred_date', 'What date were you thinking?'),
]
# available_times is a task output slot — only appears in the order
# after the FindAvailableTimes task has populated it
if 'available_times' in filled:
times = filled['available_times']
order.append(('selected_time', f'We have {times}. Which time works for you?'))
order += [
('guest_name', 'What name should I put the reservation under?'),
('special_requests', 'Any special requests, or shall I note none?'),
]
for slot_name, question in order:
if slot_name not in filled:
return question, slot_name
return 'All information collected!', None
Task output slots in the order list
Notice that available_times (populated by the FindAvailableTimes task) and selected_time (collected after times are known) are added conditionally — only after the task has run. This is the DAG dependency in action: selected_time cannot be asked until available_times is filled.
DAG config concept¶
For more complex agents, you can declare the slot/task dependency graph explicitly rather than encoding it procedurally in _next_question. The declarative config makes the DAG visible as data — easier to validate, modify, and test.
config = {
'slots': [
{'name': 'party_size', 'source': 'user',
'setter': 'set_party_size',
'ask': 'How many guests will be joining you?',
'requires_readback': True,
'validation': {
'errors': {
'out_of_range': 'We accept parties of 1 to 8 guests.',
'parse_error': "I didn't catch the number of guests.",
},
'max_retries': 3,
}},
{'name': 'preferred_date', 'source': 'user',
'setter': 'set_preferred_date',
'ask': 'What date were you thinking?',
'requires_readback': True},
{'name': 'available_times', 'source': 'task:FindAvailableTimes'},
{'name': 'selected_time', 'source': 'user',
'setter': 'set_selected_time',
'requires': ['available_times'],
'ask': 'We have {available_times}. Which time works for you?',
'requires_readback': True},
{'name': 'guest_name', 'source': 'user',
'setter': 'set_guest_name',
'ask': 'What name should I put the reservation under?',
'requires_readback': True},
{'name': 'special_requests', 'source': 'user',
'setter': 'set_special_requests',
'ask': 'Any special requests, or shall I note none?',
'requires_readback': True},
],
'tasks': [
{'name': 'FindAvailableTimes',
'inputs': ['party_size', 'preferred_date'],
'outputs': {'times': 'available_times'},
'success_check': 'success',
'then_say': 'We have {available_times}. Which time works for you?'},
{'name': 'BookReservation',
'inputs': ['party_size', 'preferred_date', 'selected_time',
'guest_name', 'special_requests'],
'outputs': {'confirmation': 'confirmation_number'},
'success_check': 'success',
'terminal': True,
'then_say': "You're confirmed! Your number is {confirmation_number}."},
],
}
The requires field on selected_time enforces the dependency at two levels:
- The question is skipped in _next_question until available_times is filled.
- The setter tool is hidden from the LLM's tool list until the dependency is met — so the LLM structurally cannot call set_selected_time before availability is known.
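The tool-list filtering can be computed directly from the config. A sketch (the function name is illustrative) of which setters the LLM may see on a given turn:

```python
def visible_setters(config: dict, sm: dict) -> list[str]:
    """Return setter tools whose 'requires' dependencies are all
    filled — everything else stays hidden from the LLM this turn."""
    filled = sm.get('filled', {})
    visible = []
    for slot in config['slots']:
        if slot.get('source') != 'user':
            continue  # task-output slots have no setter tool
        if all(dep in filled for dep in slot.get('requires', [])):
            visible.append(slot['setter'])
    return visible
```

Because the dependency is enforced structurally, no prompt engineering is needed to stop the LLM from asking for a time before availability is known.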
The callback preemption mechanism¶
Callback Preemption — the most critical architectural detail
When the callback fires a task and preempts the LLM, the LLM does not get a turn. Any setter tools the LLM had queued in the current response but not yet called are permanently lost. This is why the agent instruction must require all setter tools to be called in the same response — deferring any setter to a later turn creates a race condition with the callback.
Preemption uses LlmResponse.from_parts() to bypass LLM generation entirely:
if task_fired and llm_request.contents and len(llm_request.contents) > 1:
return LlmResponse.from_parts(
parts=[Part.from_text(text=sm['_system_message'])]
)
The len(llm_request.contents) > 1 guard prevents preempting on the first user message — the LLM should always generate a natural greeting rather than a canned "How many guests?" on the opening turn.
The full flow:
User message
→ LLM calls setter tool(s) in one response
→ before_model_callback fires
→ Check DAG: are all task inputs now satisfied?
YES → fire task, store result in sm
→ If multi-content (not first turn): PREEMPT
→ LlmResponse returned directly — LLM skipped
→ Task result delivered verbatim
NO → compute next question
→ Set sm['_system_message']
→ Return OK → LLM generates response using _system_message
Five preemption triggers (not just task firing):
- Task fires — task result message delivered verbatim.
- Slot validation error — error message from config delivered verbatim.
- Readback transition — after confirm_pending, the next question is delivered with a warm prefix.
- Readback stall exhaustion — the stall detector has rejected pending values too many times.
- Progress stall — no forward progress for N turns.
Step-by-step: adding slot filling to your agent¶
1. Map the slots and tasks. List every piece of information you need to collect, in the natural order to ask for it, and every backend task that fires from those inputs. For a restaurant reservation:
    - party_size (user) → enables FindAvailableTimes
    - preferred_date (user) → enables FindAvailableTimes
    - available_times (task output from FindAvailableTimes)
    - selected_time (user, requires available_times)
    - guest_name (user)
    - special_requests (user) → enables BookReservation
2. Declare the sm variable. In your CX Agent Studio app's variable declarations, add an sm variable of type STRING with the initial value shown in "The sm state variable" above.
3. Write one setter tool per user-collected slot. Follow the template: validate → write to sm['pending'] → return {"stored": True}. For invalid input, append to _slot_errors and return {"error": True}. Keep docstrings to 2 sentences: format + trigger phrase.
4. Write _next_question. Build an ordered list of (slot_name, question) pairs. Task output slots (like available_times) are added conditionally after their task has run. Return the first slot not in sm['filled'].
5. Write the before_model_callback. Check terminal state first. Evaluate the DAG (check whether each task's inputs are all in sm['filled']). Fire any ready tasks. Set sm['_system_message']. Preempt if a task fired and this is not the first turn.
6. Write the agent instruction. Include the <slot_filling_protocol> block. Include a concrete multi-slot example in the batching rule. End with a <system_directive>{{system_message}}</system_directive> block so the LLM can see _system_message.
7. Write evaluations. Cover: single slot per turn, multi-slot batching, error recovery, out-of-range input, natural language parsing, conversational detours, and the full happy path end-to-end.
Slot Filling vs. XML Taskflow¶
| Aspect | XML Taskflow | Slot Filling |
|---|---|---|
| State tracking | LLM memory — degrades over long conversations | context.state — deterministic, persists across turns |
| Task firing | LLM decides when (unreliable, premature or late) | Auto-fires when inputs ready — exact, every time |
| Progressive disclosure | Prompt-based — LLM sees all stages and can preview | Tool-driven — LLM only sees the current question |
| Validation | LLM-based — hallucinates error messages | Python code — exact validation, config-driven messages |
| Retry logic | LLM improvises — counts reset each turn | _retries dict — persists across turns, bounded by max_retries |
| Dependency enforcement | Prompt-based (fragile) | Tool visibility — LLM cannot call a hidden setter |
| Testability | Hard — non-deterministic LLM state | Easy — pure Python DAG functions, testable with dicts |
Writing evaluations for slot-filling agents¶
A healthy eval suite for a slot-filling agent covers five categories:
| Category | What to test |
|---|---|
| Happy path | Single slot per turn; multi-slot batching (2+ slots in one message); the full end-to-end flow |
| Error recovery | Invalid input (out of range, wrong format); the agent re-asks after error |
| Guard rails | Out-of-scope requests; off-topic detours that return to the flow |
| NLP parsing | Natural language dates ("next Friday"), quantities ("a couple"), times ("seven thirty PM") |
| Conversational detours | User asks a question mid-flow; agent answers and returns to collection |
Golden eval structure for multi-slot batching:
{
"golden": {
"turns": [
{
"steps": [
{"userInput": {"text": "Table for 2 on June 20th under Johnson"}},
{"expectation": {"toolCall": {"tool": "set_party_size",
"args": {"size": 2.0}}}},
{"expectation": {"toolCall": {"tool": "set_preferred_date",
"args": {"date": "2026-06-20"}}}},
{"expectation": {"toolCall": {"tool": "set_guest_name",
"args": {"name": "Johnson"}}}},
{"expectation": {"agentResponse": {"chunks": [{"text": "Johnson"}]}}}
]
}
]
}
}
agentResponse is required in every turn
Every expectations block must include an agentResponse check, even if it only verifies one word. An evaluation step with only a toolCall expectation is flagged as INVALID by the platform, and INVALID = FAIL. This is the most common cause of unexpected evaluation failures.
Scenario eval for end-to-end flow:
{
"scenario": {
"task": "Make a dinner reservation for 4 people on June 17th. When asked for a time, choose 7 PM. Name is Garcia. No special requests.",
"rubrics": [
"The agent must collect party size, date, time, name, and special requests.",
"The agent must be warm and inviting."
],
"maxTurns": 20
}
}
The 7 stabilization gotchas¶
These are the failure modes encountered shipping slot-filling agents to production, distilled into concrete, actionable items.
1. Missing agentResponse in evaluation expectations
Every turn in a golden eval must have an agentResponse expectation in its steps block. If a step only checks toolCall with no agentResponse, the platform marks it INVALID, which counts as a failure.
Fix: Add even a minimal agentResponse check — a single word that should appear in the response — to every turn.
2. Verbose docstrings break multi-slot batching
Adding validation details ("valid range: 1-8"), prerequisite caveats ("only call after date is confirmed"), or business rules ("we're closed on Mondays") to tool docstrings regresses batching. The LLM reads each tool's docstring separately, which fragments its attention away from the batching rule in the instruction.
Fix: Keep docstrings to 2 sentences. Put all business rules in the agent instruction or enforce them in Python code.
3. Missing concrete example in the batching instruction
The instruction rule "call ALL setter tools in the SAME response" is abstract. The LLM needs a specific, realistic example to generalize the behavior.
Fix: Add a concrete example like: "if the user says 'table for 2 on June 20th under the name Johnson', call set_party_size, set_preferred_date, AND set_guest_name — all in one turn."
4. Callback preemption races with deferred tool calls
If the LLM calls setter A in one tool call but defers setter B to the next response, and the callback fires between them (e.g., because setter A completed all of task X's inputs), setter B is never called. The preempted response overwrites whatever the LLM was about to say.
Fix: The agent instruction must require all setters to be called in one response. The concrete example in gotcha 3 helps drive this behavior.
5. Task output slots missing from _next_question order
If available_times (populated by a task) is not in the _next_question order, selected_time will never appear as the next question — the function will fall through to guest_name instead.
Fix: Add task output slots to the order list conditionally: check that the slot is in sm['filled'] before including the dependent slot's question in the list.
6. _system_message missing from some code paths
Every code path through the callback must set sm['_system_message']. If an error branch or early-return path forgets to set it, the LLM will see a stale message from the previous turn and relay incorrect guidance.
Fix: Set sm['_system_message'] at the end of the main path, and explicitly sm.pop('_system_message', None) in any path that should clear it.
7. Terminal task status not checked before firing tasks
After BookReservation fires and sets sm['status'] = 'complete', the callback must check this status before re-evaluating the DAG. Without the check, the callback may attempt to re-fire BookReservation on every subsequent turn (all inputs are still in filled, and the task's inputs-ready check would pass again).
Fix: Add a terminal state guard at the top of the callback:
Mapping from Sonic YAML¶
If you're migrating a Sonic-style agent definition to the Slot Filling Pattern:
| Sonic YAML | Slot Filling equivalent |
|---|---|
| type: Slot, source: user | Setter tool + entry in _next_question order |
| type: Slot, source: task:X | Task output written to sm['filled'] in the callback |
| type: Slot, requires: [Y] | Conditional entry in _next_question (check Y in filled) |
| Task with inputs: {a: $slot_a} | DAG check in callback: if 'slot_a' in sm['filled'] |
| Task terminal: true | sm['status'] = 'complete' after task fires |
| Slot validation: block | _slot_errors signal + error code lookup in callback |
Reference implementation¶
The Bella Notte restaurant reservation agent is the canonical reference implementation of this pattern. It implements:
- 7 slots: party_size, preferred_date, available_times, selected_time, guest_name, special_requests, confirmation_number
- 2 tasks: FindAvailableTimes (fires after party_size + preferred_date), BookReservation (terminal, fires after all required slots)
- 5 setter tools: set_party_size, set_preferred_date, set_guest_name, set_selected_time, set_special_requests
- 20+ golden evals, 5+ scenario evals