# Scenarios and Suites
Scenarios and suites are the foundational building blocks of AgenticAssure. A scenario defines a single test case for your AI agent, while a suite groups related scenarios together with shared configuration.
## What Is a Scenario?
A scenario represents one discrete interaction with your AI agent. It captures:
- What to send to the agent (the input prompt).
- What you expect back (expected output, expected tool calls, expected arguments).
- How to evaluate the agent’s response (which scorers to apply).
Each scenario is self-contained. When the runner executes a scenario, it sends the input to your agent adapter, collects the agent’s response, runs every configured scorer against that response, and records a pass or fail result.
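That flow can be sketched in a few lines. The shapes below (plain dicts, a callable agent adapter, a scorer mapping) are illustrative assumptions for this sketch, not AgenticAssure's actual internals:

```python
# Sketch of the runner's per-scenario flow described above.
# All names here are hypothetical, chosen only for illustration.
def run_scenario(agent, scenario, scorers):
    response = agent(scenario["input"])          # 1. send the input to the agent adapter
    scores = {name: scorer(response, scenario)   # 2. run every configured scorer
              for name, scorer in scorers.items()}
    return {
        "scenario": scenario["name"],
        "passed": all(scores.values()),          # 3. record pass/fail
        "scores": scores,
    }
```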
## Scenario Fields
Every scenario is defined by the Scenario model. The following table documents each field.
### name (required)
A short, human-readable name that identifies the scenario. This appears in CLI output, reports, and logs.
```yaml
name: basic_greeting
```

### input (required)
The prompt or message that will be sent to your agent. This is the only value the agent sees during execution.
```yaml
input: "What is the weather in New York today?"
```

### description (optional)
A longer explanation of what the scenario is testing. Useful for documentation and for teammates reviewing the test suite.
```yaml
description: "Verifies the agent can handle a weather lookup request for a US city."
```

### expected_output (optional)
A substring that should appear in the agent’s output. Several built-in scorers use this field:
- `passfail` — checks that `expected_output` appears (case-insensitive) within the agent's response.
- `exact` — checks that the full output matches `expected_output` exactly (with optional normalization).
- `similarity` — computes semantic similarity between the output and `expected_output`.

```yaml
expected_output: "sunny"
```

If omitted, scorers that depend on `expected_output` will either skip that check (`passfail`) or fail with an explanatory message (`exact`, `similarity`).
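As a rough sketch of the first two behaviors described above (the helper names are not part of AgenticAssure; they only illustrate the documented semantics):

```python
def passfail_check(output: str, expected) -> bool:
    # passfail skips this check entirely when expected_output is omitted.
    if expected is None:
        return True
    return expected.lower() in output.lower()    # case-insensitive substring

def exact_check(output: str, expected: str, normalize: bool = True) -> bool:
    # exact compares the full output; normalization strips whitespace
    # and lowercases both sides (the documented default).
    if normalize:
        output, expected = output.strip().lower(), expected.strip().lower()
    return output == expected
```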
### expected_tools (optional)
A list of tool names that the agent is expected to call during the scenario. The passfail scorer verifies that every tool in this list was actually invoked.
```yaml
expected_tools:
  - get_weather
  - format_response
```

### expected_tool_args (optional)
A mapping of tool names to their expected arguments. The passfail scorer performs key-by-key comparison, verifying that each expected argument key is present and its value matches exactly.
```yaml
expected_tool_args:
  get_weather:
    location: "New York"
    units: "fahrenheit"
```

Only the keys listed here are checked. The agent may pass additional arguments that are not listed, and those extra arguments will not cause a failure.
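The key-by-key comparison amounts to a subset match, which can be sketched as (an illustrative helper, not the library's code):

```python
def tool_args_match(expected: dict, actual: dict) -> bool:
    # Every expected key must be present with an exactly matching value;
    # extra keys in `actual` are ignored, as documented above.
    return all(key in actual and actual[key] == value
               for key, value in expected.items())
```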
### tags (optional, default: [])
A list of string tags for categorizing and filtering scenarios. Tags let you run subsets of your test suite from the CLI using --tag.
```yaml
tags:
  - weather
  - tools
  - smoke
```

### metadata (optional, default: {})
An open dictionary for passing arbitrary configuration to scorers or for your own bookkeeping. Several built-in scorers read specific keys from metadata:
| Key | Used by | Description |
|---|---|---|
| `regex_pattern` | regex scorer | The regular expression pattern to match against the output. |
| `exact_normalize` | exact scorer | Boolean. When true (the default), strips whitespace and lowercases both strings before comparison. |
| `similarity_threshold` | similarity scorer | Float between 0 and 1. Overrides the default similarity threshold of 0.7. |
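A scorer might consume these metadata keys roughly as follows. This is a sketch: the function names are hypothetical, and the behavior when `regex_pattern` is absent is an assumption here, not documented behavior:

```python
import re

def regex_score(output: str, metadata: dict) -> bool:
    pattern = metadata.get("regex_pattern")
    if pattern is None:
        return False            # assumption: no pattern means the check cannot pass
    return re.search(pattern, output) is not None

def similarity_threshold(metadata: dict) -> float:
    # Falls back to the documented default threshold of 0.7.
    return float(metadata.get("similarity_threshold", 0.7))
```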
```yaml
metadata:
  regex_pattern: "\\d{5}"
  exact_normalize: false
```

### scorers (optional, default: ["passfail"])
A list of scorer names to evaluate this scenario. Scorers are resolved by name from the global scorer registry. A scenario passes only if every scorer passes.
```yaml
scorers:
  - passfail
  - regex
```

Built-in scorers: `passfail`, `exact`, `regex`, `similarity`. You can also register custom scorers. See the Scorers documentation for details.
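A name-based registry of this shape can be sketched as follows; the decorator and `resolve` helper are assumptions for illustration, not AgenticAssure's registration API:

```python
SCORER_REGISTRY = {}

def register_scorer(name):
    # Decorator that adds a scorer function to a global registry,
    # so scenarios can reference it by name.
    def wrap(fn):
        SCORER_REGISTRY[name] = fn
        return fn
    return wrap

@register_scorer("passfail")
def passfail(output, scenario):
    expected = scenario.get("expected_output")
    return expected is None or expected.lower() in output.lower()

def resolve(names):
    # Unknown names raise KeyError instead of being silently skipped.
    return [SCORER_REGISTRY[n] for n in names]
```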
### timeout_seconds (optional, default: 30.0)
The maximum time in seconds allowed for the agent to respond to this scenario. Must be a positive number.
```yaml
timeout_seconds: 60.0
```

### id (auto-generated)
A UUID automatically assigned to each scenario at creation time. You do not set this in YAML. It is used internally to link score results back to their scenario.
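In Python terms, the behavior is roughly the following dataclass sketch (the real model may differ; `ScenarioRecord` is a hypothetical name):

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass
class ScenarioRecord:
    # `id` is assigned automatically at creation time; score results
    # reference it later to link back to their scenario.
    name: str
    input: str
    id: UUID = field(default_factory=uuid4)
```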
## What Is a Suite?
A suite groups multiple scenarios into a named collection with optional shared configuration. Suites map directly to YAML files — each file defines one suite.
## Suite Fields
### name (required)
The name of the suite. If the suite block is omitted in YAML, the filename (without extension) is used as the suite name.
### description (optional)
A human-readable description of what this suite covers.
### tags (optional)
Suite-level tags. These are separate from scenario-level tags and are available for organizational purposes.
### config (optional)
Suite-level configuration that overrides runner defaults. See “SuiteConfig” below.
## SuiteConfig
The config block inside a suite controls execution behavior for that suite. When a suite defines config values, they take precedence over the runner’s defaults.
| Field | Type | Default | Description |
|---|---|---|---|
default_timeout | float | 30.0 | Default timeout in seconds for scenarios in this suite. |
retries | int | 0 | Number of retry attempts per scenario (0 means no retries). |
default_scorers | list[str] | ["passfail"] | Default scorers applied to scenarios that do not specify their own. |
fail_fast | bool | false | Stop executing after the first failed scenario. |
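The precedence rule can be sketched as a key-by-key dict merge; `RUNNER_DEFAULTS` mirrors the defaults in the table above, and `effective_config` is a hypothetical helper:

```python
# Defaults taken from the SuiteConfig table above.
RUNNER_DEFAULTS = {
    "default_timeout": 30.0,
    "retries": 0,
    "default_scorers": ["passfail"],
    "fail_fast": False,
}

def effective_config(suite_config):
    # Suite-level values override runner defaults key by key;
    # an absent config block leaves the defaults untouched.
    return {**RUNNER_DEFAULTS, **(suite_config or {})}
```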
```yaml
config:
  default_timeout: 45
  retries: 2
  default_scorers:
    - passfail
    - exact
  fail_fast: true
```

## YAML Format
Scenario files use YAML with the .yaml or .yml extension. A complete annotated example:
```yaml
# Optional suite metadata block
suite:
  name: customer-support-agent
  description: End-to-end tests for the customer support AI agent
  config:
    default_timeout: 45
    retries: 1
    fail_fast: false

# Required: at least one scenario
scenarios:
  # A basic scenario testing that the agent responds at all
  - name: greeting
    description: Agent should respond politely to a greeting
    input: "Hi there, I need help with my order."
    expected_output: "help"
    tags:
      - basic
      - greeting
    scorers:
      - passfail

  # A scenario testing tool usage
  - name: order_lookup
    description: Agent should call the order lookup tool with the correct order ID
    input: "Can you check the status of order #12345?"
    expected_tools:
      - lookup_order
    expected_tool_args:
      lookup_order:
        order_id: "12345"
    tags:
      - tools
      - orders
    scorers:
      - passfail

  # A scenario with multiple scorers and custom metadata
  - name: refund_policy
    description: Agent should explain the refund policy accurately
    input: "What is your refund policy?"
    expected_output: "30-day refund policy"
    metadata:
      regex_pattern: "\\d+ day"
      similarity_threshold: 0.8
    tags:
      - policy
    scorers:
      - passfail
      - regex
      - similarity
    timeout_seconds: 60
```

### Minimal Example
The only required top-level key is scenarios, and each scenario needs at minimum name and input:
```yaml
scenarios:
  - name: smoke_test
    input: "Hello"
```

When the suite block is omitted, the suite name defaults to the YAML filename (without the extension).
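That fallback amounts to taking the file's stem, which can be sketched with `pathlib` (`default_suite_name` is a hypothetical helper name):

```python
from pathlib import Path

def default_suite_name(path: str) -> str:
    # The filename without its extension becomes the suite name.
    return Path(path).stem
```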
## JSON Schema Validation
AgenticAssure validates every YAML file against a built-in JSON Schema before loading it. The schema enforces:
- The root must be an object (mapping).
- `scenarios` is required and must be a non-empty array.
- Each scenario must have `name` (string) and `input` (string).
- `timeout_seconds` must be a positive number.
- `tags`, `scorers`, and `expected_tools` must be arrays of strings when present.
- `expected_tool_args` and `metadata` must be objects when present.
- No additional properties are allowed at any level (typos in field names will be caught).
- The optional `suite` block must contain `name` (required) and may contain `description` and `config`.
- `config` fields are validated for correct types (number, integer, boolean, array).
If validation fails, AgenticAssure reports all schema errors at once with the path to the offending field. For example:
```
Schema validation failed for scenarios/bad.yaml:
  Schema: scenarios.0: 'input' is a required property
  Schema: scenarios.1.timeout_seconds: -5 is not valid under any of the given schemas
```

You can validate files without running them using the CLI:
```shell
agenticassure validate scenarios/
```

This runs both JSON Schema validation and additional semantic checks (such as verifying that `scorers` is a list and `tags` is a list) on each file.
## Tips for Organizing Scenarios
**One suite per capability.** Group scenarios by the agent capability they test: `tool-usage.yaml`, `knowledge-qa.yaml`, `error-handling.yaml`. This makes it easy to run targeted subsets.
**Use tags liberally.** Tags allow cross-cutting filters. Tag scenarios by priority (`smoke`, `regression`), by feature area (`billing`, `auth`), or by resource requirements (`slow`, `requires-api`). Then run subsets with `--tag`:
```shell
agenticassure run scenarios/ --tag smoke
```

**Keep inputs realistic.** Your scenarios should reflect how real users interact with your agent. Avoid overly simplified inputs that do not exercise the agent's full behavior.
**Test failure modes.** Include scenarios that expect the agent to handle bad input gracefully, decline out-of-scope requests, or recover from tool errors.

**Layer your scorers.** Use `passfail` as a baseline (did the agent respond and call the right tools?), then add `exact` or `similarity` for more precise output validation, and `regex` for structured data checks.

**Use description for context.** When a scenario fails, the description helps the team understand what the test was checking without having to reverse-engineer the input and expected values.

**Version control your scenarios.** YAML scenario files should live in your repository alongside your agent code so that changes to agent behavior and test expectations are tracked together.