
Scenarios and Suites

Scenarios and suites are the foundational building blocks of AgenticAssure. A scenario defines a single test case for your AI agent, while a suite groups related scenarios together with shared configuration.


What Is a Scenario?

A scenario represents one discrete interaction with your AI agent. It captures:

  • What to send to the agent (the input prompt).
  • What you expect back (expected output, expected tool calls, expected arguments).
  • How to evaluate the agent’s response (which scorers to apply).

Each scenario is self-contained. When the runner executes a scenario, it sends the input to your agent adapter, collects the agent’s response, runs every configured scorer against that response, and records a pass or fail result.
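The execution flow described above can be sketched in Python. Note that `run_scenario`, the adapter's `invoke` method, and `ScoreResult` are illustrative names assumed for this sketch, not the actual AgenticAssure API:

```python
from dataclasses import dataclass

@dataclass
class ScoreResult:
    scorer: str
    passed: bool

def run_scenario(scenario, adapter, resolve_scorer):
    """Sketch of the per-scenario loop: send input, score the response, record results."""
    # The agent sees only the scenario's input during execution.
    response = adapter.invoke(scenario["input"])
    # Run every configured scorer (default: passfail) against the response.
    results = [
        ScoreResult(name, resolve_scorer(name)(scenario, response))
        for name in scenario.get("scorers", ["passfail"])
    ]
    # A scenario passes only if every scorer passes.
    return all(r.passed for r in results), results
```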


Scenario Fields

Every scenario is defined by the Scenario model. The following table documents each field.

name (required)

A short, human-readable name that identifies the scenario. This appears in CLI output, reports, and logs.

name: basic_greeting

input (required)

The prompt or message that will be sent to your agent. This is the only value the agent sees during execution.

input: "What is the weather in New York today?"

description (optional)

A longer explanation of what the scenario is testing. Useful for documentation and for teammates reviewing the test suite.

description: "Verifies the agent can handle a weather lookup request for a US city."

expected_output (optional)

A substring that should appear in the agent’s output. Several built-in scorers use this field:

  • passfail — checks that expected_output appears (case-insensitive) within the agent’s response.
  • exact — checks that the full output matches expected_output exactly (with optional normalization).
  • similarity — computes semantic similarity between the output and expected_output.

expected_output: "sunny"

If omitted, scorers that depend on expected_output will either skip that check (passfail) or fail with an explanatory message (exact, similarity).
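Based on the behavior described above, the passfail output check can be sketched as a case-insensitive substring test that is skipped when no expectation is set (the function name is hypothetical):

```python
def passfail_output_check(expected_output, response):
    """Case-insensitive substring check; treated as a pass when expected_output is omitted."""
    if expected_output is None:
        return True  # passfail skips the output check when expected_output is omitted
    return expected_output.lower() in response.lower()
```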

expected_tools (optional)

A list of tool names that the agent is expected to call during the scenario. The passfail scorer verifies that every tool in this list was actually invoked.

expected_tools:
  - get_weather
  - format_response

expected_tool_args (optional)

A mapping of tool names to their expected arguments. The passfail scorer performs key-by-key comparison, verifying that each expected argument key is present and its value matches exactly.

expected_tool_args:
  get_weather:
    location: "New York"
    units: "fahrenheit"

Only the keys listed here are checked. The agent may pass additional arguments that are not listed, and those extra arguments will not cause a failure.
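The key-by-key comparison described above can be sketched as follows; `check_tool_args` and the shape of `actual_calls` (tool name mapped to the arguments the agent actually passed) are assumptions for illustration:

```python
def check_tool_args(expected_tool_args, actual_calls):
    """Verify each expected argument key is present and its value matches exactly.

    Extra arguments the agent passes that are not listed do not cause a failure.
    """
    for tool, expected in expected_tool_args.items():
        actual = actual_calls.get(tool)
        if actual is None:
            return False  # an expected tool was never called
        for key, value in expected.items():
            if key not in actual or actual[key] != value:
                return False
    return True
```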

tags (optional, default: [])

A list of string tags for categorizing and filtering scenarios. Tags let you run subsets of your test suite from the CLI using --tag.

tags:
  - weather
  - tools
  - smoke

metadata (optional, default: {})

An open dictionary for passing arbitrary configuration to scorers or for your own bookkeeping. Several built-in scorers read specific keys from metadata:

  • regex_pattern — used by the regex scorer. The regular expression pattern to match against the output.
  • exact_normalize — used by the exact scorer. Boolean. When true (the default), strips whitespace and lowercases both strings before comparison.
  • similarity_threshold — used by the similarity scorer. Float between 0 and 1. Overrides the default similarity threshold of 0.7.

metadata:
  regex_pattern: "\\d{5}"
  exact_normalize: false
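The exact scorer's normalization, as described above, can be sketched like this (`exact_match` is a hypothetical name, not the framework's own function):

```python
def exact_match(expected, actual, normalize=True):
    """Sketch of the exact scorer's comparison with optional normalization."""
    if normalize:
        # Normalization strips surrounding whitespace and lowercases both strings.
        expected, actual = expected.strip().lower(), actual.strip().lower()
    return expected == actual
```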

scorers (optional, default: ["passfail"])

A list of scorer names to evaluate this scenario. Scorers are resolved by name from the global scorer registry. A scenario passes only if every scorer passes.

scorers:
  - passfail
  - regex

Built-in scorers: passfail, exact, regex, similarity. You can also register custom scorers. See the Scorers documentation for details.
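Name-based resolution from a registry can be sketched as below; the registry dict, the `register_scorer` decorator, and the example custom scorer are all hypothetical, not AgenticAssure's actual registration API:

```python
SCORER_REGISTRY = {}

def register_scorer(name):
    """Decorator that registers a scorer function under a name."""
    def wrap(fn):
        SCORER_REGISTRY[name] = fn
        return fn
    return wrap

@register_scorer("contains_digits")
def contains_digits(scenario, response):
    # Example custom scorer: pass if the response contains any digit.
    return any(ch.isdigit() for ch in response)

def resolve_scorer(name):
    """Look up a scorer by name; unknown names are an error."""
    try:
        return SCORER_REGISTRY[name]
    except KeyError:
        raise KeyError(f"Unknown scorer: {name!r}")
```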

timeout_seconds (optional, default: 30.0)

The maximum time in seconds allowed for the agent to respond to this scenario. Must be a positive number.

timeout_seconds: 60.0

id (auto-generated)

A UUID automatically assigned to each scenario at creation time. You do not set this in YAML. It is used internally to link score results back to their scenario.


What Is a Suite?

A suite groups multiple scenarios into a named collection with optional shared configuration. Suites map directly to YAML files — each file defines one suite.

Suite Fields

name (required)

The name of the suite. If the suite block is omitted in YAML, the filename (without extension) is used as the suite name.

description (optional)

A human-readable description of what this suite covers.

tags (optional)

Suite-level tags. These are separate from scenario-level tags and are available for organizational purposes.

config (optional)

Suite-level configuration that overrides runner defaults. See “SuiteConfig” below.


SuiteConfig

The config block inside a suite controls execution behavior for that suite. When a suite defines config values, they take precedence over the runner’s defaults.

  • default_timeout (float, default 30.0) — Default timeout in seconds for scenarios in this suite.
  • retries (int, default 0) — Number of retry attempts per scenario (0 means no retries).
  • default_scorers (list[str], default ["passfail"]) — Default scorers applied to scenarios that do not specify their own.
  • fail_fast (bool, default false) — Stop executing after the first failed scenario.

config:
  default_timeout: 45
  retries: 2
  default_scorers:
    - passfail
    - exact
  fail_fast: true
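The precedence rule (suite config over runner defaults) amounts to a simple dictionary merge; this is a sketch under that assumption, not the framework's actual implementation:

```python
RUNNER_DEFAULTS = {
    "default_timeout": 30.0,
    "retries": 0,
    "default_scorers": ["passfail"],
    "fail_fast": False,
}

def effective_config(suite_config):
    """Suite-level values take precedence; unset fields fall back to runner defaults."""
    return {**RUNNER_DEFAULTS, **suite_config}
```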

YAML Format

Scenario files use YAML with the .yaml or .yml extension. A complete annotated example:

# Optional suite metadata block
suite:
  name: customer-support-agent
  description: End-to-end tests for the customer support AI agent
  config:
    default_timeout: 45
    retries: 1
    fail_fast: false

# Required: at least one scenario
scenarios:
  # A basic scenario testing that the agent responds at all
  - name: greeting
    description: Agent should respond politely to a greeting
    input: "Hi there, I need help with my order."
    expected_output: "help"
    tags:
      - basic
      - greeting
    scorers:
      - passfail

  # A scenario testing tool usage
  - name: order_lookup
    description: Agent should call the order lookup tool with the correct order ID
    input: "Can you check the status of order #12345?"
    expected_tools:
      - lookup_order
    expected_tool_args:
      lookup_order:
        order_id: "12345"
    tags:
      - tools
      - orders
    scorers:
      - passfail

  # A scenario with multiple scorers and custom metadata
  - name: refund_policy
    description: Agent should explain the refund policy accurately
    input: "What is your refund policy?"
    expected_output: "30-day refund policy"
    metadata:
      regex_pattern: "\\d+ day"
      similarity_threshold: 0.8
    tags:
      - policy
    scorers:
      - passfail
      - regex
      - similarity
    timeout_seconds: 60

Minimal Example

The only required top-level key is scenarios, and each scenario needs at minimum name and input:

scenarios:
  - name: smoke_test
    input: "Hello"

When the suite block is omitted, the suite name defaults to the YAML filename (without the extension).
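That fallback behavior can be sketched in a few lines; `default_suite_name` is a hypothetical helper assumed for illustration:

```python
from pathlib import Path

def default_suite_name(yaml_path, suite_block=None):
    """Use the explicit suite name when present; otherwise fall back to the file stem."""
    if suite_block and "name" in suite_block:
        return suite_block["name"]
    # Path.stem drops the directory and the .yaml/.yml extension.
    return Path(yaml_path).stem
```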


JSON Schema Validation

AgenticAssure validates every YAML file against a built-in JSON Schema before loading it. The schema enforces:

  • The root must be an object (mapping).
  • scenarios is required and must be a non-empty array.
  • Each scenario must have name (string) and input (string).
  • timeout_seconds must be a positive number.
  • tags, scorers, and expected_tools must be arrays of strings when present.
  • expected_tool_args and metadata must be objects when present.
  • No additional properties are allowed at any level (typos in field names will be caught).
  • The optional suite block must contain name (required) and may contain description and config.
  • config fields are validated for correct types (number, integer, boolean, array).
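A few of the checks listed above can be sketched as plain Python; this is a deliberately simplified stand-in for the real JSON Schema (which is stricter, e.g. it also rejects unknown properties), and it mirrors the "report all errors at once" behavior:

```python
def validate_scenarios(doc):
    """Simplified sketch of a subset of the schema checks; collects all errors."""
    if not isinstance(doc, dict):
        return ["root must be a mapping"]
    errors = []
    scenarios = doc.get("scenarios")
    if not isinstance(scenarios, list) or not scenarios:
        errors.append("scenarios: must be a non-empty array")
        return errors
    for i, sc in enumerate(scenarios):
        # Each scenario must have string name and input fields.
        for key in ("name", "input"):
            if not isinstance(sc.get(key), str):
                errors.append(f"scenarios.{i}: '{key}' is a required string")
        # timeout_seconds, when present, must be a positive number.
        t = sc.get("timeout_seconds")
        if t is not None and (not isinstance(t, (int, float)) or t <= 0):
            errors.append(f"scenarios.{i}.timeout_seconds: must be a positive number")
    return errors
```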

If validation fails, AgenticAssure reports all schema errors at once with the path to the offending field. For example:

Schema validation failed for scenarios/bad.yaml:
  Schema: scenarios.0: 'input' is a required property
  Schema: scenarios.1.timeout_seconds: -5 is not valid under any of the given schemas

You can validate files without running them using the CLI:

agenticassure validate scenarios/

This runs both JSON Schema validation and additional semantic checks (such as verifying that scorers is a list and tags is a list) on each file.


Tips for Organizing Scenarios

One suite per capability. Group scenarios by the agent capability they test: tool-usage.yaml, knowledge-qa.yaml, error-handling.yaml. This makes it easy to run targeted subsets.

Use tags liberally. Tags allow cross-cutting filters. Tag scenarios by priority (smoke, regression), by feature area (billing, auth), or by resource requirements (slow, requires-api). Then run subsets with --tag:

agenticassure run scenarios/ --tag smoke
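Conceptually, tag filtering just selects scenarios whose tag list contains the requested tag; a minimal sketch (the function name is hypothetical):

```python
def filter_by_tag(scenarios, tag):
    """Keep only scenarios whose tags include the requested tag."""
    return [s for s in scenarios if tag in s.get("tags", [])]
```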

Keep inputs realistic. Your scenarios should reflect how real users interact with your agent. Avoid overly simplified inputs that do not exercise the agent’s full behavior.

Test failure modes. Include scenarios that expect the agent to handle bad input gracefully, decline out-of-scope requests, or recover from tool errors.

Layer your scorers. Use passfail as a baseline (did the agent respond and call the right tools?), then add exact or similarity for more precise output validation, and regex for structured data checks.

Use description for context. When a scenario fails, the description helps the team understand what the test was checking without having to reverse-engineer the input and expected values.

Version control your scenarios. YAML scenario files should live in your repository alongside your agent code so that changes to agent behavior and test expectations are tracked together.
