
Working with Multiple Suites

As your AI agent grows in capability, a single YAML file of scenarios becomes difficult to maintain. AgenticAssure supports splitting your tests across multiple files and directories, with built-in directory scanning that makes it straightforward to organize large test suites.


Why Split Tests Across Files

  • Maintainability: A 500-line YAML file is hard to navigate. Smaller, focused files are easier to read, review, and update.
  • Ownership: Different team members can own different suite files without merge conflicts.
  • Selective execution: You can run a single suite file without loading unrelated scenarios.
  • Logical grouping: Separate files for separate concerns — orders, billing, onboarding, safety.
  • Suite-level configuration: Each file can define its own timeout, retry, and scorer defaults.

Directory Structure Recommendations

By Feature Area

The most common pattern: one file per feature or domain area of your agent.

```text
scenarios/
  orders.yaml
  returns.yaml
  billing.yaml
  onboarding.yaml
  faq.yaml
  safety.yaml
```

By Test Type

Organize by the kind of testing each file performs.

```text
scenarios/
  happy-path.yaml
  edge-cases.yaml
  error-handling.yaml
  safety-guardrails.yaml
  performance.yaml
```

By Agent (Multi-Agent Systems)

If you have multiple agents, give each its own directory.

```text
scenarios/
  support-agent/
    core.yaml
    edge-cases.yaml
    safety.yaml
  billing-agent/
    invoices.yaml
    payments.yaml
  search-agent/
    queries.yaml
    filters.yaml
```

By Priority

Separate critical tests from extended coverage.

```text
scenarios/
  critical/
    smoke-tests.yaml
    core-workflows.yaml
  extended/
    edge-cases.yaml
    regression.yaml
    performance.yaml
```

Hybrid Approach

Combine multiple organizational dimensions.

```text
scenarios/
  support/
    orders/
      happy-path.yaml
      edge-cases.yaml
    returns/
      happy-path.yaml
      edge-cases.yaml
    safety.yaml
  billing/
    invoices.yaml
    payments.yaml
```

How Directory Scanning Works

When you point AgenticAssure at a directory, the load_scenarios_from_dir() function recursively finds all files with .yml or .yaml extensions, loads each one as a separate suite, and returns the full list.

```bash
# Scan the entire scenarios/ directory and all subdirectories
agenticassure run scenarios/
```

The scanning process:

  1. Walks the directory tree recursively.
  2. Finds all files matching **/*.yml and **/*.yaml.
  3. Loads each file as an independent suite using load_scenarios().
  4. Returns the suites sorted by file path.

If any file fails to parse or validate, the entire load operation stops with an error pointing to the problematic file.
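The scanning rules above can be approximated in plain Python with pathlib. This is an illustrative sketch, not AgenticAssure's actual load_scenarios_from_dir() code: find_suite_files and load_all are hypothetical names, and the load_one callable is an assumed stand-in for the real per-file loader (load_scenarios()), whose signature is not covered in this guide.

```python
from pathlib import Path
from typing import Any, Callable

# Hypothetical sketch of the documented scanning rules; not AgenticAssure's code.
YAML_SUFFIXES = {".yml", ".yaml"}


def find_suite_files(root: str) -> list[Path]:
    """Recursively collect every .yml/.yaml file under root, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in YAML_SUFFIXES
    )


def load_all(root: str, load_one: Callable[[Path], Any]) -> list[Any]:
    """Load each file as its own suite and stop at the first failure.

    load_one is an assumed stand-in for the real per-file loader.
    """
    suites = []
    for path in find_suite_files(root):
        try:
            suites.append(load_one(path))
        except Exception as exc:
            # Name the offending file, then abort the whole load.
            raise ValueError(f"Failed to load suite file {path}: {exc}") from exc
    return suites
```

Against the layout in "What Gets Loaded" below, find_suite_files would return the four YAML files and skip README.md and config.json.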

What Gets Loaded

  • scenarios/orders.yaml — loaded
  • scenarios/returns.yml — loaded
  • scenarios/support/safety.yaml — loaded (recursive)
  • scenarios/support/edge-cases/timeout.yml — loaded (deeply nested)
  • scenarios/README.md — ignored (not YAML)
  • scenarios/config.json — ignored (not YAML)

Suite Names from Files

If a YAML file includes a suite.name field, that name is used. If the suite block is omitted, the suite name defaults to the filename (without extension).

```yaml
# File: scenarios/orders.yaml
# Suite name will be "order-tests" (from the suite block)
suite:
  name: order-tests
scenarios:
  - name: lookup_order
    input: "Where is my order?"
```
```yaml
# File: scenarios/returns.yaml
# Suite name will be "returns" (from the filename)
scenarios:
  - name: request_return
    input: "I want to return this item"
```
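The naming rule amounts to a simple fallback. The sketch below assumes the YAML has already been parsed into a dict (parsing is omitted), and resolve_suite_name is a hypothetical helper, not part of AgenticAssure. It also assumes that a suite block without a name falls back to the filename, which this guide does not state explicitly.

```python
from pathlib import Path
from typing import Any


def resolve_suite_name(path: str, data: dict[str, Any]) -> str:
    """Prefer suite.name from the parsed YAML; otherwise use the file stem."""
    suite_block = data.get("suite") or {}
    name = suite_block.get("name")
    # Path("scenarios/returns.yaml").stem is "returns" (extension dropped).
    return name if name else Path(path).stem
```

With the two files above, this yields "order-tests" for orders.yaml and "returns" for returns.yaml.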

Running a Specific Suite with --suite

When you have multiple suite files loaded, you can run only one by name using the --suite / -s flag.

```bash
# Load all files from scenarios/ but only run the suite named "order-tests"
agenticassure run scenarios/ --suite order-tests --adapter myproject.agent.MyAgent
```

You can also point directly at a single file instead of a directory:

```bash
# Load and run only this specific file
agenticassure run scenarios/orders.yaml --adapter myproject.agent.MyAgent
```

The difference:

  • --suite loads all files in the directory, then filters by suite name. This is useful when your config file or adapter setup depends on the full directory context.
  • Pointing at a single file loads only that file. This is faster and simpler when you want to test one file in isolation.
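The --suite behavior described above (load everything, then filter by name) can be sketched as follows. The Suite dataclass and select_suite are hypothetical stand-ins; in particular, raising an error that lists the available names when nothing matches is an assumption, not documented CLI behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Suite:
    """Hypothetical minimal stand-in for a loaded suite."""
    name: str
    source: str
    scenarios: list[str] = field(default_factory=list)


def select_suite(suites: list[Suite], name: str) -> list[Suite]:
    """Mimic --suite: keep only suites whose suite name matches exactly."""
    matches = [suite for suite in suites if suite.name == name]
    if not matches:
        # Assumption: treat an unknown name as an error and list what exists.
        available = ", ".join(sorted(suite.name for suite in suites))
        raise LookupError(f"No suite named {name!r}. Available: {available}")
    return matches
```

Note that matching uses the suite name, not the filename: with orders.yaml declaring name: order-tests, you select it with --suite order-tests rather than --suite orders.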

Suite-Level Configuration

Each suite file can define its own configuration via the suite.config block. These settings override the runner defaults for scenarios in that suite.

```yaml
suite:
  name: slow-integration-tests
  description: Tests that call external APIs and may take a while
  config:
    default_timeout: 120
    retries: 2
    default_scorers:
      - passfail
    fail_fast: false
scenarios:
  - name: external_api_call
    input: "Fetch the latest report"
    expected_tools:
      - fetch_report
```

Configuration Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| default_timeout | float | 30.0 | Timeout in seconds applied to scenarios that do not specify their own timeout_seconds. |
| retries | int | 0 | Number of retry attempts for failed scenarios. The scenario runs up to retries + 1 times total. |
| default_scorers | list[str] | ["passfail"] | Scorers applied to scenarios that do not specify their own scorers list. |
| fail_fast | bool | false | If true, stop executing scenarios in this suite after the first failure. |
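The retries and fail_fast semantics in the table can be illustrated with a small sketch. SuiteConfig, run_with_retries, and run_suite are hypothetical names, and each scenario is reduced to a callable returning True on pass. The sketch assumes that a scenario counts as failed for fail_fast only after all of its retry attempts are exhausted, which is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SuiteConfig:
    """Mirrors the documented suite.config fields and their defaults."""
    default_timeout: float = 30.0
    retries: int = 0
    default_scorers: list[str] = field(default_factory=lambda: ["passfail"])
    fail_fast: bool = False


def run_with_retries(run_once: Callable[[], bool], retries: int) -> tuple[bool, int]:
    """Run up to retries + 1 times, stopping at the first pass."""
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        if run_once():
            return True, attempts
    return False, attempts


def run_suite(scenarios: list[Callable[[], bool]], config: SuiteConfig) -> list[bool]:
    """Return pass/fail for each executed scenario, honoring fail_fast."""
    results = []
    for run_once in scenarios:
        passed, _ = run_with_retries(run_once, config.retries)
        results.append(passed)
        if not passed and config.fail_fast:
            break  # remaining scenarios in this suite are skipped
    return results
```

With retries: 2, a scenario that always fails is attempted three times, and one that fails once and then passes is attempted twice.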

Precedence

Scenario-level settings override suite-level settings:

```yaml
suite:
  name: mixed-timeouts
  config:
    default_timeout: 60  # Suite default
scenarios:
  - name: fast_test
    input: "Quick question"
    timeout_seconds: 10  # Overrides to 10 seconds
  - name: normal_test
    input: "Regular question"
    # Uses suite default of 60 seconds
```
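The precedence chain (scenario setting, then suite.config, then the documented default) can be written as two small lookups. effective_timeout and effective_scorers are hypothetical helpers operating on already-parsed YAML dicts; they illustrate the rule and are not AgenticAssure's API.

```python
from typing import Any

# Documented defaults, used when neither the scenario nor suite.config sets a value.
DEFAULT_TIMEOUT = 30.0
DEFAULT_SCORERS = ["passfail"]


def effective_timeout(scenario: dict[str, Any], config: dict[str, Any]) -> float:
    """Scenario timeout_seconds wins, then suite default_timeout, then 30.0."""
    if scenario.get("timeout_seconds") is not None:
        return float(scenario["timeout_seconds"])
    return float(config.get("default_timeout", DEFAULT_TIMEOUT))


def effective_scorers(scenario: dict[str, Any], config: dict[str, Any]) -> list[str]:
    """Scenario scorers win, then suite default_scorers, then ["passfail"]."""
    if scenario.get("scorers") is not None:
        return list(scenario["scorers"])
    return list(config.get("default_scorers", DEFAULT_SCORERS))
```

For the mixed-timeouts suite above, fast_test resolves to 10.0 seconds and normal_test to 60.0.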

Organizing by Feature, Severity, or Agent Type

By Feature

Create one suite file per feature area. This keeps scenarios closely related and makes it easy to run tests for a specific feature during development.

```yaml
# scenarios/orders.yaml
suite:
  name: orders
  description: Order management scenarios
scenarios:
  - name: create_order
    input: "Place an order for Widget A"
    expected_tools: [create_order]
    tags: [orders, create]
  - name: cancel_order
    input: "Cancel order #12345"
    expected_tools: [cancel_order]
    tags: [orders, delete]
  - name: order_history
    input: "Show my recent orders"
    expected_tools: [list_orders]
    tags: [orders, read]
```
```bash
# Run only order tests during development
agenticassure run scenarios/orders.yaml --adapter myproject.agent.MyAgent
```

By Severity

Separate critical tests from nice-to-have coverage. Run critical tests on every PR; run the full suite nightly.

```yaml
# scenarios/critical/smoke-tests.yaml
suite:
  name: smoke-tests
  description: Must-pass scenarios that gate every deployment
  config:
    retries: 1
    fail_fast: true
scenarios:
  - name: agent_responds
    input: "Hello"
    scorers: [passfail]
    tags: [critical, smoke]
  - name: core_tool_works
    input: "Look up order #TEST-001"
    expected_tools: [lookup_order]
    tags: [critical, smoke]
```
```bash
# CI: run only critical smoke tests
agenticassure run scenarios/critical/ --adapter myproject.agent.MyAgent

# Nightly: run everything
agenticassure run scenarios/ --adapter myproject.agent.MyAgent
```

By Agent Type

In multi-agent systems, keep each agent’s tests isolated.

```bash
# Run tests for just the support agent
agenticassure run scenarios/support-agent/ --adapter myproject.support.SupportAgent

# Run tests for just the billing agent
agenticassure run scenarios/billing-agent/ --adapter myproject.billing.BillingAgent
```

This also makes it clear which adapter corresponds to which test directory, reducing confusion when different agents have different capabilities and tool sets.
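In a CI script, the directory-to-adapter pairing can live in one mapping so each agent's suites always run with the right adapter. This is a hypothetical helper script: AGENT_SUITES, build_commands, and run_all are illustrative names, and it uses only the run command and --adapter flag shown above. It assumes the CLI reports failures through a non-zero exit code; with check=True, subprocess.run then raises CalledProcessError.

```python
import shlex
import subprocess

# Hypothetical directory-to-adapter mapping, mirroring the example above.
AGENT_SUITES = {
    "scenarios/support-agent/": "myproject.support.SupportAgent",
    "scenarios/billing-agent/": "myproject.billing.BillingAgent",
}


def build_commands(mapping: dict[str, str]) -> list[list[str]]:
    """Build one `agenticassure run` invocation per agent directory."""
    return [
        ["agenticassure", "run", directory, "--adapter", adapter]
        for directory, adapter in mapping.items()
    ]


def run_all(mapping: dict[str, str]) -> None:
    """Run each agent's suites in turn; raise if a command exits non-zero."""
    for command in build_commands(mapping):
        print("$", shlex.join(command))
        subprocess.run(command, check=True)
```

Adding a new agent then means adding one entry to the mapping rather than editing several CI steps.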


Tips for Working with Multiple Suites

  • Keep suite files focused: Each file should cover one cohesive area. If a file grows beyond 20-30 scenarios, consider splitting it further.
  • Use consistent naming: Follow a predictable pattern for file names and suite names so team members can find tests quickly.
  • Leverage tags across suites: Even with file-based organization, tags add a cross-cutting dimension. Tag scenarios with critical in every suite file, then run with --tag critical to get a smoke test that spans all suites.
  • Validate before running: Use agenticassure validate scenarios/ to catch syntax errors across all files before spending time and API credits on execution.
  • Document your structure: If your test directory is complex, add a brief comment at the top of each suite file explaining what it covers.
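Tag-based selection cuts across file boundaries. The sketch below shows the idea on already-parsed suites: scenarios_with_tag is a hypothetical helper, and it assumes a simple single-tag membership test. How the real --tag flag combines multiple tags is not covered in this guide.

```python
from typing import Any


def scenarios_with_tag(
    suites: dict[str, list[dict[str, Any]]], tag: str
) -> list[tuple[str, str]]:
    """Return (suite name, scenario name) for every scenario carrying tag."""
    return [
        (suite_name, scenario["name"])
        for suite_name, scenarios in suites.items()
        for scenario in scenarios
        if tag in scenario.get("tags", [])
    ]
```

Given the orders and smoke-tests suites from this guide, selecting critical returns only the two smoke-test scenarios, while untagged scenarios are never selected.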