Working with Multiple Suites

As your AI agent grows in capability, a single YAML file of scenarios becomes difficult to maintain. AgenticAssure supports splitting your tests across multiple files and directories, with built-in directory scanning that makes it straightforward to organize large test suites.

Why Split Tests Across Files

Maintainability: A 500-line YAML file is hard to navigate. Smaller, focused files are easier to read, review, and update.
Ownership: Different team members can own different suite files, which reduces the chance of merge conflicts.
Selective execution: You can run a single suite file without loading unrelated scenarios.
Logical grouping: Separate files for separate concerns, such as orders, billing, onboarding, and safety.
Suite-level configuration: Each file can define its own timeout, retry, and scorer defaults.

Directory Structure Recommendations

By Feature Area

The most common pattern: one file per feature or domain area of your agent.


scenarios/
  orders.yaml
  returns.yaml
  billing.yaml
  onboarding.yaml
  faq.yaml
  safety.yaml

By Test Type

Organize by the kind of testing each file performs.


scenarios/
  happy-path.yaml
  edge-cases.yaml
  error-handling.yaml
  safety-guardrails.yaml
  performance.yaml

By Agent (Multi-Agent Systems)

If you have multiple agents, give each its own directory.


scenarios/
  support-agent/
    core.yaml
    edge-cases.yaml
    safety.yaml
  billing-agent/
    invoices.yaml
    payments.yaml
  search-agent/
    queries.yaml
    filters.yaml

By Priority

Separate critical tests from extended coverage.


scenarios/
  critical/
    smoke-tests.yaml
    core-workflows.yaml
  extended/
    edge-cases.yaml
    regression.yaml
    performance.yaml

Hybrid Approach

Combine multiple organizational dimensions.


scenarios/
  support/
    orders/
      happy-path.yaml
      edge-cases.yaml
    returns/
      happy-path.yaml
      edge-cases.yaml
    safety.yaml
  billing/
    invoices.yaml
    payments.yaml

How Directory Scanning Works

When you point AgenticAssure at a directory, the load_scenarios_from_dir() function recursively finds all files with .yml or .yaml extensions, loads each one as a separate suite, and returns the full list.


# Scan the entire scenarios/ directory and all subdirectories
agenticassure run scenarios/

The scanning process:

Walks the directory tree recursively.
Finds all files matching **/*.yml and **/*.yaml.
Loads each file as an independent suite using load_scenarios().
Returns the suites sorted by file path.

If any file fails to parse or validate, the entire load operation stops with an error pointing to the problematic file.
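The scanning steps above can be approximated with the standard library. This is a minimal sketch, not AgenticAssure's implementation: `find_suite_files` is a hypothetical helper, and it only covers file discovery and ordering, not parsing or validation.

```python
from pathlib import Path
import tempfile

YAML_SUFFIXES = {".yml", ".yaml"}


def find_suite_files(root: Path) -> list[Path]:
    """Hypothetical helper: every .yml/.yaml file under root, sorted by path."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in YAML_SUFFIXES
    )


# Build a throwaway tree that mirrors the examples in this guide.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "scenarios"
    for rel in ["orders.yaml", "returns.yml", "support/safety.yaml",
                "README.md", "config.json"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("scenarios: []\n")
    found = [p.relative_to(root).as_posix() for p in find_suite_files(root)]

print(found)  # ['orders.yaml', 'returns.yml', 'support/safety.yaml']
```

Sorting the paths keeps the discovery order stable between runs. The non-YAML files are skipped purely on their extension.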

What Gets Loaded

scenarios/orders.yaml — loaded
scenarios/returns.yml — loaded
scenarios/support/safety.yaml — loaded (recursive)
scenarios/support/edge-cases/timeout.yml — loaded (deeply nested)
scenarios/README.md — ignored (not YAML)
scenarios/config.json — ignored (not YAML)

Suite Names from Files

If a YAML file includes a suite.name field, that name is used. If the suite block is omitted, the suite name defaults to the filename (without extension).


# File: scenarios/orders.yaml
# Suite name will be "order-tests" (from the suite block)
suite:
  name: order-tests
 
scenarios:
  - name: lookup_order
    input: "Where is my order?"


# File: scenarios/returns.yaml
# Suite name will be "returns" (from the filename)
scenarios:
  - name: request_return
    input: "I want to return this item"
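
The naming rule can be expressed as a small function over an already-parsed YAML document. This is a sketch only: `resolve_suite_name` is a hypothetical name, and falling back to the filename when a `suite` block exists but has no `name` is an assumption, not documented behavior.

```python
from pathlib import Path


def resolve_suite_name(document: dict, file_path: str) -> str:
    """Hypothetical helper: explicit suite.name first, filename stem otherwise."""
    suite_block = document.get("suite") or {}
    name = suite_block.get("name")
    # Assumption: a suite block without a name also falls back to the filename.
    return name if name else Path(file_path).stem


orders_doc = {"suite": {"name": "order-tests"}, "scenarios": []}
returns_doc = {"scenarios": []}

print(resolve_suite_name(orders_doc, "scenarios/orders.yaml"))    # order-tests
print(resolve_suite_name(returns_doc, "scenarios/returns.yaml"))  # returns
```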

Running a Specific Suite with `--suite`

When you have multiple suite files loaded, you can run only one by name using the --suite / -s flag.


# Load all files from scenarios/ but only run the suite named "order-tests"
agenticassure run scenarios/ --suite order-tests --adapter myproject.agent.MyAgent

You can also point directly at a single file instead of a directory:


# Load and run only this specific file
agenticassure run scenarios/orders.yaml --adapter myproject.agent.MyAgent

The difference:

--suite loads all files in the directory, then filters by suite name. This is useful when your config file or adapter setup depends on the full directory context.
Pointing at a single file loads only that file. This is faster and simpler when you want to test one file in isolation.
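
To make the "load everything, then filter" behavior concrete, here is a rough sketch that models loaded suites as plain dictionaries. `select_suites` and the dictionary shape are illustrative assumptions, not part of the AgenticAssure API.

```python
from typing import Optional


def select_suites(suites: list[dict], suite_name: Optional[str] = None) -> list[dict]:
    """Hypothetical helper: all suites, or only those named suite_name."""
    if suite_name is None:
        return list(suites)
    return [suite for suite in suites if suite["name"] == suite_name]


# Stand-in for the result of loading every file under scenarios/.
loaded = [
    {"name": "order-tests", "path": "scenarios/orders.yaml"},
    {"name": "returns", "path": "scenarios/returns.yaml"},
]

selected = select_suites(loaded, "order-tests")
print([suite["path"] for suite in selected])  # ['scenarios/orders.yaml']
```

Because filtering happens after loading, a parse error in an unrelated file in the same directory would presumably still stop a `--suite` run, given the load behavior described earlier. Pointing at a single file avoids that.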

Suite-Level Configuration

Each suite file can define its own configuration via the suite.config block. These settings override the runner defaults for scenarios in that suite.


suite:
  name: slow-integration-tests
  description: Tests that call external APIs and may take a while
  config:
    default_timeout: 120
    retries: 2
    default_scorers:
      - passfail
    fail_fast: false
 
scenarios:
  - name: external_api_call
    input: "Fetch the latest report"
    expected_tools:
      - fetch_report

Configuration Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `default_timeout` | float | 30.0 | Timeout in seconds applied to scenarios that do not specify their own `timeout_seconds`. |
| `retries` | int | 0 | Number of retry attempts for failed scenarios. The scenario runs up to `retries + 1` times total. |
| `default_scorers` | list[str] | `["passfail"]` | Scorers applied to scenarios that do not specify their own `scorers` list. |
| `fail_fast` | bool | false | If true, stop executing scenarios in this suite after the first failure. |
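
The interaction between `retries` and `fail_fast` is easier to see in a toy run loop. This is a simplified model under assumed semantics (retry until the first pass, then stop the suite on the first scenario that is still failing). `run_suite` and `fake_attempt` are hypothetical and do not reflect the real runner's internals.

```python
from typing import Callable


def run_suite(
    scenarios: list[str],
    attempt: Callable[[str], bool],
    retries: int = 0,
    fail_fast: bool = False,
) -> dict[str, bool]:
    """Toy runner: each scenario gets up to retries + 1 attempts."""
    results: dict[str, bool] = {}
    for name in scenarios:
        passed = False
        for _ in range(retries + 1):
            if attempt(name):
                passed = True
                break  # no further attempts once the scenario passes
        results[name] = passed
        if fail_fast and not passed:
            break  # skip the remaining scenarios in this suite
    return results


calls = {"flaky": 0, "broken": 0, "later": 0}


def fake_attempt(name: str) -> bool:
    """Simulated agent: 'flaky' passes on its second try, 'broken' never passes."""
    calls[name] += 1
    if name == "flaky":
        return calls[name] >= 2
    return name != "broken"


results = run_suite(["flaky", "broken", "later"], fake_attempt,
                    retries=1, fail_fast=True)
print(results)  # {'flaky': True, 'broken': False}
print(calls)    # {'flaky': 2, 'broken': 2, 'later': 0}
```

With `retries: 1`, the flaky scenario recovers on its second attempt, while the broken one exhausts both attempts and triggers `fail_fast`, so `later` never runs.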

Precedence

Scenario-level settings override suite-level settings:


suite:
  name: mixed-timeouts
  config:
    default_timeout: 60  # Suite default
 
scenarios:
  - name: fast_test
    input: "Quick question"
    timeout_seconds: 10  # Overrides to 10 seconds
 
  - name: normal_test
    input: "Regular question"
    # Uses suite default of 60 seconds
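
The precedence rule amounts to a three-level lookup: scenario value, then suite default, then runner default. The sketch below works on plain dictionaries shaped like the YAML above. The function names are hypothetical, and the fallback values are taken from the Configuration Fields table.

```python
RUNNER_DEFAULT_TIMEOUT = 30.0
RUNNER_DEFAULT_SCORERS = ["passfail"]


def effective_timeout(scenario: dict, suite_config: dict) -> float:
    """Scenario timeout_seconds wins, then suite default_timeout, then 30.0."""
    if "timeout_seconds" in scenario:
        return float(scenario["timeout_seconds"])
    return float(suite_config.get("default_timeout", RUNNER_DEFAULT_TIMEOUT))


def effective_scorers(scenario: dict, suite_config: dict) -> list[str]:
    """Scenario scorers win, then suite default_scorers, then ["passfail"]."""
    if "scorers" in scenario:
        return list(scenario["scorers"])
    return list(suite_config.get("default_scorers", RUNNER_DEFAULT_SCORERS))


suite_config = {"default_timeout": 60}
fast_test = {"name": "fast_test", "timeout_seconds": 10}
normal_test = {"name": "normal_test"}

print(effective_timeout(fast_test, suite_config))    # 10.0
print(effective_timeout(normal_test, suite_config))  # 60.0
print(effective_timeout(normal_test, {}))            # 30.0
```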

Organizing by Feature, Severity, or Agent Type

By Feature

Create one suite file per feature area. This keeps scenarios closely related and makes it easy to run tests for a specific feature during development.


# scenarios/orders.yaml
suite:
  name: orders
  description: Order management scenarios
 
scenarios:
  - name: create_order
    input: "Place an order for Widget A"
    expected_tools: [create_order]
    tags: [orders, create]
 
  - name: cancel_order
    input: "Cancel order #12345"
    expected_tools: [cancel_order]
    tags: [orders, delete]
 
  - name: order_history
    input: "Show my recent orders"
    expected_tools: [list_orders]
    tags: [orders, read]


# Run only order tests during development
agenticassure run scenarios/orders.yaml --adapter myproject.agent.MyAgent

By Severity

Separate critical tests from nice-to-have coverage. Run critical tests on every PR; run the full suite nightly.


# scenarios/critical/smoke-tests.yaml
suite:
  name: smoke-tests
  description: Must-pass scenarios that gate every deployment
  config:
    retries: 1
    fail_fast: true
 
scenarios:
  - name: agent_responds
    input: "Hello"
    scorers: [passfail]
    tags: [critical, smoke]
 
  - name: core_tool_works
    input: "Look up order #TEST-001"
    expected_tools: [lookup_order]
    tags: [critical, smoke]


# CI: run only the suites under scenarios/critical/
agenticassure run scenarios/critical/ --adapter myproject.agent.MyAgent
 
# Nightly: run everything
agenticassure run scenarios/ --adapter myproject.agent.MyAgent

By Agent Type

In multi-agent systems, keep each agent’s tests isolated.


# Run tests for just the support agent
agenticassure run scenarios/support-agent/ --adapter myproject.support.SupportAgent
 
# Run tests for just the billing agent
agenticassure run scenarios/billing-agent/ --adapter myproject.billing.BillingAgent

This also makes it clear which adapter corresponds to which test directory, reducing confusion when different agents have different capabilities and tool sets.

Tips for Working with Multiple Suites

Keep suite files focused: Each file should cover one cohesive area. If a file grows beyond 20-30 scenarios, consider splitting it further.
Use consistent naming: Follow a predictable pattern for file names and suite names so team members can find tests quickly.
Leverage tags across suites: Even with file-based organization, tags add a cross-cutting dimension. Tag must-pass scenarios with critical in every suite file, then run with --tag critical to execute a smoke test that spans all suites.
Validate before running: Use agenticassure validate scenarios/ to catch syntax errors across all files before spending time and API credits on execution.
Document your structure: If your test directory is complex, add a brief comment at the top of each suite file explaining what it covers.
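
As an illustration of the tagging tip, the sketch below selects scenarios by tag across several suites, again modeled as plain dictionaries. `scenarios_with_tag` is a hypothetical helper, and the `critical` tag on `create_order` is added here only for the example.

```python
def scenarios_with_tag(suites: list[dict], tag: str) -> list[tuple[str, str]]:
    """Hypothetical helper: (suite name, scenario name) pairs carrying the tag."""
    return [
        (suite["name"], scenario["name"])
        for suite in suites
        for scenario in suite["scenarios"]
        if tag in scenario.get("tags", [])
    ]


suites = [
    {"name": "orders", "scenarios": [
        # The critical tag here is hypothetical, added for this example.
        {"name": "create_order", "tags": ["orders", "create", "critical"]},
        {"name": "order_history", "tags": ["orders", "read"]},
    ]},
    {"name": "smoke-tests", "scenarios": [
        {"name": "agent_responds", "tags": ["critical", "smoke"]},
        {"name": "core_tool_works", "tags": ["critical", "smoke"]},
    ]},
]

critical = scenarios_with_tag(suites, "critical")
print(critical)
```

The selection cuts across file boundaries: one scenario comes from the orders suite and two from the smoke-tests suite, regardless of how the files are organized on disk.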