agenticassure run

The run command is the primary entry point for executing test scenarios against your AI agent. It loads YAML scenario files, resolves an adapter, runs each scenario through the adapter, scores the results, and outputs a report.

agenticassure run [OPTIONS] [PATH]

Arguments

| Argument | Default | Description |
| --- | --- | --- |
| PATH | . (current directory) | A file or directory containing YAML scenario files. Must exist. |

Options

| Option | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| --adapter | -a | String | None | Python dotted path to an AgentAdapter class |
| --suite | -s | String | None | Filter to a specific suite by name |
| --tag | -t | String (repeatable) | None | Filter scenarios by tag. Can be specified multiple times. |
| --output | -o | Choice: cli, json, html | cli | Output/report format |
| --timeout | | Float | 30.0 | Default timeout in seconds per scenario |
| --retry | | Integer | 0 | Number of retries per scenario on failure |
| --dry-run | | Flag | false | Validate and list scenarios without executing them |
| --help | | Flag | | Show help and exit |

Path Resolution

The PATH argument determines how scenarios are loaded:

  • Single file: If PATH points to a .yml or .yaml file, that single file is loaded as one suite.
  • Directory: If PATH points to a directory, AgenticAssure recursively scans for all .yml and .yaml files and loads each as a suite.
  • Default: If no path is given, the current working directory (.) is scanned.

# Load a single file
agenticassure run scenarios/search_tests.yaml --adapter my_agent.MyAgent

# Load all YAML files in a directory (recursive)
agenticassure run scenarios/ --adapter my_agent.MyAgent

# Use current directory
agenticassure run --adapter my_agent.MyAgent
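
The three path rules above can be sketched in a few lines of Python. This is an illustration only, not AgenticAssure's actual loader: the function name `collect_scenario_files`, the sorted return order, and the handling of a single non-YAML file are assumptions.

```python
from pathlib import Path

YAML_SUFFIXES = {".yml", ".yaml"}


def collect_scenario_files(path="."):
    """Return the YAML files that a given PATH argument would select (sketch)."""
    root = Path(path)
    if not root.exists():
        # The Arguments table states that PATH must exist.
        raise FileNotFoundError(f"PATH does not exist: {root}")
    if root.is_file():
        # A single .yml/.yaml file is loaded as one suite.
        # Assumption: any other file type yields nothing.
        return [root] if root.suffix in YAML_SUFFIXES else []
    # A directory is scanned recursively for .yml and .yaml files.
    # Assumption: results are sorted for a stable order.
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in YAML_SUFFIXES
    )
```

Calling `collect_scenario_files()` with no argument mirrors the default case, where the current working directory is scanned.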

Adapter Resolution

AgenticAssure needs an adapter to execute scenarios. The adapter is a Python class that implements the AgentAdapter protocol and acts as the bridge between AgenticAssure and your agent. The adapter is resolved in the following order:

  1. --adapter flag — If provided on the command line, this takes priority. The value is a Python dotted path like mymodule.MyAgent.
  2. Config file — If no flag is provided, AgenticAssure looks for a config file in the current working directory:
    • agenticassure.yaml — checked first, looks for an adapter key.
    • agenticassure.toml — checked second, looks for an adapter key.
  3. No adapter found — If neither source provides an adapter, AgenticAssure displays the loaded scenarios in a dry-run style table and prints instructions on how to provide an adapter. It does not exit with an error in this case.

Config file example (agenticassure.yaml):

adapter: my_agent.MyAgent

Config file example (agenticassure.toml):

adapter = "my_agent.MyAgent"
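
As a rough sketch of the precedence described above, the function below takes the command-line value and the already-parsed contents of the two config files and returns the dotted path to use. The name `resolve_adapter_path` is hypothetical, and falling through to agenticassure.toml when agenticassure.yaml exists but has no adapter key is an assumption rather than documented behavior.

```python
def resolve_adapter_path(cli_adapter=None, yaml_config=None, toml_config=None):
    """Pick the adapter dotted path using the documented precedence (sketch)."""
    # 1. The --adapter flag takes priority when provided.
    if cli_adapter:
        return cli_adapter
    # 2. agenticassure.yaml is checked first, then agenticassure.toml.
    #    Assumption: a file without an "adapter" key is simply skipped.
    for config in (yaml_config, toml_config):
        if config and config.get("adapter"):
            return config["adapter"]
    # 3. No adapter found: the caller lists scenarios instead of running them.
    return None


print(resolve_adapter_path("cli.Agent", {"adapter": "yaml.Agent"}))   # cli.Agent
print(resolve_adapter_path(None, None, {"adapter": "toml.Agent"}))    # toml.Agent
```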

The adapter class is dynamically imported and instantiated. It must:

  • Be importable from the current Python environment (installed or on PYTHONPATH).
  • Have a no-argument constructor.
  • Implement the AgentAdapter protocol (i.e., have a run(input, context=None) -> AgentResult method).

If any of these conditions are not met, AgenticAssure raises a descriptive error.
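
To make the three requirements concrete, here is a minimal sketch of how a dotted path such as my_agent.MyAgent can be imported and instantiated with the standard library. It is not AgenticAssure's own code: `load_adapter` is a hypothetical helper, the error types are assumptions, and the demo class returns a plain string where a real adapter would return an AgentResult.

```python
import importlib
import sys
import types


def load_adapter(dotted_path):
    """Import 'package.module.ClassName' and return an instance (sketch)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(
            f"Expected a dotted path like 'mymodule.MyAgent', got {dotted_path!r}"
        )
    module = importlib.import_module(module_name)  # must be importable
    adapter_cls = getattr(module, class_name)      # AttributeError if missing
    adapter = adapter_cls()                        # needs a no-argument constructor
    if not callable(getattr(adapter, "run", None)):
        raise TypeError(f"{dotted_path} has no callable run() method")
    return adapter


# Demo: a throwaway in-memory module stands in for a real my_agent.py file.
class MyAgent:
    def run(self, input, context=None):
        # Placeholder return value; a real adapter returns an AgentResult.
        return f"echo: {input}"


demo_module = types.ModuleType("my_agent")
demo_module.MyAgent = MyAgent
sys.modules["my_agent"] = demo_module

adapter = load_adapter("my_agent.MyAgent")
print(adapter.run("hello"))  # prints: echo: hello
```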

Output Formats

The --output flag controls how results are presented. Each format is described in detail in the Reports section.

CLI (default)

agenticassure run scenarios/ --adapter my_agent.MyAgent

# or explicitly:
agenticassure run scenarios/ --adapter my_agent.MyAgent --output cli

Results are printed as a Rich-formatted table directly in the terminal with color-coded pass/fail status, scores, durations, and details.

JSON

agenticassure run scenarios/ --adapter my_agent.MyAgent --output json

Writes a structured JSON file named results_{run_id}.json to the current directory. The run ID is a UUID generated for each run.
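
Because the file name contains a fresh UUID on every run, a script that consumes the report has to find it first. The sketch below picks the most recently modified results_*.json file in a directory and parses it. `load_latest_results` is a hypothetical helper, and nothing is assumed about the structure of the JSON itself.

```python
import json
from pathlib import Path


def load_latest_results(directory="."):
    """Load the most recently modified results_*.json file (sketch)."""
    candidates = sorted(
        Path(directory).glob("results_*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    if not candidates:
        raise FileNotFoundError(f"No results_*.json file found in {directory}")
    latest = candidates[-1]
    with latest.open(encoding="utf-8") as fh:
        # The parsed data is returned untouched; consult the JSON Report
        # documentation for its actual fields.
        return latest, json.load(fh)
```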

HTML

agenticassure run scenarios/ --adapter my_agent.MyAgent --output html

Writes a standalone HTML file named report_{run_id}.html to the current directory. The file includes embedded CSS and requires no external dependencies to open.

Tag Filtering

Use --tag to run only scenarios that have a matching tag. Tags are defined per-scenario in the YAML file. The flag can be specified multiple times, and a scenario is included if it has any of the specified tags (OR logic).

# Run only scenarios tagged "smoke"
agenticassure run scenarios/ --adapter my_agent.MyAgent --tag smoke

# Run scenarios tagged "tools" or "regression"
agenticassure run scenarios/ --adapter my_agent.MyAgent --tag tools --tag regression

Scenarios without any matching tags are skipped. If no --tag flags are provided, all scenarios are executed.
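
The OR logic amounts to a set intersection. The following sketch uses plain dictionaries as stand-ins for scenarios; the function name `matches_tags` and the sample data are illustrative assumptions, not part of AgenticAssure.

```python
def matches_tags(scenario_tags, requested_tags):
    """Return True if the scenario should run for the given --tag values."""
    if not requested_tags:
        # No --tag flags: every scenario is executed.
        return True
    # OR logic: one shared tag is enough.
    return bool(set(scenario_tags) & set(requested_tags))


scenarios = [
    {"name": "weather_query", "tags": ["tools"]},
    {"name": "greeting", "tags": ["basic", "smoke"]},
    {"name": "untagged", "tags": []},
]

selected = [s["name"] for s in scenarios if matches_tags(s["tags"], ["tools", "regression"])]
print(selected)  # ['weather_query']
```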

Suite Filtering

Use --suite to run only scenarios from a specific named suite. This is useful when a directory contains multiple YAML files (each defining a suite) and you want to target just one.

agenticassure run scenarios/ --adapter my_agent.MyAgent --suite search-agent-tests

If the named suite is not found among the loaded files, AgenticAssure prints an error and exits with code 1.

Timeout and Retry

Timeout

The --timeout flag sets the default timeout in seconds for each scenario. If a scenario’s adapter call exceeds this duration, it is marked as failed.

agenticassure run scenarios/ --adapter my_agent.MyAgent --timeout 60
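
One way to picture the timeout rule is a wrapper that waits a bounded time for a call and reports failure if the deadline passes. This is a sketch under assumptions: `call_with_timeout` is a hypothetical helper, and a worker thread is only one of several possible mechanisms, which may differ from what AgenticAssure does internally.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s):
    """Return (passed, result); a call that overruns timeout_s counts as failed."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The deadline passed before the call returned.
        return False, None
    finally:
        # Do not block on the worker; it keeps running in the background.
        pool.shutdown(wait=False)


fast = call_with_timeout(lambda: "ok", timeout_s=1.0)
slow = call_with_timeout(lambda: time.sleep(0.3), timeout_s=0.05)
print(fast, slow)  # (True, 'ok') (False, None)
```

Note that a thread-based timeout cannot interrupt the underlying call; it only stops waiting for it.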

Retry

The --retry flag specifies how many times to retry a failed scenario before marking it as failed. This is useful for handling non-deterministic LLM responses.

# Retry each failed scenario up to 2 times
agenticassure run scenarios/ --adapter my_agent.MyAgent --retry 2

With --retry 2, a scenario is attempted up to 3 times total (1 initial run + 2 retries).
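
The attempt arithmetic can be sketched as a small loop. `run_with_retries` is a hypothetical helper, and the callable it receives stands in for "run the scenario once and report whether it passed".

```python
def run_with_retries(attempt_fn, retries=0):
    """Call attempt_fn until it passes or the retry budget is spent.

    Returns (passed, attempts_made).
    """
    max_attempts = 1 + retries  # 1 initial run + N retries
    for attempt in range(1, max_attempts + 1):
        if attempt_fn():
            return True, attempt
    return False, max_attempts


# A flaky stand-in scenario that only passes on its third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    return calls["n"] >= 3

print(run_with_retries(flaky, retries=2))  # (True, 3)
```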

Dry-Run Mode

The --dry-run flag validates and loads all scenario files, then displays them in a summary table without executing anything. No adapter is required.

agenticassure run scenarios/ --dry-run

Output:

Loaded 5 scenario(s) from 2 suite(s)

Scenarios (dry run)
┌──────────────────┬───────────────┬─────────────────┬──────────┬───────┐
│ Suite            │ Scenario      │ Input           │ Scorers  │ Tags  │
├──────────────────┼───────────────┼─────────────────┼──────────┼───────┤
│ search-tests     │ weather_query │ What is the...  │ passfail │ tools │
│ search-tests     │ greeting      │ Hello, how...   │ passfail │ basic │
└──────────────────┴───────────────┴─────────────────┴──────────┴───────┘

5 scenario(s) found

Dry-run mode is useful for:

  • Verifying that scenario files parse correctly before running.
  • Confirming which scenarios match a given --tag or --suite filter.
  • Checking scenario coverage without incurring LLM API costs.

Tag filtering works in dry-run mode:

agenticassure run scenarios/ --dry-run --tag smoke

Exit Codes

| Exit Code | Meaning |
| --- | --- |
| 0 | All executed scenarios passed |
| 1 | At least one scenario failed, or an error occurred (invalid path, suite not found, adapter import failed) |

When more than one suite is loaded, an overall summary is also printed:

Overall: 8/10 scenarios passed across 3 suite(s)
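
The relationship between per-scenario results, the exit code, and the overall summary line can be sketched as follows. The `summarize` function and its input shape (suite name mapped to a list of pass/fail booleans) are assumptions made for illustration.

```python
def summarize(suite_results):
    """Return (exit_code, summary_line_or_None) for a mapping of
    suite name -> list of booleans, where True means the scenario passed."""
    total = sum(len(results) for results in suite_results.values())
    passed = sum(sum(results) for results in suite_results.values())
    exit_code = 0 if passed == total else 1
    summary = None
    if len(suite_results) > 1:
        # The overall line only appears when more than one suite was loaded.
        summary = (
            f"Overall: {passed}/{total} scenarios passed "
            f"across {len(suite_results)} suite(s)"
        )
    return exit_code, summary


code, line = summarize({
    "search": [True, True, True, True],
    "chat": [True, True, True, False],
    "tools": [True, False],
})
print(code, line)  # 1 Overall: 8/10 scenarios passed across 3 suite(s)
```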

Examples

Basic run with adapter flag:

agenticassure run scenarios/ --adapter my_agent.MyAgent

Run a single file with HTML output:

agenticassure run scenarios/search_tests.yaml --adapter my_agent.MyAgent --output html

Run with retries, longer timeout, and tag filter:

agenticassure run scenarios/ \
  --adapter my_agent.MyAgent \
  --timeout 60 \
  --retry 2 \
  --tag regression

Run with adapter from config file (no --adapter flag needed):

# Assuming agenticassure.yaml exists with adapter: my_agent.MyAgent
agenticassure run scenarios/

Dry-run to preview what will execute:

agenticassure run scenarios/ --dry-run --suite search-agent-tests --tag tools

JSON output for CI pipelines:

agenticassure run scenarios/ --adapter my_agent.MyAgent --output json

What’s Next

  • CLI Report — Understanding the terminal output.
  • HTML Report — Generating and sharing HTML reports.
  • JSON Report — Structured output for programmatic consumption.
  • Adapters — How to write an adapter for your agent.