Runner
The Runner is the execution engine of AgenticAssure. It takes a suite of scenarios and an adapter, then orchestrates the process of sending each scenario’s input to the agent, collecting the response, running scorers, handling retries, and aggregating results.
What the Runner Does
For each scenario in a suite, the runner:
1. Sends the scenario’s input to the adapter’s run() method.
2. Collects the AgentResult returned by the adapter.
3. Looks up each scorer listed in the scenario’s scorers field from the registry.
4. Calls each scorer’s score() method with the scenario and the agent result.
5. Determines pass/fail: the scenario passes only if every scorer passes.
6. If the adapter raised an exception and retries are configured, repeats from step 1.
7. Records the ScenarioRunResult with timing, scores, and error information.
After all scenarios have been executed (or execution is halted by fail_fast), the runner computes aggregate statistics and returns a RunResult.
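The pass/fail rule and the final aggregation can be sketched in a few lines. This is a minimal illustration, not AgenticAssure's implementation: ScoreSketch, scenario_passed, and pass_rate are hypothetical names, and treating the pass rate as the fraction of executed scenarios that passed is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScoreSketch:
    """Hypothetical stand-in for one scorer's verdict."""
    scorer_name: str
    passed: bool


def scenario_passed(scores: Sequence[ScoreSketch], error: Optional[str] = None) -> bool:
    # An adapter error always fails the scenario; otherwise every scorer must pass.
    return error is None and all(score.passed for score in scores)


def pass_rate(outcomes: Sequence[bool]) -> float:
    # Assumed definition: passed scenarios divided by executed scenarios.
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


scores = [ScoreSketch("passfail", True), ScoreSketch("exact", False)]
print(scenario_passed(scores))               # False: one scorer failed
print(pass_rate([True, True, False, True]))  # 0.75
```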
Configuration
The Runner takes a required adapter and three optional configuration parameters at construction time:
from agenticassure.runner import Runner

runner = Runner(
    adapter=my_adapter,
    default_timeout=30.0,
    retries=0,
    fail_fast=False,
)

adapter (required)
An object implementing the AgentAdapter protocol. See the Adapters documentation.
default_timeout (optional, default: 30.0)
The default timeout in seconds for each scenario. This value is available for adapters and custom integrations. Individual scenarios can override this with their timeout_seconds field.
retries (optional, default: 0)
The number of retry attempts per scenario. A value of 0 means each scenario is attempted exactly once. A value of 2 means a failing scenario will be retried up to 2 additional times (3 total attempts).
fail_fast (optional, default: False)
When True, the runner stops executing scenarios after the first failure. Scenarios that were not reached are not included in the results. This is useful for fast feedback during development.
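The effect of fail_fast on the result list can be illustrated with a small loop. This is a sketch under assumptions: run_all and run_one are hypothetical names, and each scenario is reduced to a name and a boolean outcome.

```python
from typing import Callable, List, Sequence, Tuple


def run_all(
    scenarios: Sequence[str],
    run_one: Callable[[str], bool],
    fail_fast: bool = False,
) -> List[Tuple[str, bool]]:
    results = []
    for name in scenarios:
        passed = run_one(name)
        results.append((name, passed))
        if fail_fast and not passed:
            break  # scenarios after the first failure are never reached
    return results


def outcome(name: str) -> bool:
    # Hypothetical scenario runner: only "b" fails.
    return name != "b"


print(run_all(["a", "b", "c"], outcome, fail_fast=True))
# [('a', True), ('b', False)]
print(run_all(["a", "b", "c"], outcome, fail_fast=False))
# [('a', True), ('b', False), ('c', True)]
```

With fail_fast enabled, "c" is absent from the results rather than reported as skipped, which matches the description above.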
Suite Config vs Runner Config Precedence
Configuration can be set in two places: the Runner constructor and the suite’s config block. When both are set, suite config takes precedence over runner defaults for the following fields:
| Field | Runner default used when… | Suite config used when… |
|---|---|---|
| retries | Suite retries is 0 | Suite retries is non-zero |
| fail_fast | Suite fail_fast is False | Suite fail_fast is True |
This means:
- If your suite YAML sets retries: 3, that overrides the runner’s retries value.
- If your suite YAML sets fail_fast: true, that overrides the runner’s fail_fast value.
- If your suite YAML does not set these fields (or leaves them at their defaults of 0 and false), the runner’s constructor values are used.
Example:
# Runner configured with 1 retry
runner = Runner(adapter=my_adapter, retries=1)

# Suite overrides to 3 retries
suite:
  name: my-suite
  config:
    retries: 3

In this case, scenarios will be retried up to 3 times (the suite config wins).
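The precedence rule can be written as a small resolver. This is a sketch of the rule as stated here, not the library's code; effective_config and its parameter names are hypothetical.

```python
from typing import Tuple


def effective_config(
    runner_retries: int,
    runner_fail_fast: bool,
    suite_retries: int = 0,
    suite_fail_fast: bool = False,
) -> Tuple[int, bool]:
    # Suite values win only when they differ from their defaults (0 and False).
    retries = suite_retries if suite_retries != 0 else runner_retries
    fail_fast = True if suite_fail_fast else runner_fail_fast
    return retries, fail_fast


print(effective_config(1, False, suite_retries=3))  # (3, False)
print(effective_config(1, True))                    # (1, True)
```

One consequence of this rule: a suite cannot force retries back to 0 or fail_fast back to False when the runner sets otherwise, because the default value is indistinguishable from "not set".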
Tag-Based Filtering
The run_suite method accepts an optional tags parameter to run only scenarios matching one or more tags:
result = runner.run_suite(my_suite, tags=["smoke", "critical"])

Filtering uses set intersection: a scenario is included if it has at least one tag in common with the provided list. A scenario tagged ["smoke", "api"] would match a filter of ["smoke", "critical"] because "smoke" is in both sets.
Scenarios with no tags will never match a tag filter. If tags is None (the default), all scenarios in the suite are executed.
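The matching rule amounts to a non-empty set intersection. The helper below is an illustrative sketch; matches_tags is a hypothetical name, not part of the AgenticAssure API.

```python
from typing import Iterable, Optional


def matches_tags(scenario_tags: Iterable[str], tags: Optional[Iterable[str]] = None) -> bool:
    if tags is None:
        return True  # no filter: every scenario runs
    # Included when at least one tag is shared with the filter.
    return bool(set(scenario_tags) & set(tags))


print(matches_tags(["smoke", "api"], ["smoke", "critical"]))  # True
print(matches_tags(["api"], ["smoke", "critical"]))           # False
print(matches_tags([], ["smoke"]))                            # False: untagged never matches
print(matches_tags([], None))                                 # True
```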
From the CLI, use the --tag flag (repeatable):
agenticassure run scenarios/ --adapter mymodule.MyAgent --tag smoke --tag critical

Retry Behavior
When retries are configured (either via the runner constructor or the suite config), the runner attempts each scenario up to retries + 1 times.
How It Works
- The runner calls the adapter with the scenario input.
- If the adapter raises an exception, the error is recorded and the runner moves to the next attempt.
- If the adapter returns successfully and all scorers pass, the scenario is marked as passed and no further retries occur.
- If the adapter returns successfully but any scorer fails, the scenario is marked as failed and that result is returned immediately (retries only guard against exceptions, not scorer failures on a successful adapter call).
- If all attempts are exhausted due to exceptions, the runner returns a failing ScenarioRunResult with the last error message and an empty AgentResult.
Important Details
- Retries only apply to exceptions. If the adapter returns a valid AgentResult but a scorer fails, that result is returned immediately without retrying. Retries are designed to handle transient errors like network timeouts or rate limits, not incorrect agent behavior.
- The retry_count field on ScenarioRunResult records the zero-based attempt index that produced the final result. A value of 0 means the first attempt succeeded or produced the returned result.
- Each retry is a fresh call. There is no state carried between retry attempts. The adapter receives the same input and context each time.
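The retry rules above can be sketched as a loop that retries only when the adapter call raises. This is an illustration under assumptions, not the runner's source: run_with_retries and make_flaky_adapter are hypothetical names, and a plain dict stands in for ScenarioRunResult.

```python
from typing import Any, Callable, Dict


def run_with_retries(
    call_adapter: Callable[[], str],
    all_scorers_pass: Callable[[str], bool],
    retries: int = 0,
) -> Dict[str, Any]:
    last_error = None
    for attempt in range(retries + 1):  # retries + 1 total attempts
        try:
            output = call_adapter()  # fresh call, no state carried over
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
            continue  # only exceptions trigger another attempt
        # The adapter returned: score once and return, pass or fail.
        return {"passed": all_scorers_pass(output), "output": output,
                "error": None, "retry_count": attempt}
    # Every attempt raised: failing result with the last error.
    return {"passed": False, "output": "", "error": last_error,
            "retry_count": retries}


def make_flaky_adapter(failures: int) -> Callable[[], str]:
    """Hypothetical adapter that raises `failures` times, then answers."""
    state = {"calls": 0}

    def call() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("rate limited")
        return "4"

    return call


result = run_with_retries(make_flaky_adapter(1), lambda out: out == "4", retries=2)
print(result["passed"], result["retry_count"])  # True 1
```

Here the first attempt raises, the second succeeds, and retry_count is 1, the zero-based index of the attempt that produced the result.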
Example
runner = Runner(adapter=my_adapter, retries=2)
result = runner.run_scenario(my_scenario)

if result.error:
    print(f"Failed after {result.retry_count + 1} attempts: {result.error}")
elif result.retry_count > 0:
    print(f"Succeeded on attempt {result.retry_count + 1}")

Error Handling
The runner catches all exceptions raised by the adapter during scenario execution. Exceptions are handled as follows:
- The exception type and message are formatted as "ExceptionType: message" and stored in ScenarioRunResult.error.
- If retries are configured, the runner attempts the scenario again.
- If all retries are exhausted, the final ScenarioRunResult has:
  - passed = False
  - agent_result set to an AgentResult with an empty output
  - error set to the last exception’s formatted message
  - scores as an empty list (scorers are not run when the adapter errors)
  - retry_count set to the index of the last attempt
Exceptions from scorers are not caught by the runner. If a scorer raises an exception, it will propagate up to the caller. Scorer exceptions indicate a bug in the scorer implementation and should be fixed.
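The asymmetry between adapter and scorer exceptions can be shown with a single-attempt sketch in which only the adapter call sits inside the try block. All names here (format_error, run_once, broken_adapter, buggy_scorer) are hypothetical, and the dict is a simplified stand-in for ScenarioRunResult.

```python
from typing import Any, Callable, Dict, Sequence


def format_error(exc: Exception) -> str:
    # "ExceptionType: message", as described above.
    return f"{type(exc).__name__}: {exc}"


def run_once(
    call_adapter: Callable[[], str],
    scorers: Sequence[Callable[[str], bool]],
) -> Dict[str, Any]:
    try:
        output = call_adapter()
    except Exception as exc:
        # Adapter failure: recorded on the result, scorers are skipped.
        return {"passed": False, "output": "", "error": format_error(exc), "scores": []}
    # Scorers run outside the try block, so a scorer bug reaches the caller.
    scores = [scorer(output) for scorer in scorers]
    return {"passed": all(scores), "output": output, "error": None, "scores": scores}


def broken_adapter() -> str:
    raise RuntimeError("connection reset")


def buggy_scorer(output: str) -> bool:
    raise ValueError("scorer bug")


print(run_once(broken_adapter, [])["error"])  # RuntimeError: connection reset

try:
    run_once(lambda: "hello", [buggy_scorer])
except ValueError as exc:
    print(f"propagated: {exc}")  # propagated: scorer bug
```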
Using Runner from Python
While the CLI is the most common way to use AgenticAssure, the Runner class is designed for direct use from Python code. This is useful for integration into CI pipelines, custom test harnesses, or Jupyter notebooks.
Running a Full Suite
import sys

from agenticassure.loader import load_scenarios
from agenticassure.runner import Runner

# Load scenarios from YAML
suite = load_scenarios("scenarios/customer-support.yaml")

# Create a runner with your adapter
runner = Runner(adapter=my_adapter, retries=1, fail_fast=False)

# Execute the suite
result = runner.run_suite(suite)

# Check results
print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Aggregate score: {result.aggregate_score:.3f}")

# Fail the CI build if any scenario failed
if result.pass_rate < 1.0:
    sys.exit(1)

Running a Single Scenario
You can run an individual scenario without a suite:
from agenticassure.scenario import Scenario
from agenticassure.runner import Runner

scenario = Scenario(
    name="quick_test",
    input="What is 2 + 2?",
    expected_output="4",
    scorers=["passfail", "exact"],
)

runner = Runner(adapter=my_adapter)
result = runner.run_scenario(scenario)

print(f"Passed: {result.passed}")
for score in result.scores:
    print(f"  {score.scorer_name}: {score.explanation}")

Running with Tag Filters
result = runner.run_suite(suite, tags=["smoke"])

Passing Context
The optional context parameter is forwarded to the adapter’s run() method. Use it to pass session state, authentication tokens, or any other runtime data your agent needs:
context = {
    "user_id": "test-user-123",
    "session_token": "abc...",
    "feature_flags": {"new_tool": True},
}

result = runner.run_suite(suite, context=context)

Loading Multiple Suites from a Directory
from agenticassure.loader import load_scenarios_from_dir

suites = load_scenarios_from_dir("scenarios/")
runner = Runner(adapter=my_adapter)

for suite in suites:
    result = runner.run_suite(suite)
    print(f"{suite.name}: {result.pass_rate:.0%}")

Custom Reporting
Since RunResult and all nested result models are Pydantic models, you can serialize them and build any reporting or monitoring integration you need:
import json

result = runner.run_suite(suite)

# Write raw results to a JSON file
with open("results.json", "w") as f:
    f.write(result.model_dump_json(indent=2))

# Send summary metrics to your monitoring system
metrics = {
    "suite": result.suite_name,
    "pass_rate": result.pass_rate,
    "aggregate_score": result.aggregate_score,
    "duration_ms": result.total_duration_ms,
    "scenarios_run": len(result.scenario_results),
    "scenarios_passed": sum(1 for sr in result.scenario_results if sr.passed),
}

# send_to_monitoring is a placeholder for your own integration
send_to_monitoring(metrics)