Runner API Reference
The Runner class is the core execution engine in AgenticAssure. It takes an agent adapter, executes test scenarios against it, runs scorers on the results, and returns structured result objects.
Module: agenticassure.runner

    from agenticassure.runner import Runner

Class: Runner

    class Runner:
        def __init__(
            self,
            adapter: AgentAdapter,
            default_timeout: float = 30.0,
            retries: int = 0,
            fail_fast: bool = False,
        ) -> None

Sequential test runner that executes scenarios against an agent adapter.
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `adapter` | `AgentAdapter` | required | An object implementing the `AgentAdapter` protocol. Must have a `run(input, context=None) -> AgentResult` method. |
| `default_timeout` | `float` | `30.0` | Default timeout in seconds for scenario execution. |
| `retries` | `int` | `0` | Number of retry attempts per scenario on failure. `0` means no retries (each scenario is attempted exactly once). |
| `fail_fast` | `bool` | `False` | If `True`, stop executing remaining scenarios after the first failure in `run_suite`. |
Instance Attributes
| Attribute | Type | Description |
|---|---|---|
| `adapter` | `AgentAdapter` | The agent adapter instance. |
| `default_timeout` | `float` | The configured default timeout. |
| `retries` | `int` | The configured retry count. |
| `fail_fast` | `bool` | Whether fail-fast mode is enabled. |
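The adapter requirement described above is structural: any object with a matching `run` method qualifies. The following is a minimal sketch of that idea using `typing.Protocol`. The `AgentResult` dataclass and `AgentAdapter` protocol defined here are simplified stand-ins written for illustration (an assumption, not the library's actual definitions), and `EchoAgent` is a hypothetical adapter.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class AgentResult:
    """Stand-in for the library's AgentResult (assumed here to carry only an output)."""
    output: str


@runtime_checkable
class AgentAdapter(Protocol):
    """Stand-in protocol mirroring the documented run(input, context=None) shape."""

    def run(self, input: str, context: Optional[dict[str, Any]] = None) -> AgentResult: ...


class EchoAgent:
    """Hypothetical adapter: no base class is needed, only a matching run method."""

    def run(self, input: str, context: Optional[dict[str, Any]] = None) -> AgentResult:
        # Read an optional value from the shared context, if one was supplied.
        user = (context or {}).get("user_id", "anonymous")
        return AgentResult(output=f"[{user}] {input}")


agent = EchoAgent()
# runtime_checkable only verifies that a run attribute exists, not its signature.
print(isinstance(agent, AgentAdapter))
print(agent.run("ping", context={"user_id": "test-user-001"}).output)
```

Because the check is structural, an existing agent class can usually be wrapped in a few lines rather than rewritten to inherit from anything.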
Methods
run_suite

    def run_suite(
        self,
        suite: Suite,
        tags: list[str] | None = None,
        context: dict[str, Any] | None = None,
    ) -> RunResult

Run all scenarios in a suite and return aggregated results.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `suite` | `Suite` | required | The test suite to execute. |
| `tags` | `list[str] \| None` | `None` | If provided, only scenarios that have at least one matching tag will be executed. Scenarios without any overlapping tag are skipped. |
| `context` | `dict[str, Any] \| None` | `None` | Optional context dictionary passed to the adapter’s `run()` method for every scenario. Useful for passing session state, user IDs, or other shared data. |
Returns: RunResult — Aggregated results with compute_aggregates() already called.
Behavior details:
- Scenarios are executed sequentially in the order they appear in the suite.
- If the suite’s `SuiteConfig` specifies `retries` or `fail_fast`, those values override the runner’s constructor defaults for this run.
- Tag filtering uses set intersection: a scenario is included if any of its tags match any of the requested tags.
- After all scenarios execute (or after a fail-fast stop), `RunResult.compute_aggregates()` is called automatically to populate `aggregate_score`, `pass_rate`, and `total_duration_ms`.
- The `total_duration_ms` on the `RunResult` reflects wall-clock time for the entire suite, measured independently from individual scenario durations.
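The tag-filtering and override rules above can be sketched in a few lines. This is an illustrative sketch, not the library's code: `matches_tags` and `effective_setting` are hypothetical helper names, and representing an unset suite-level value as `None` is an assumption. The source does not say how an empty `tags` list is treated; under plain set intersection, as here, it selects nothing.

```python
from typing import Optional


def matches_tags(scenario_tags: list[str], requested: Optional[list[str]]) -> bool:
    """Return True if a scenario should run under the requested tag filter."""
    if requested is None:
        return True  # no filter given: every scenario runs
    # Set intersection: one shared tag is enough to include the scenario.
    return bool(set(scenario_tags) & set(requested))


def effective_setting(suite_value, runner_value):
    """Assumed override rule: a suite-level value, when set, wins over the runner's."""
    return runner_value if suite_value is None else suite_value


# Hypothetical scenario names mapped to their tags.
scenarios = {"login": ["smoke", "auth"], "export": ["slow"], "untagged": []}
selected = [name for name, tags in scenarios.items() if matches_tags(tags, ["smoke", "fast"])]
print(selected)

# Suite config leaves retries unset, so the runner's value of 2 applies.
print(effective_setting(None, 2))
# Suite config sets retries to 0, which overrides the runner's 2.
print(effective_setting(0, 2))
```

Note that the override check compares against `None` rather than truthiness, so a suite-level `retries` of `0` or `fail_fast` of `False` still counts as specified in this sketch.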
Example:

    from agenticassure.runner import Runner
    from agenticassure.loader import load_scenarios

    # my_adapter is any object implementing the AgentAdapter protocol
    suite = load_scenarios("scenarios/tests.yaml")
    runner = Runner(adapter=my_adapter, retries=2)
    result = runner.run_suite(suite, tags=["smoke"])

    print(f"Pass rate: {result.pass_rate:.0%}")
    print(f"Duration: {result.total_duration_ms:.0f}ms")

run_scenario

    def run_scenario(
        self,
        scenario: Scenario,
        context: dict[str, Any] | None = None,
    ) -> ScenarioRunResult

Run a single scenario against the adapter and return the result.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `scenario` | `Scenario` | required | The scenario to execute. |
| `context` | `dict[str, Any] \| None` | `None` | Optional context dictionary passed to the adapter. |
Returns: ScenarioRunResult — The complete result including agent output, scores, and execution metadata.
Behavior details:
- Uses the runner’s `retries` setting (from the constructor). Suite-level retry overrides only apply when using `run_suite`.
- The scenario’s input is passed to the adapter’s `run()` method, and the output is then evaluated by each scorer listed in `scenario.scorers`.
- A scenario passes only if all configured scorers pass.
- If the adapter raises an exception, the error is captured in `ScenarioRunResult.error` and `passed` is set to `False`.
Example:

    from agenticassure import Scenario
    from agenticassure.runner import Runner

    scenario = Scenario(
        name="quick_test",
        input="What is 2 + 2?",
        expected_output="4",
        scorers=["passfail", "exact"],
    )

    # my_adapter is any object implementing the AgentAdapter protocol
    runner = Runner(adapter=my_adapter)
    result = runner.run_scenario(scenario)

    if result.passed:
        print("Scenario passed")
    else:
        print(f"Scenario failed: {result.error or 'scorer(s) did not pass'}")
        # List each scorer's verdict to see which one rejected the output
        for score in result.scores:
            print(f"  {score.scorer_name}: {score.explanation}")

Execution Flow
When running a scenario, the runner follows this sequence:
1. Invoke the adapter: call `adapter.run(scenario.input, context=context)` to get an `AgentResult`.
2. Run scorers: for each scorer name in `scenario.scorers`, look up the scorer from the registry via `get_scorer(name)` and call `scorer.score(scenario, agent_result)`.
3. Determine pass/fail: the scenario passes only if every scorer’s `ScoreResult.passed` is `True`. If no scorers are configured, the scenario fails.
4. Handle errors: if the adapter or any scorer raises an exception, the error is captured and the scenario is marked as failed.
5. Retry logic: if the scenario fails and retries are configured, steps 1-4 are repeated up to `retries` additional times. The first successful attempt is returned immediately; if all attempts fail, the last error is returned.
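Steps 1 through 4 of the sequence above can be sketched as a single function. This is a simplified illustration under stated assumptions: `ScoreResult` and `Outcome` are stand-in dataclasses, scorers are modeled as plain callables on the output string instead of registry lookups via `get_scorer`, and `evaluate` is a hypothetical name rather than part of the library.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ScoreResult:
    """Stand-in for a scorer verdict (assumed fields)."""
    scorer_name: str
    passed: bool


@dataclass
class Outcome:
    """Stand-in for a per-scenario result (assumed fields)."""
    passed: bool
    scores: list[ScoreResult] = field(default_factory=list)
    error: Optional[str] = None


def evaluate(
    run_agent: Callable[[str], str],
    scorers: dict[str, Callable[[str], bool]],
    agent_input: str,
) -> Outcome:
    try:
        output = run_agent(agent_input)  # step 1: invoke the adapter
        # Step 2: run every configured scorer on the output.
        scores = [ScoreResult(name, check(output)) for name, check in scorers.items()]
    except Exception as exc:
        # Step 4: an adapter or scorer exception marks the scenario as failed.
        return Outcome(passed=False, error=str(exc))
    # Step 3: all scorers must pass, and an empty scorer list counts as a failure.
    passed = bool(scores) and all(s.passed for s in scores)
    return Outcome(passed=passed, scores=scores)


def broken_agent(question: str) -> str:
    raise RuntimeError("agent unavailable")


ok = evaluate(lambda q: "4", {"exact": lambda out: out == "4"}, "What is 2 + 2?")
no_scorers = evaluate(lambda q: "4", {}, "What is 2 + 2?")
errored = evaluate(broken_agent, {"exact": lambda out: True}, "What is 2 + 2?")
print(ok.passed, no_scorers.passed, errored.passed, errored.error)
```

The `bool(scores) and all(...)` guard matters because `all([])` is `True` in Python; without it, a scenario with no scorers would pass instead of fail.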
Retry Behavior
The retry mechanism works as follows:
- A scenario is attempted up to `retries + 1` times total.
- On success (no exception), the result is returned immediately with `retry_count` set to the zero-based attempt number.
- On failure (exception), the error is recorded and the next attempt begins.
- If all attempts fail, the last error message is preserved in `ScenarioRunResult.error`.
Retry priority: Suite-level config.retries takes precedence over the runner’s constructor retries value when using run_suite. When using run_scenario directly, only the constructor value applies.
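The retry rules described above amount to a bounded loop. The sketch below illustrates that loop under stated assumptions: `AttemptResult` and `run_with_retries` are hypothetical names, `retries` is assumed to be non-negative, and the `retry_count` reported when every attempt fails is a guess, since the source only specifies it for the success case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttemptResult:
    """Stand-in result type (assumed fields)."""
    output: Optional[str]
    retry_count: int
    error: Optional[str] = None


def run_with_retries(attempt: Callable[[], str], retries: int) -> AttemptResult:
    last_error: Optional[str] = None
    # retries + 1 attempts in total: the first try plus each retry.
    for attempt_no in range(retries + 1):
        try:
            # Success returns immediately; attempt_no is zero-based.
            return AttemptResult(output=attempt(), retry_count=attempt_no)
        except Exception as exc:
            last_error = str(exc)  # only the most recent error is kept
    # Assumption: report the final attempt index when everything failed.
    return AttemptResult(output=None, retry_count=retries, error=last_error)


calls = {"n": 0}


def flaky() -> str:
    """Hypothetical agent call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError(f"attempt {calls['n']} timed out")
    return "ok"


def always_down() -> str:
    raise ConnectionError("agent unavailable")


recovered = run_with_retries(flaky, retries=2)
failed = run_with_retries(always_down, retries=1)
print(recovered.retry_count, recovered.output)
print(failed.retry_count, failed.error)
```

With `retries=2`, the flaky call succeeds on its third attempt, so the zero-based `retry_count` is 2 and the two earlier timeout errors are discarded.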
Programmatic Usage Examples
Basic usage with a custom adapter

    from agenticassure import AgentResult
    from agenticassure.runner import Runner
    from agenticassure.loader import load_scenarios

    class MyAgent:
        """A simple agent adapter."""

        def run(self, input: str, context=None) -> AgentResult:
            # Your agent logic here
            return AgentResult(output=f"Response to: {input}")

    runner = Runner(adapter=MyAgent(), retries=1)
    suite = load_scenarios("scenarios/tests.yaml")
    result = runner.run_suite(suite)

    for sr in result.scenario_results:
        status = "PASS" if sr.passed else "FAIL"
        print(f"[{status}] {sr.scenario.name} ({sr.duration_ms:.0f}ms)")

Running with tag filters and context

    from agenticassure.runner import Runner
    from agenticassure.loader import load_scenarios

    # my_adapter is any object implementing the AgentAdapter protocol
    runner = Runner(adapter=my_adapter)
    suite = load_scenarios("scenarios/tests.yaml")

    # Only run scenarios tagged "smoke" and pass shared context
    result = runner.run_suite(
        suite,
        tags=["smoke"],
        context={"user_id": "test-user-001", "session": "abc"},
    )

Running individual scenarios programmatically

    from agenticassure import Scenario
    from agenticassure.runner import Runner

    scenarios = [
        Scenario(name="test1", input="Hello", expected_output="hello"),
        Scenario(name="test2", input="Goodbye", expected_output="goodbye"),
    ]

    runner = Runner(adapter=my_adapter)
    for scenario in scenarios:
        result = runner.run_scenario(scenario)
        print(f"{scenario.name}: {'PASS' if result.passed else 'FAIL'}")

Fail-fast mode

    from agenticassure.runner import Runner
    from agenticassure.loader import load_scenarios

    runner = Runner(adapter=my_adapter, fail_fast=True)
    suite = load_scenarios("scenarios/tests.yaml")
    result = runner.run_suite(suite)

    # Only scenarios up to (and including) the first failure were executed
    print(f"Ran {len(result.scenario_results)} of {len(suite.scenarios)} scenarios")
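The stop-on-first-failure behavior used in the fail-fast example can be pictured as a plain sequential loop with an early exit. The sketch below is illustrative only: `run_all` is a hypothetical helper, and scenarios are reduced to names with a pass/fail callback rather than real `Scenario` objects.

```python
from typing import Callable, Iterable


def run_all(
    names: Iterable[str],
    run_one: Callable[[str], bool],
    fail_fast: bool = False,
) -> dict[str, bool]:
    """Run scenarios in order; with fail_fast, stop after the first failure."""
    results: dict[str, bool] = {}
    for name in names:
        results[name] = run_one(name)
        if fail_fast and not results[name]:
            break  # the failing scenario is recorded, later ones never run
    return results


# Hypothetical suite: the second scenario fails.
order = ["login", "checkout", "export"]
outcomes = {"login": True, "checkout": False, "export": True}

executed = run_all(order, lambda name: outcomes[name], fail_fast=True)
print(f"Ran {len(executed)} of {len(order)} scenarios")
```

Since skipped scenarios never appear in the results, any aggregate computed afterwards covers only the scenarios that actually ran in this sketch.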