Runner API Reference

The Runner class is the core execution engine in AgenticAssure. It takes an agent adapter, executes test scenarios against it, runs scorers on the results, and returns structured result objects.

Module: agenticassure.runner


```python
from agenticassure.runner import Runner
```

Class: `Runner`


```python
class Runner:
    def __init__(
        self,
        adapter: AgentAdapter,
        default_timeout: float = 30.0,
        retries: int = 0,
        fail_fast: bool = False,
    ) -> None
```

Sequential test runner that executes scenarios against an agent adapter.

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `adapter` | `AgentAdapter` | required | An object implementing the `AgentAdapter` protocol. Must have a `run(input, context=None) -> AgentResult` method. |
| `default_timeout` | `float` | `30.0` | Default timeout in seconds for scenario execution. |
| `retries` | `int` | `0` | Number of retry attempts per scenario on failure. `0` means no retries (each scenario is attempted exactly once). |
| `fail_fast` | `bool` | `False` | If `True`, stop executing remaining scenarios after the first failure in `run_suite`. |
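
The `adapter` requirement can be pictured as a structural protocol: any object with a suitable `run` method qualifies, with no base class needed. The sketch below is an illustration under assumptions, not the library's own definitions. `AgentResultStub` and `AgentAdapterLike` are hypothetical stand-ins for `AgentResult` and `AgentAdapter`, defined locally so the snippet runs without AgenticAssure installed.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class AgentResultStub:
    """Hypothetical stand-in for AgentResult; only `output` is modeled."""
    output: str


@runtime_checkable
class AgentAdapterLike(Protocol):
    """Hypothetical stand-in for the AgentAdapter protocol."""

    def run(self, input: str, context: Optional[dict[str, Any]] = None) -> AgentResultStub:
        ...


class EchoAgent:
    """Satisfies the protocol structurally, without inheriting from it."""

    def run(self, input: str, context: Optional[dict[str, Any]] = None) -> AgentResultStub:
        return AgentResultStub(output=f"Response to: {input}")


class NotAnAdapter:
    """Has no run() method, so it does not satisfy the protocol."""


# runtime_checkable protocols only verify that the method exists,
# not that its signature or return type matches.
echo_is_adapter = isinstance(EchoAgent(), AgentAdapterLike)
other_is_adapter = isinstance(NotAnAdapter(), AgentAdapterLike)
print(echo_is_adapter, other_is_adapter)  # True False
```

Because the check is structural, an existing agent class can usually be adapted by adding a thin `run` wrapper rather than changing its inheritance.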

Instance Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `adapter` | `AgentAdapter` | The agent adapter instance. |
| `default_timeout` | `float` | The configured default timeout. |
| `retries` | `int` | The configured retry count. |
| `fail_fast` | `bool` | Whether fail-fast mode is enabled. |

Methods

`run_suite`


```python
def run_suite(
    self,
    suite: Suite,
    tags: list[str] | None = None,
    context: dict[str, Any] | None = None,
) -> RunResult
```

Run all scenarios in a suite and return aggregated results.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `suite` | `Suite` | required | The test suite to execute. |
| `tags` | `list[str] \| None` | `None` | If provided, only scenarios that have at least one matching tag will be executed. Scenarios without any overlapping tag are skipped. |
| `context` | `dict[str, Any] \| None` | `None` | Optional context dictionary passed to the adapter’s `run()` method for every scenario. Useful for passing session state, user IDs, or other shared data. |

Returns: `RunResult`. Aggregated results, with `compute_aggregates()` already called.

Behavior details:

- Scenarios are executed sequentially in the order they appear in the suite.
- If the suite’s `SuiteConfig` specifies `retries` or `fail_fast`, those values override the runner’s constructor defaults for this run.
- Tag filtering uses set intersection: a scenario is included if any of its tags match any of the requested tags.
- After all scenarios execute (or after a fail-fast stop), `RunResult.compute_aggregates()` is called automatically to populate `aggregate_score`, `pass_rate`, and `total_duration_ms`.
- The `total_duration_ms` on the `RunResult` reflects wall-clock time for the entire suite, measured independently from individual scenario durations.
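
The set-intersection rule for tag filtering can be sketched as follows. This is a minimal illustration, not the runner's actual code: `ScenarioStub` and `filter_by_tags` are hypothetical names, and the stub models only a scenario's name and tags.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScenarioStub:
    """Hypothetical stand-in for Scenario; only name and tags are modeled."""
    name: str
    tags: list[str] = field(default_factory=list)


def filter_by_tags(
    scenarios: list[ScenarioStub], tags: Optional[list[str]]
) -> list[ScenarioStub]:
    """Keep scenarios sharing at least one tag with `tags`, preserving order."""
    if tags is None:
        return list(scenarios)  # no filter requested: run everything
    wanted = set(tags)
    # A non-empty intersection means at least one tag matches.
    # Assumption: an empty `tags` list matches nothing; the library may differ.
    return [s for s in scenarios if wanted & set(s.tags)]


scenarios = [
    ScenarioStub("login", tags=["smoke", "auth"]),
    ScenarioStub("billing", tags=["slow"]),
    ScenarioStub("untagged"),
]
selected = [s.name for s in filter_by_tags(scenarios, ["smoke", "regression"])]
print(selected)  # ['login']
```

Note that a scenario needs only one overlapping tag, so requesting several tags widens the selection rather than narrowing it.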

Example:


```python
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

suite = load_scenarios("scenarios/tests.yaml")

# my_adapter: any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter, retries=2)
result = runner.run_suite(suite, tags=["smoke"])

print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Duration: {result.total_duration_ms:.0f}ms")
```

`run_scenario`


```python
def run_scenario(
    self,
    scenario: Scenario,
    context: dict[str, Any] | None = None,
) -> ScenarioRunResult
```

Run a single scenario against the adapter and return the result.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `scenario` | `Scenario` | required | The scenario to execute. |
| `context` | `dict[str, Any] \| None` | `None` | Optional context dictionary passed to the adapter. |

Returns: `ScenarioRunResult`. The complete result, including agent output, scores, and execution metadata.

Behavior details:

- Uses the runner’s `retries` setting (from the constructor). Suite-level retry overrides only apply when using `run_suite`.
- The scenario is passed to the adapter’s `run()` method, and the output is then evaluated by each scorer listed in `scenario.scorers`.
- A scenario passes only if all configured scorers pass.
- If the adapter raises an exception, the error is captured in `ScenarioRunResult.error` and `passed` is set to `False`.

Example:


```python
from agenticassure import Scenario
from agenticassure.runner import Runner

scenario = Scenario(
    name="quick_test",
    input="What is 2 + 2?",
    expected_output="4",
    scorers=["passfail", "exact"],
)

# my_adapter: any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter)
result = runner.run_scenario(scenario)

if result.passed:
    print("Scenario passed")
else:
    print(f"Scenario failed: {result.error or 'scorer(s) did not pass'}")
    for score in result.scores:
        print(f"  {score.scorer_name}: {score.explanation}")
```

Execution Flow

When running a scenario, the runner follows this sequence:

1. Invoke the adapter: call `adapter.run(scenario.input, context=context)` to get an `AgentResult`.
2. Run scorers: for each scorer name in `scenario.scorers`, look up the scorer from the registry via `get_scorer(name)` and call `scorer.score(scenario, agent_result)`.
3. Determine pass/fail: the scenario passes only if every scorer’s `ScoreResult.passed` is `True`. If no scorers are configured, the scenario fails.
4. Handle errors: if the adapter or any scorer raises an exception, the error is captured and the scenario is marked as failed.
5. Retry logic: if the scenario fails and retries are configured, steps 1 to 4 are repeated up to `retries` additional times. The first successful attempt is returned immediately; if all attempts fail, the last error is returned.
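
Steps 1 to 4 above can be sketched as a single attempt function. This is a simplified illustration under assumptions, not the runner's implementation: `ScoreStub`, `AttemptOutcome`, `SCORERS`, and `attempt` are hypothetical names, the registry is a plain dict instead of `get_scorer`, and the `"nonempty"` scorer is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ScoreStub:
    """Hypothetical stand-in for ScoreResult."""
    scorer_name: str
    passed: bool


@dataclass
class AttemptOutcome:
    """Hypothetical stand-in for the ScenarioRunResult fields used here."""
    passed: bool
    scores: list[ScoreStub] = field(default_factory=list)
    error: Optional[str] = None


# Hypothetical scorer registry: name -> function(expected, actual) -> bool.
SCORERS: dict[str, Callable[[str, str], bool]] = {
    "exact": lambda expected, actual: expected == actual,
    "nonempty": lambda expected, actual: bool(actual),
}


def attempt(
    run: Callable[..., str],
    input: str,
    expected: str,
    scorer_names: list[str],
    context: Optional[dict[str, Any]] = None,
) -> AttemptOutcome:
    """One pass through steps 1 to 4 for a single scenario."""
    try:
        # Step 1: invoke the adapter.
        output = run(input, context=context)
        # Step 2: look up each scorer by name and score the output.
        scores = [ScoreStub(name, SCORERS[name](expected, output)) for name in scorer_names]
    except Exception as exc:
        # Step 4: an adapter or scorer exception marks the scenario as failed.
        return AttemptOutcome(passed=False, error=str(exc))
    # Step 3: all scorers must pass; an empty scorer list counts as a failure.
    passed = bool(scores) and all(score.passed for score in scores)
    return AttemptOutcome(passed=passed, scores=scores)


def answer_four(input: str, context: Optional[dict[str, Any]] = None) -> str:
    return "4"


def unavailable(input: str, context: Optional[dict[str, Any]] = None) -> str:
    raise RuntimeError("agent unavailable")


ok = attempt(answer_four, "What is 2 + 2?", "4", ["exact", "nonempty"])
no_scorers = attempt(answer_four, "What is 2 + 2?", "4", [])
crashed = attempt(unavailable, "What is 2 + 2?", "4", ["exact"])
print(ok.passed, no_scorers.passed, crashed.error)  # True False agent unavailable
```

The explicit `bool(scores)` guard matters because `all()` over an empty list is `True`; without it, a scenario with no scorers would pass.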

Retry Behavior

The retry mechanism works as follows:

- A scenario is attempted up to `retries + 1` times total.
- On success (no exception), the result is returned immediately with `retry_count` set to the zero-based attempt number.
- On failure (exception), the error is recorded and the next attempt begins.
- If all attempts fail, the last error message is preserved in `ScenarioRunResult.error`.
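
The retry rules above can be sketched as a small loop. This is a minimal illustration, not the library's code: `run_with_retries` and `flaky` are hypothetical names, success is modeled as "no exception raised" as described above, and the `retry_count` reported when every attempt fails is an assumption.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    attempt: Callable[[], str], retries: int
) -> Tuple[Optional[str], int, Optional[str]]:
    """Return (output, retry_count, error) after at most retries + 1 attempts."""
    last_error: Optional[str] = None
    for attempt_number in range(retries + 1):
        try:
            output = attempt()
        except Exception as exc:
            last_error = str(exc)  # keep only the most recent failure
            continue
        # Success: stop immediately; retry_count is the zero-based attempt number.
        return output, attempt_number, None
    # Assumption: on exhaustion, retry_count equals the configured retries.
    return None, retries, last_error


calls = {"n": 0}


def flaky() -> str:
    """Fails on the first two calls, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError(f"attempt {calls['n']} failed")
    return "ok"


recovered = run_with_retries(flaky, retries=2)
print(recovered)  # ('ok', 2, None)

calls["n"] = 0
exhausted = run_with_retries(flaky, retries=1)
print(exhausted)  # (None, 1, 'attempt 2 failed')
```

With `retries=2` the third attempt succeeds, so `retry_count` is `2`; with `retries=1` both attempts raise and only the second error message survives.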

Retry priority: Suite-level config.retries takes precedence over the runner’s constructor retries value when using run_suite. When using run_scenario directly, only the constructor value applies.
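
One way to picture this precedence is a small resolver. The sketch assumes an unset suite-level value is represented as `None`; `effective_retries` is a hypothetical helper, not part of the documented API.

```python
from typing import Optional


def effective_retries(runner_retries: int, suite_retries: Optional[int]) -> int:
    """Pick the suite-level value when it is set, else the runner's value."""
    # Compare against None rather than truthiness, so an explicit suite-level 0
    # still overrides a non-zero constructor value.
    return runner_retries if suite_retries is None else suite_retries


print(effective_retries(2, None))  # 2: suite is silent, constructor value applies
print(effective_retries(2, 0))     # 0: suite explicitly disables retries
print(effective_retries(0, 3))     # 3: suite value wins
```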

Programmatic Usage Examples

Basic usage with a custom adapter


```python
from agenticassure import AgentResult
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios


class MyAgent:
    """A simple agent adapter."""

    def run(self, input: str, context=None) -> AgentResult:
        # Your agent logic here
        return AgentResult(output=f"Response to: {input}")


runner = Runner(adapter=MyAgent(), retries=1)
suite = load_scenarios("scenarios/tests.yaml")
result = runner.run_suite(suite)

for sr in result.scenario_results:
    status = "PASS" if sr.passed else "FAIL"
    print(f"[{status}] {sr.scenario.name} ({sr.duration_ms:.0f}ms)")
```

Running with tag filters and context


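```python
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

# my_adapter: any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter)
suite = load_scenarios("scenarios/tests.yaml")

# Only run scenarios tagged "smoke" and pass shared context
result = runner.run_suite(
    suite,
    tags=["smoke"],
    context={"user_id": "test-user-001", "session": "abc"},
)
```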

Running individual scenarios programmatically


```python
from agenticassure import Scenario
from agenticassure.runner import Runner

scenarios = [
    Scenario(name="test1", input="Hello", expected_output="hello"),
    Scenario(name="test2", input="Goodbye", expected_output="goodbye"),
]

# my_adapter: any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter)

for scenario in scenarios:
    result = runner.run_scenario(scenario)
    print(f"{scenario.name}: {'PASS' if result.passed else 'FAIL'}")
```

Fail-fast mode


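```python
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

# my_adapter: any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter, fail_fast=True)
suite = load_scenarios("scenarios/tests.yaml")

result = runner.run_suite(suite)

# Only scenarios up to (and including) the first failure were executed
print(f"Ran {len(result.scenario_results)} of {len(suite.scenarios)} scenarios")
```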
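
The stopping rule behind fail-fast mode can be sketched independently of the library. In this illustration, `run_until_first_failure` is a hypothetical helper and each scenario is reduced to a callable returning whether it passed.

```python
from typing import Callable, Iterable


def run_until_first_failure(
    checks: Iterable[Callable[[], bool]], fail_fast: bool
) -> list[bool]:
    """Run checks in order; with fail_fast, stop after the first failure."""
    results: list[bool] = []
    for check in checks:
        passed = check()
        results.append(passed)  # the failing result is still recorded
        if fail_fast and not passed:
            break  # remaining checks are never executed
    return results


checks = [lambda: True, lambda: False, lambda: True]
print(run_until_first_failure(checks, fail_fast=True))   # [True, False]
print(run_until_first_failure(checks, fail_fast=False))  # [True, False, True]
```

Recording the failing result before breaking is what makes the result list cover scenarios "up to and including" the first failure.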