Runner API Reference

The Runner class is the core execution engine in AgenticAssure. It takes an agent adapter, executes test scenarios against it, runs scorers on the results, and returns structured result objects.

Module: agenticassure.runner

from agenticassure.runner import Runner

Class: Runner

class Runner:
    def __init__(
        self,
        adapter: AgentAdapter,
        default_timeout: float = 30.0,
        retries: int = 0,
        fail_fast: bool = False,
    ) -> None

Sequential test runner that executes scenarios against an agent adapter.

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| adapter | AgentAdapter | required | An object implementing the AgentAdapter protocol. Must have a run(input, context=None) -> AgentResult method. |
| default_timeout | float | 30.0 | Default timeout in seconds for scenario execution. |
| retries | int | 0 | Number of retry attempts per scenario on failure. 0 means no retries (each scenario is attempted exactly once). |
| fail_fast | bool | False | If True, stop executing remaining scenarios after the first failure in run_suite. |
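
The adapter contract in the table can be illustrated with a structural type. This is a minimal sketch rather than the library's own definition: the AgentResult dataclass is a stand-in that models only the output field, EchoAgent and call_adapter are hypothetical names, and the real AgentAdapter protocol may declare more than shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol


@dataclass
class AgentResult:
    """Stand-in for the library's AgentResult; only `output` is modeled."""
    output: str


class AgentAdapter(Protocol):
    """Any object with a compatible run() method satisfies this protocol."""

    def run(self, input: str, context: Optional[Dict[str, Any]] = None) -> AgentResult:
        ...


class EchoAgent:
    """Hypothetical adapter that echoes its input, tagged with a user ID."""

    def run(self, input: str, context: Optional[Dict[str, Any]] = None) -> AgentResult:
        # context is optional, so fall back to an empty dict
        user = (context or {}).get("user_id", "anonymous")
        return AgentResult(output=f"[{user}] {input}")


def call_adapter(adapter: AgentAdapter, text: str,
                 context: Optional[Dict[str, Any]] = None) -> str:
    """Invoke the adapter the way the table describes: run(input, context=...)."""
    return adapter.run(text, context=context).output


print(call_adapter(EchoAgent(), "hi", {"user_id": "u1"}))
```

EchoAgent does not inherit from anything: as with the MyAgent example later on this page, having a matching run() method is what matters.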

Instance Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| adapter | AgentAdapter | The agent adapter instance. |
| default_timeout | float | The configured default timeout. |
| retries | int | The configured retry count. |
| fail_fast | bool | Whether fail-fast mode is enabled. |

Methods

run_suite

def run_suite(
    self,
    suite: Suite,
    tags: list[str] | None = None,
    context: dict[str, Any] | None = None,
) -> RunResult

Run all scenarios in a suite and return aggregated results.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| suite | Suite | required | The test suite to execute. |
| tags | list[str] \| None | None | If provided, only scenarios that have at least one matching tag will be executed. Scenarios without any overlapping tag are skipped. |
| context | dict[str, Any] \| None | None | Optional context dictionary passed to the adapter’s run() method for every scenario. Useful for passing session state, user IDs, or other shared data. |

Returns: RunResult — Aggregated results with compute_aggregates() already called.

Behavior details:

  • Scenarios are executed sequentially in the order they appear in the suite.
  • If the suite’s SuiteConfig specifies retries or fail_fast, those values override the runner’s constructor defaults for this run.
  • Tag filtering uses set intersection: a scenario is included if any of its tags match any of the requested tags.
  • After all scenarios execute (or after a fail-fast stop), RunResult.compute_aggregates() is called automatically to populate aggregate_score, pass_rate, and total_duration_ms.
  • The total_duration_ms on the RunResult reflects wall-clock time for the entire suite, measured independently from individual scenario durations.
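
The tag-filtering rule above can be sketched with plain set intersection. This is an illustrative sketch only: select_by_tags is a hypothetical helper, scenarios are modeled as plain dicts rather than Scenario objects, and treating tags=None as "run everything" is taken from the parameter description, while the behavior for an empty list is not covered here.

```python
def select_by_tags(scenarios, tags=None):
    """Keep scenarios that share at least one tag with the requested tags."""
    if tags is None:
        # No filter requested: every scenario is selected
        return list(scenarios)
    requested = set(tags)
    # A non-empty intersection means at least one tag matches
    return [s for s in scenarios if requested & set(s["tags"])]


# Sample data standing in for a loaded suite
sample_scenarios = [
    {"name": "login", "tags": ["smoke", "auth"]},
    {"name": "refund", "tags": ["regression"]},
    {"name": "untagged", "tags": []},
]

selected = select_by_tags(sample_scenarios, tags=["smoke", "billing"])
print([s["name"] for s in selected])
```

Order is preserved because the filter walks the scenarios in suite order, which matches the sequential execution described above.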

Example:

from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

suite = load_scenarios("scenarios/tests.yaml")
runner = Runner(adapter=my_adapter, retries=2)
result = runner.run_suite(suite, tags=["smoke"])

print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Duration: {result.total_duration_ms:.0f}ms")

run_scenario

def run_scenario(
    self,
    scenario: Scenario,
    context: dict[str, Any] | None = None,
) -> ScenarioRunResult

Run a single scenario against the adapter and return the result.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scenario | Scenario | required | The scenario to execute. |
| context | dict[str, Any] \| None | None | Optional context dictionary passed to the adapter. |

Returns: ScenarioRunResult — The complete result including agent output, scores, and execution metadata.

Behavior details:

  • Uses the runner’s retries setting (from the constructor). Suite-level retry overrides only apply when using run_suite.
  • The scenario is passed to the adapter’s run() method, and the output is then evaluated by each scorer listed in scenario.scorers.
  • A scenario passes only if all configured scorers pass.
  • If the adapter raises an exception, the error is captured in ScenarioRunResult.error and passed is set to False.

Example:

from agenticassure import Scenario
from agenticassure.runner import Runner

scenario = Scenario(
    name="quick_test",
    input="What is 2 + 2?",
    expected_output="4",
    scorers=["passfail", "exact"],
)

runner = Runner(adapter=my_adapter)
result = runner.run_scenario(scenario)

if result.passed:
    print("Scenario passed")
else:
    print(f"Scenario failed: {result.error or 'scorer(s) did not pass'}")
    for score in result.scores:
        print(f" {score.scorer_name}: {score.explanation}")

Execution Flow

When running a scenario, the runner follows this sequence:

  1. Invoke the adapter — Call adapter.run(scenario.input, context=context) to get an AgentResult.
  2. Run scorers — For each scorer name in scenario.scorers, look up the scorer from the registry via get_scorer(name) and call scorer.score(scenario, agent_result).
  3. Determine pass/fail — The scenario passes only if every scorer’s ScoreResult.passed is True. If no scorers are configured, the scenario fails.
  4. Handle errors — If the adapter or any scorer raises an exception, the error is captured and the scenario is marked as failed.
  5. Retry logic — If the scenario fails and retries are configured, steps 1-4 are repeated up to retries additional times. The first successful attempt is returned immediately; if all attempts fail, the last error is returned.
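
Step 3 has a subtlety worth spelling out: a scenario with no scorers fails, even though "every scorer passed" is vacuously true for an empty list. The sketch below shows one way to express that rule. It is an assumption-labeled illustration, not the runner's source: ScoreResult is a stand-in dataclass with only the fields named on this page, and scenario_passed is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoreResult:
    """Stand-in with only the fields referenced on this page."""
    scorer_name: str
    passed: bool
    explanation: str = ""


def scenario_passed(scores: List[ScoreResult]) -> bool:
    """Pass only if at least one scorer ran and every scorer passed."""
    # all([]) is True in Python, so the empty case needs an explicit check
    return bool(scores) and all(score.passed for score in scores)


print(scenario_passed([ScoreResult("passfail", True), ScoreResult("exact", True)]))
print(scenario_passed([ScoreResult("passfail", True), ScoreResult("exact", False)]))
print(scenario_passed([]))
```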

Retry Behavior

The retry mechanism works as follows:

  • A scenario is attempted up to retries + 1 times total.
  • On success (no exception), the result is returned immediately with retry_count set to the zero-based attempt number.
  • On failure (exception), the error is recorded and the next attempt begins.
  • If all attempts fail, the last error message is preserved in ScenarioRunResult.error.
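
The retry rules above can be sketched as a small loop. This is a sketch under stated assumptions, not the runner's implementation: run_with_retries and Flaky are hypothetical names, the result is a plain dict instead of a ScenarioRunResult, and the retry_count reported when every attempt fails is an assumption because this page only defines it for the success case.

```python
from typing import Any, Callable, Dict


def run_with_retries(attempt: Callable[[], Any], retries: int = 0) -> Dict[str, Any]:
    """Call `attempt` up to retries + 1 times, stopping at the first success."""
    last_error = None
    for attempt_index in range(retries + 1):
        try:
            output = attempt()
        except Exception as exc:
            # Failure: record the error and move on to the next attempt
            last_error = str(exc)
            continue
        # Success: return immediately with the zero-based attempt number
        return {"output": output, "error": None, "retry_count": attempt_index}
    # All attempts failed: keep the last error (retry_count here is an assumption)
    return {"output": None, "error": last_error, "retry_count": retries}


class Flaky:
    """Hypothetical callable that raises a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures

    def __call__(self) -> str:
        if self.remaining > 0:
            self.remaining -= 1
            raise RuntimeError(f"transient failure, {self.remaining} left")
        return "ok"


print(run_with_retries(Flaky(2), retries=2))
print(run_with_retries(Flaky(2), retries=1))
```

With two failures and retries=2, the third attempt succeeds and retry_count is 2. With retries=1, both attempts fail and only the second error message survives.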

Retry priority: Suite-level config.retries takes precedence over the runner’s constructor retries value when using run_suite. When using run_scenario directly, only the constructor value applies.
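
The precedence rule can be sketched as follows. This is a hedged illustration: the SuiteConfig dataclass is a stand-in, effective_settings is a hypothetical helper, and representing "not specified by the suite" as None is an assumption about how the real SuiteConfig signals an unset value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SuiteConfig:
    """Stand-in config; None means the suite did not specify the value (assumption)."""
    retries: Optional[int] = None
    fail_fast: Optional[bool] = None


def effective_settings(runner_retries: int, runner_fail_fast: bool,
                       config: SuiteConfig) -> Tuple[int, bool]:
    """Suite-level values win over the runner's constructor values when present."""
    # Compare against None rather than truthiness so that an explicit
    # retries=0 or fail_fast=False in the suite still overrides the runner
    retries = runner_retries if config.retries is None else config.retries
    fail_fast = runner_fail_fast if config.fail_fast is None else config.fail_fast
    return retries, fail_fast


print(effective_settings(2, False, SuiteConfig()))
print(effective_settings(2, False, SuiteConfig(retries=0, fail_fast=True)))
```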


Programmatic Usage Examples

Basic usage with a custom adapter

from agenticassure import AgentResult
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios


class MyAgent:
    """A simple agent adapter."""

    def run(self, input: str, context=None) -> AgentResult:
        # Your agent logic here
        return AgentResult(output=f"Response to: {input}")


runner = Runner(adapter=MyAgent(), retries=1)
suite = load_scenarios("scenarios/tests.yaml")
result = runner.run_suite(suite)

for sr in result.scenario_results:
    status = "PASS" if sr.passed else "FAIL"
    print(f"[{status}] {sr.scenario.name} ({sr.duration_ms:.0f}ms)")

Running with tag filters and context

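from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

# my_adapter is any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter)
suite = load_scenarios("scenarios/tests.yaml")

# Only run scenarios tagged "smoke" and pass shared context
result = runner.run_suite(
    suite,
    tags=["smoke"],
    context={"user_id": "test-user-001", "session": "abc"},
)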

Running individual scenarios programmatically

from agenticassure import Scenario
from agenticassure.runner import Runner

scenarios = [
    Scenario(name="test1", input="Hello", expected_output="hello"),
    Scenario(name="test2", input="Goodbye", expected_output="goodbye"),
]

runner = Runner(adapter=my_adapter)
for scenario in scenarios:
    result = runner.run_scenario(scenario)
    print(f"{scenario.name}: {'PASS' if result.passed else 'FAIL'}")

Fail-fast mode

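from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

# my_adapter is any object implementing the AgentAdapter protocol
runner = Runner(adapter=my_adapter, fail_fast=True)
suite = load_scenarios("scenarios/tests.yaml")
result = runner.run_suite(suite)

# Only scenarios up to (and including) the first failure were executed
print(f"Ran {len(result.scenario_results)} of {len(suite.scenarios)} scenarios")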
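
To make the "up to and including the first failure" behavior concrete, the loop below mimics it without the library. It is a sketch only: run_until_first_failure is a hypothetical helper, scenarios are plain strings, and run_one stands in for whatever executes a scenario and reports pass or fail.

```python
def run_until_first_failure(scenarios, run_one, fail_fast=False):
    """Run scenarios in order; with fail_fast, stop after the first failure."""
    results = []
    for scenario in scenarios:
        passed = run_one(scenario)
        # The failing scenario is recorded before the loop stops
        results.append((scenario, passed))
        if fail_fast and not passed:
            break
    return results


names = ["a", "b", "c", "d"]


def fails_on_b(name):
    """Stand-in executor: every scenario passes except 'b'."""
    return name != "b"


stopped = run_until_first_failure(names, fails_on_b, fail_fast=True)
print(f"Ran {len(stopped)} of {len(names)} scenarios")
```

Here the run stops after "b", so two of the four scenarios appear in the results, and the last recorded result is the failure itself.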