# Models API Reference
This page documents all Pydantic data models in AgenticAssure. These models represent test scenarios, agent outputs, scoring results, and run aggregations.
All models are importable from the top-level package:
```python
from agenticassure import (
    Scenario, Suite, SuiteConfig,
    ToolCall, TokenUsage, AgentResult,
    ScoreResult, ScenarioRunResult, RunResult,
)
```

## Model Relationship Diagram
```
Suite
|-- config: SuiteConfig
|-- scenarios: list[Scenario]

RunResult
|-- scenario_results: list[ScenarioRunResult]
|   |-- scenario: Scenario
|   |-- agent_result: AgentResult
|   |   |-- tool_calls: list[ToolCall]
|   |   |-- token_usage: TokenUsage (optional)
|   |-- scores: list[ScoreResult]
```

## Scenario Models
### Scenario
A single test scenario for an AI agent. Defines the input prompt, expected outputs, scoring configuration, and metadata.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| id | str | Auto-generated UUID | Unique identifier for the scenario. Generated automatically if not provided. |
| name | str | required | Human-readable name for the scenario. |
| description | str \| None | None | Optional longer description of what the scenario tests. |
| input | str | required | The prompt or input text to send to the agent. |
| expected_output | str \| None | None | The expected output text. Used by scorers like passfail, exact, and similarity. |
| expected_tools | list[str] \| None | None | List of tool names the agent is expected to call. |
| expected_tool_args | dict[str, Any] \| None | None | Mapping of tool name to expected arguments. Used by the passfail scorer to verify tool call arguments. |
| tags | list[str] | [] | Tags for filtering scenarios during runs (e.g., ["smoke", "tools"]). |
| metadata | dict[str, Any] | {} | Arbitrary key-value pairs for scorer configuration and custom data. Used by regex (for regex_pattern), exact (for exact_normalize), and similarity (for similarity_threshold). |
| scorers | list[str] | ["passfail"] | List of scorer names to evaluate this scenario. Each name must correspond to a registered scorer. |
| timeout_seconds | float | 30.0 | Maximum time in seconds to wait for the agent to respond. |
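To make the expected_tool_args check concrete, the helper below sketches one plausible way to verify that each expected tool was called with (at least) the expected arguments. Note that tool_args_match is a hypothetical illustration, not part of the package; the real check lives inside the passfail scorer and may handle extra or repeated calls differently.

```python
from typing import Any


def tool_args_match(
    expected_tool_args: dict[str, dict[str, Any]],
    tool_calls: list[dict[str, Any]],
) -> bool:
    """Hypothetical sketch: every expected tool must have been called with
    (at least) the expected arguments. Extra arguments are ignored here."""
    # Index calls by tool name (a repeated call overwrites the earlier one
    # in this simplified sketch).
    called = {call["name"]: call["arguments"] for call in tool_calls}
    for tool, expected in expected_tool_args.items():
        if tool not in called:
            return False  # expected tool was never called
        if any(called[tool].get(key) != value for key, value in expected.items()):
            return False  # an expected argument is missing or differs
    return True
```

Whether the actual scorer is this lenient about extra arguments is an implementation detail of the passfail scorer.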
Example:
```python
from agenticassure import Scenario

# Minimal scenario
scenario = Scenario(
    name="greeting",
    input="Hello, how are you?",
)

# Fully specified scenario
scenario = Scenario(
    name="weather_lookup",
    description="Test that the agent calls the weather tool correctly",
    input="What is the weather in London?",
    expected_output="weather",
    expected_tools=["get_weather"],
    expected_tool_args={"get_weather": {"location": "London"}},
    tags=["tools", "weather"],
    metadata={"regex_pattern": r"\d+ degrees"},
    scorers=["passfail", "regex"],
    timeout_seconds=60.0,
)
```

### SuiteConfig
Configuration settings for a test suite. These values control runner behavior when executing the suite.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| default_timeout | float | 30.0 | Default timeout in seconds for scenarios that do not specify their own. |
| retries | int | 0 | Number of retry attempts for each scenario on failure. 0 means no retries. |
| default_scorers | list[str] | ["passfail"] | Default scorers applied to scenarios that do not specify their own. |
| fail_fast | bool | False | If True, stop executing remaining scenarios after the first failure. |
Example:
```python
from agenticassure import SuiteConfig

config = SuiteConfig(
    default_timeout=60.0,
    retries=2,
    default_scorers=["passfail", "exact"],
    fail_fast=True,
)
```

### Suite
A collection of test scenarios with shared configuration.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | Name of the test suite. |
| description | str \| None | None | Optional description of the suite’s purpose. |
| scenarios | list[Scenario] | [] | The test scenarios in this suite. |
| tags | list[str] | [] | Suite-level tags. |
| config | SuiteConfig | SuiteConfig() | Configuration settings for this suite. |
Example:
```python
from agenticassure import Suite, Scenario, SuiteConfig

suite = Suite(
    name="agent-smoke-tests",
    description="Quick smoke tests for the customer support agent",
    scenarios=[
        Scenario(name="greet", input="Hello"),
        Scenario(name="farewell", input="Goodbye"),
    ],
    tags=["smoke"],
    config=SuiteConfig(retries=1, fail_fast=True),
)
```

## Result Models
### ToolCall
Represents a single tool call made by the agent during execution.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | The name of the tool that was called. |
| arguments | dict[str, Any] | {} | The arguments passed to the tool. |
| result | Any \| None | None | The value returned by the tool, if available. |
Example:
```python
from agenticassure import ToolCall

tool_call = ToolCall(
    name="get_weather",
    arguments={"location": "San Francisco", "unit": "fahrenheit"},
    result="72 degrees and sunny",
)
```

### TokenUsage
Token usage statistics for an agent invocation.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| prompt_tokens | int | 0 | Number of tokens in the prompt (input). |
| completion_tokens | int | 0 | Number of tokens in the completion (output). |
Properties:
| Property | Type | Description |
|---|---|---|
| total_tokens | int | Sum of prompt_tokens and completion_tokens. |
Example:
```python
from agenticassure import TokenUsage

usage = TokenUsage(prompt_tokens=150, completion_tokens=80)
print(usage.total_tokens)  # 230
```

### AgentResult
The structured result returned by an agent adapter after processing a scenario.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| output | str | required | The agent’s text output/response. |
| tool_calls | list[ToolCall] | [] | Tool calls the agent made during execution. |
| reasoning_trace | list[str] \| None | None | Optional step-by-step reasoning trace (e.g., from LangChain intermediate steps). |
| latency_ms | float | 0.0 | Time taken for the agent to respond, in milliseconds. |
| token_usage | TokenUsage \| None | None | Token usage statistics, if reported by the LLM provider. |
| raw_response | Any \| None | None | The raw, unprocessed response from the underlying LLM or framework. |
Example:
```python
from agenticassure import AgentResult, ToolCall, TokenUsage

result = AgentResult(
    output="The weather in London is 15 degrees Celsius and cloudy.",
    tool_calls=[
        ToolCall(name="get_weather", arguments={"location": "London"}),
    ],
    latency_ms=1250.5,
    token_usage=TokenUsage(prompt_tokens=45, completion_tokens=30),
)
```

### ScoreResult
The result of evaluating an agent’s response with a single scorer.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| scenario_id | str | required | The ID of the scenario that was scored. |
| scorer_name | str | required | The name of the scorer that produced this result (e.g., "passfail", "exact"). |
| score | float | required | Numeric score between 0.0 and 1.0 inclusive. Constrained by ge=0.0, le=1.0. |
| passed | bool | required | Whether the scenario passed according to this scorer. |
| explanation | str | "" | Human-readable explanation of the scoring decision. |
| details | dict[str, Any] \| None | None | Optional structured details (e.g., regex match groups, similarity scores). |
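For the similarity scorer, the pass decision follows from comparing the computed similarity against the threshold in the scenario's metadata. The sketch below assumes a fallback threshold of 0.7 when similarity_threshold is not set (0.7 matches the threshold shown in the example that follows, but the scorer's actual default may differ):

```python
def similarity_passed(cosine_similarity: float, metadata: dict) -> bool:
    # Assumed fallback of 0.7; the real scorer's default may differ.
    threshold = metadata.get("similarity_threshold", 0.7)
    return cosine_similarity >= threshold
```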
Example:
```python
from agenticassure import ScoreResult

score = ScoreResult(
    scenario_id="abc-123",
    scorer_name="similarity",
    score=0.85,
    passed=True,
    explanation="Cosine similarity: 0.850 (threshold: 0.7)",
    details={"cosine_similarity": 0.85, "threshold": 0.7},
)
```

### ScenarioRunResult
Complete results for a single scenario execution, including the agent’s response, all scoring results, and execution metadata.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| scenario | Scenario | required | The scenario that was executed. |
| agent_result | AgentResult | required | The agent’s response to the scenario input. |
| scores | list[ScoreResult] | [] | Results from each configured scorer. |
| passed | bool | False | True only if all scorers passed. |
| duration_ms | float | 0.0 | Wall-clock execution time in milliseconds. |
| error | str \| None | None | Error message if the scenario failed due to an exception. |
| retry_count | int | 0 | Number of retry attempts before arriving at this result. 0 means it succeeded on the first try. |
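The relationship between the suite's retries setting and the resulting retry_count and error fields can be sketched as a simple retry loop. The run_with_retries function below is illustrative only, not part of the package; the actual retry behavior is implemented by the Runner:

```python
def run_with_retries(run_once, retries: int):
    """Illustrative sketch: call run_once up to retries + 1 times.

    Returns (output, retry_count, error); output is None when all attempts fail.
    """
    error = None
    for attempt in range(retries + 1):
        try:
            # attempt 0 is the first try, so a first-try success
            # yields retry_count == 0, matching the field's semantics.
            return run_once(), attempt, None
        except Exception as exc:  # a real runner would likely be more selective
            error = str(exc)
    return None, retries, error
```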
Example:
```python
from agenticassure import ScenarioRunResult, Scenario, AgentResult, ScoreResult

result = ScenarioRunResult(
    scenario=Scenario(name="test", input="Hello"),
    agent_result=AgentResult(output="Hi there!"),
    scores=[
        ScoreResult(
            scenario_id="abc",
            scorer_name="passfail",
            score=1.0,
            passed=True,
            explanation="Agent produced output",
        ),
    ],
    passed=True,
    duration_ms=450.0,
)
```

### RunResult
Aggregated results for an entire test suite run. Contains all individual scenario results plus computed summary metrics.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| run_id | str | Auto-generated UUID | Unique identifier for this run. |
| timestamp | datetime | Current UTC time | When the run was initiated. |
| suite_name | str | required | Name of the suite that was executed. |
| scenario_results | list[ScenarioRunResult] | [] | Results for each scenario in the suite. |
| aggregate_score | float | 0.0 | Mean score across all scenarios (each scenario’s score is the mean of its scorer scores). |
| pass_rate | float | 0.0 | Fraction of scenarios that passed (0.0 to 1.0). |
| total_duration_ms | float | 0.0 | Total wall-clock time for the entire run in milliseconds. |
| model_info | dict[str, Any] \| None | None | Optional metadata about the model used (e.g., model name, temperature). |
Methods:
compute_aggregates() -> None

Recomputes aggregate_score, pass_rate, and total_duration_ms from the current scenario_results. Called automatically by the Runner after executing a suite. You may call it manually if you modify scenario_results after the run.

- aggregate_score is the mean of each scenario's average scorer score.
- pass_rate is the number of passed scenarios divided by the total number of scenarios.
- total_duration_ms is the sum of all scenario durations.
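The arithmetic described above can be written out in plain Python. This is a sketch of the documented formulas, not the package's actual implementation (which works directly on ScenarioRunResult objects):

```python
def aggregate(scenario_scorer_scores, scenario_passed, scenario_durations_ms):
    """Sketch of the documented aggregate math.

    scenario_scorer_scores: one list of scorer scores per scenario.
    scenario_passed: one bool per scenario.
    scenario_durations_ms: one duration per scenario.
    """
    # Each scenario's score is the mean of its scorer scores...
    per_scenario_means = [sum(s) / len(s) for s in scenario_scorer_scores]
    # ...and aggregate_score is the mean of those per-scenario scores.
    aggregate_score = sum(per_scenario_means) / len(per_scenario_means)
    # Fraction of scenarios that passed.
    pass_rate = sum(scenario_passed) / len(scenario_passed)
    # Total wall-clock time across scenarios.
    total_duration_ms = sum(scenario_durations_ms)
    return aggregate_score, pass_rate, total_duration_ms
```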
Example:
```python
from agenticassure import RunResult

result = RunResult(suite_name="my-suite")
# After populating scenario_results:
result.compute_aggregates()
print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Aggregate score: {result.aggregate_score:.2f}")
print(f"Total time: {result.total_duration_ms:.0f}ms")
```