# Models API Reference
This page documents all Pydantic data models in AgenticAssure. These models represent test scenarios, agent outputs, scoring results, and run aggregations.
All models are importable from the top-level package:
```python
from agenticassure import (
    Scenario, Suite, SuiteConfig,
    ToolCall, TokenUsage, AgentResult,
    ScoreResult, ScenarioRunResult, RunResult,
)
```

## Model Relationship Diagram
```
Suite
|-- config: SuiteConfig
|-- scenarios: list[Scenario]

RunResult
|-- scenario_results: list[ScenarioRunResult]
|   |-- scenario: Scenario
|   |-- agent_result: AgentResult
|   |   |-- tool_calls: list[ToolCall]
|   |   |-- token_usage: TokenUsage (optional)
|   |-- scores: list[ScoreResult]
```

## Scenario Models
### Scenario
A single test scenario for an AI agent. Defines the input prompt, expected outputs, scoring configuration, and metadata.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| id | str | Auto-generated UUID | Unique identifier for the scenario. Generated automatically if not provided. |
| name | str | required | Human-readable name for the scenario. |
| description | str \| None | None | Optional longer description of what the scenario tests. |
| input | str | required | The prompt or input text to send to the agent. |
| expected_output | str \| None | None | The expected output text. Used by scorers like passfail, exact, and similarity. |
| expected_tools | list[str] \| None | None | List of tool names the agent is expected to call. |
| expected_tool_args | dict[str, Any] \| None | None | Mapping of tool name to expected arguments. Used by the passfail scorer to verify tool call arguments. |
| tags | list[str] | [] | Tags for filtering scenarios during runs (e.g., ["smoke", "tools"]). |
| metadata | dict[str, Any] | {} | Arbitrary key-value pairs for scorer configuration and custom data. Used by regex (for regex_pattern), exact (for exact_normalize), and similarity (for similarity_threshold). |
| scorers | list[str] | ["passfail"] | List of scorer names to evaluate this scenario. Each name must correspond to a registered scorer. |
| timeout_seconds | float | 30.0 | Maximum time in seconds to wait for the agent to respond. |
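To make the expected_tool_args check concrete, the helper below sketches one plausible way to verify that each expected tool was called with (at least) the expected arguments. Note that tool_args_match is a hypothetical illustration, not part of the package; the real check lives inside the passfail scorer and may handle extra or repeated calls differently.

```python
from typing import Any


def tool_args_match(
    expected_tool_args: dict[str, dict[str, Any]],
    tool_calls: list[dict[str, Any]],
) -> bool:
    """Hypothetical sketch: every expected tool must have been called with
    (at least) the expected arguments. Extra arguments are ignored here."""
    # Index calls by tool name (a repeated call overwrites the earlier one
    # in this simplified sketch).
    called = {call["name"]: call["arguments"] for call in tool_calls}
    for tool, expected in expected_tool_args.items():
        if tool not in called:
            return False  # expected tool was never called
        if any(called[tool].get(key) != value for key, value in expected.items()):
            return False  # an expected argument is missing or differs
    return True
```

Whether the actual scorer is this lenient about extra arguments is an implementation detail of the passfail scorer.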
Example:
```python
from agenticassure import Scenario

# Minimal scenario
scenario = Scenario(
    name="greeting",
    input="Hello, how are you?",
)

# Fully specified scenario
scenario = Scenario(
    name="weather_lookup",
    description="Test that the agent calls the weather tool correctly",
    input="What is the weather in London?",
    expected_output="weather",
    expected_tools=["get_weather"],
    expected_tool_args={"get_weather": {"location": "London"}},
    tags=["tools", "weather"],
    metadata={"regex_pattern": r"\d+ degrees"},
    scorers=["passfail", "regex"],
    timeout_seconds=60.0,
)
```

### SuiteConfig
Configuration settings for a test suite. These values control runner behavior when executing the suite.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| default_timeout | float | 30.0 | Default timeout in seconds for scenarios that do not specify their own. |
| retries | int | 0 | Number of retry attempts for each scenario on failure. 0 means no retries. |
| default_scorers | list[str] | ["passfail"] | Default scorers applied to scenarios that do not specify their own. |
| fail_fast | bool | False | If True, stop executing remaining scenarios after the first failure. |
Example:
```python
from agenticassure import SuiteConfig

config = SuiteConfig(
    default_timeout=60.0,
    retries=2,
    default_scorers=["passfail", "exact"],
    fail_fast=True,
)
```

### Suite
A collection of test scenarios with shared configuration.
Module: agenticassure.scenario
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | Name of the test suite. |
| description | str \| None | None | Optional description of the suite’s purpose. |
| scenarios | list[Scenario] | [] | The test scenarios in this suite. |
| tags | list[str] | [] | Suite-level tags. |
| config | SuiteConfig | SuiteConfig() | Configuration settings for this suite. |
Example:
```python
from agenticassure import Suite, Scenario, SuiteConfig

suite = Suite(
    name="agent-smoke-tests",
    description="Quick smoke tests for the customer support agent",
    scenarios=[
        Scenario(name="greet", input="Hello"),
        Scenario(name="farewell", input="Goodbye"),
    ],
    tags=["smoke"],
    config=SuiteConfig(retries=1, fail_fast=True),
)
```

## Result Models
### ToolCall
Represents a single tool call made by the agent during execution.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | The name of the tool that was called. |
| arguments | dict[str, Any] | {} | The arguments passed to the tool. |
| result | Any \| None | None | The value returned by the tool, if available. |
Example:
```python
from agenticassure import ToolCall

tool_call = ToolCall(
    name="get_weather",
    arguments={"location": "San Francisco", "unit": "fahrenheit"},
    result="72 degrees and sunny",
)
```

### TokenUsage
Token usage statistics for an agent invocation.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| prompt_tokens | int | 0 | Number of tokens in the prompt (input). |
| completion_tokens | int | 0 | Number of tokens in the completion (output). |
Properties:
| Property | Type | Description |
|---|---|---|
| total_tokens | int | Sum of prompt_tokens and completion_tokens. |
Example:
```python
from agenticassure import TokenUsage

usage = TokenUsage(prompt_tokens=150, completion_tokens=80)
print(usage.total_tokens)  # 230
```

### AgentResult
The structured result returned by an agent adapter after processing a scenario.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| output | str | required | The agent’s text output/response. |
| tool_calls | list[ToolCall] | [] | Tool calls the agent made during execution. |
| reasoning_trace | list[str] \| None | None | Optional step-by-step reasoning trace (e.g., from LangChain intermediate steps). |
| latency_ms | float | 0.0 | Time taken for the agent to respond, in milliseconds. |
| token_usage | TokenUsage \| None | None | Token usage statistics, if reported by the LLM provider. |
| raw_response | Any \| None | None | The raw, unprocessed response from the underlying LLM or framework. |
Example:
```python
from agenticassure import AgentResult, ToolCall, TokenUsage

result = AgentResult(
    output="The weather in London is 15 degrees Celsius and cloudy.",
    tool_calls=[
        ToolCall(name="get_weather", arguments={"location": "London"}),
    ],
    latency_ms=1250.5,
    token_usage=TokenUsage(prompt_tokens=45, completion_tokens=30),
)
```

### ScoreResult
The result of evaluating an agent’s response with a single scorer.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| scenario_id | str | required | The ID of the scenario that was scored. |
| scorer_name | str | required | The name of the scorer that produced this result (e.g., "passfail", "exact"). |
| score | float | required | Numeric score between 0.0 and 1.0 inclusive. Constrained by ge=0.0, le=1.0. |
| passed | bool | required | Whether the scenario passed according to this scorer. |
| explanation | str | "" | Human-readable explanation of the scoring decision. |
| details | dict[str, Any] \| None | None | Optional structured details (e.g., regex match groups, similarity scores). |
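For the similarity scorer, the pass decision follows from comparing the computed similarity against the threshold in the scenario's metadata. The sketch below assumes a fallback threshold of 0.7 when similarity_threshold is not set (0.7 matches the threshold shown in the example that follows, but the scorer's actual default may differ):

```python
def similarity_passed(cosine_similarity: float, metadata: dict) -> bool:
    # Assumed fallback of 0.7; the real scorer's default may differ.
    threshold = metadata.get("similarity_threshold", 0.7)
    return cosine_similarity >= threshold
```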
Example:
```python
from agenticassure import ScoreResult

score = ScoreResult(
    scenario_id="abc-123",
    scorer_name="similarity",
    score=0.85,
    passed=True,
    explanation="Cosine similarity: 0.850 (threshold: 0.7)",
    details={"cosine_similarity": 0.85, "threshold": 0.7},
)
```

### ScenarioRunResult
Complete results for a single scenario execution, including the agent’s response, all scoring results, and execution metadata.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| scenario | Scenario | required | The scenario that was executed. |
| agent_result | AgentResult | required | The agent’s response to the scenario input. |
| scores | list[ScoreResult] | [] | Results from each configured scorer. |
| passed | bool | False | True only if all scorers passed. |
| duration_ms | float | 0.0 | Wall-clock execution time in milliseconds. |
| error | str \| None | None | Error message if the scenario failed due to an exception. |
| retry_count | int | 0 | Number of retry attempts before arriving at this result. 0 means it succeeded on the first try. |
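The relationship between the suite's retries setting and the resulting retry_count and error fields can be sketched as a simple retry loop. The run_with_retries function below is illustrative only, not part of the package; the actual retry behavior is implemented by the Runner:

```python
def run_with_retries(run_once, retries: int):
    """Illustrative sketch: call run_once up to retries + 1 times.

    Returns (output, retry_count, error); output is None when all attempts fail.
    """
    error = None
    for attempt in range(retries + 1):
        try:
            # attempt 0 is the first try, so a first-try success
            # yields retry_count == 0, matching the field's semantics.
            return run_once(), attempt, None
        except Exception as exc:  # a real runner would likely be more selective
            error = str(exc)
    return None, retries, error
```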
Example:
```python
from agenticassure import ScenarioRunResult, Scenario, AgentResult, ScoreResult

result = ScenarioRunResult(
    scenario=Scenario(name="test", input="Hello"),
    agent_result=AgentResult(output="Hi there!"),
    scores=[
        ScoreResult(
            scenario_id="abc",
            scorer_name="passfail",
            score=1.0,
            passed=True,
            explanation="Agent produced output",
        ),
    ],
    passed=True,
    duration_ms=450.0,
)
```

### RunResult
Aggregated results for an entire test suite run. Contains all individual scenario results plus computed summary metrics.
Module: agenticassure.results
| Field | Type | Default | Description |
|---|---|---|---|
| run_id | str | Auto-generated UUID | Unique identifier for this run. |
| timestamp | datetime | Current UTC time | When the run was initiated. |
| suite_name | str | required | Name of the suite that was executed. |
| scenario_results | list[ScenarioRunResult] | [] | Results for each scenario in the suite. |
| aggregate_score | float | 0.0 | Mean score across all scenarios (each scenario’s score is the mean of its scorer scores). |
| pass_rate | float | 0.0 | Fraction of scenarios that passed (0.0 to 1.0). |
| total_duration_ms | float | 0.0 | Total wall-clock time for the entire run in milliseconds. |
| model_info | dict[str, Any] \| None | None | Optional metadata about the model used (e.g., model name, temperature). |
Methods:
compute_aggregates() -> None

Recomputes aggregate_score, pass_rate, and total_duration_ms from the current scenario_results. Called automatically by the Runner after executing a suite. You may call it manually if you modify scenario_results after the run.

- aggregate_score is the mean of each scenario's average scorer score.
- pass_rate is the number of passed scenarios divided by the total number of scenarios.
- total_duration_ms is the sum of all scenario durations.
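The arithmetic described above can be written out in plain Python. This is a sketch of the documented formulas, not the package's actual implementation (which works directly on ScenarioRunResult objects):

```python
def aggregate(scenario_scorer_scores, scenario_passed, scenario_durations_ms):
    """Sketch of the documented aggregate math.

    scenario_scorer_scores: one list of scorer scores per scenario.
    scenario_passed: one bool per scenario.
    scenario_durations_ms: one duration per scenario.
    """
    # Each scenario's score is the mean of its scorer scores...
    per_scenario_means = [sum(s) / len(s) for s in scenario_scorer_scores]
    # ...and aggregate_score is the mean of those per-scenario scores.
    aggregate_score = sum(per_scenario_means) / len(per_scenario_means)
    # Fraction of scenarios that passed.
    pass_rate = sum(scenario_passed) / len(scenario_passed)
    # Total wall-clock time across scenarios.
    total_duration_ms = sum(scenario_durations_ms)
    return aggregate_score, pass_rate, total_duration_ms
```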
Example:
```python
from agenticassure import RunResult

result = RunResult(suite_name="my-suite")
# After populating scenario_results:
result.compute_aggregates()
print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Aggregate score: {result.aggregate_score:.2f}")
print(f"Total time: {result.total_duration_ms:.0f}ms")
```