
Models API Reference

This page documents all Pydantic data models in AgenticAssure. These models represent test scenarios, agent outputs, scoring results, and run aggregations.

All models are importable from the top-level package:

```python
from agenticassure import (
    Scenario,
    Suite,
    SuiteConfig,
    ToolCall,
    TokenUsage,
    AgentResult,
    ScoreResult,
    ScenarioRunResult,
    RunResult,
)
```

Model Relationship Diagram

```
Suite
|-- config: SuiteConfig
|-- scenarios: list[Scenario]

RunResult
|-- scenario_results: list[ScenarioRunResult]
    |-- scenario: Scenario
    |-- agent_result: AgentResult
    |   |-- tool_calls: list[ToolCall]
    |   |-- token_usage: TokenUsage (optional)
    |-- scores: list[ScoreResult]
```

Scenario Models

Scenario

A single test scenario for an AI agent. Defines the input prompt, expected outputs, scoring configuration, and metadata.

Module: agenticassure.scenario

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | Auto-generated UUID | Unique identifier for the scenario. Generated automatically if not provided. |
| `name` | `str` | *required* | Human-readable name for the scenario. |
| `description` | `str \| None` | `None` | Optional longer description of what the scenario tests. |
| `input` | `str` | *required* | The prompt or input text to send to the agent. |
| `expected_output` | `str \| None` | `None` | The expected output text. Used by scorers like `passfail`, `exact`, and `similarity`. |
| `expected_tools` | `list[str] \| None` | `None` | List of tool names the agent is expected to call. |
| `expected_tool_args` | `dict[str, Any] \| None` | `None` | Mapping of tool name to expected arguments. Used by the `passfail` scorer to verify tool call arguments. |
| `tags` | `list[str]` | `[]` | Tags for filtering scenarios during runs (e.g., `["smoke", "tools"]`). |
| `metadata` | `dict[str, Any]` | `{}` | Arbitrary key-value pairs for scorer configuration and custom data. Used by `regex` (for `regex_pattern`), `exact` (for `exact_normalize`), and `similarity` (for `similarity_threshold`). |
| `scorers` | `list[str]` | `["passfail"]` | List of scorer names to evaluate this scenario. Each name must correspond to a registered scorer. |
| `timeout_seconds` | `float` | `30.0` | Maximum time in seconds to wait for the agent to respond. |

Example:

```python
from agenticassure import Scenario

# Minimal scenario
scenario = Scenario(
    name="greeting",
    input="Hello, how are you?",
)

# Fully specified scenario
scenario = Scenario(
    name="weather_lookup",
    description="Test that the agent calls the weather tool correctly",
    input="What is the weather in London?",
    expected_output="weather",
    expected_tools=["get_weather"],
    expected_tool_args={"get_weather": {"location": "London"}},
    tags=["tools", "weather"],
    metadata={"regex_pattern": r"\d+ degrees"},
    scorers=["passfail", "regex"],
    timeout_seconds=60.0,
)
```

SuiteConfig

Configuration settings for a test suite. These values control runner behavior when executing the suite.

Module: agenticassure.scenario

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `default_timeout` | `float` | `30.0` | Default timeout in seconds for scenarios that do not specify their own. |
| `retries` | `int` | `0` | Number of retry attempts for each scenario on failure. `0` means no retries. |
| `default_scorers` | `list[str]` | `["passfail"]` | Default scorers applied to scenarios that do not specify their own. |
| `fail_fast` | `bool` | `False` | If `True`, stop executing remaining scenarios after the first failure. |

Example:

```python
from agenticassure import SuiteConfig

config = SuiteConfig(
    default_timeout=60.0,
    retries=2,
    default_scorers=["passfail", "exact"],
    fail_fast=True,
)
```

Suite

A collection of test scenarios with shared configuration.

Module: agenticassure.scenario

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | *required* | Name of the test suite. |
| `description` | `str \| None` | `None` | Optional description of the suite's purpose. |
| `scenarios` | `list[Scenario]` | `[]` | The test scenarios in this suite. |
| `tags` | `list[str]` | `[]` | Suite-level tags. |
| `config` | `SuiteConfig` | `SuiteConfig()` | Configuration settings for this suite. |

Example:

```python
from agenticassure import Suite, Scenario, SuiteConfig

suite = Suite(
    name="agent-smoke-tests",
    description="Quick smoke tests for the customer support agent",
    scenarios=[
        Scenario(name="greet", input="Hello"),
        Scenario(name="farewell", input="Goodbye"),
    ],
    tags=["smoke"],
    config=SuiteConfig(retries=1, fail_fast=True),
)
```
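Scenario and suite `tags` exist so a run can execute only a subset of scenarios. As a rough illustration of the idea, here is a plain-Python sketch of tag filtering using a stand-in dataclass (`FakeScenario` is hypothetical, not part of AgenticAssure, and the runner's real filtering semantics may differ):

```python
from dataclasses import dataclass, field

@dataclass
class FakeScenario:
    """Minimal stand-in for Scenario, used only to illustrate tag filtering."""
    name: str
    tags: list = field(default_factory=list)

scenarios = [
    FakeScenario("greet", tags=["smoke"]),
    FakeScenario("weather", tags=["tools", "weather"]),
    FakeScenario("farewell", tags=["smoke"]),
]

# Keep scenarios whose tags overlap the requested set.
wanted = {"smoke"}
selected = [s for s in scenarios if wanted & set(s.tags)]
print([s.name for s in selected])  # ['greet', 'farewell']
```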

Result Models

ToolCall

Represents a single tool call made by the agent during execution.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | *required* | The name of the tool that was called. |
| `arguments` | `dict[str, Any]` | `{}` | The arguments passed to the tool. |
| `result` | `Any \| None` | `None` | The value returned by the tool, if available. |

Example:

```python
from agenticassure import ToolCall

tool_call = ToolCall(
    name="get_weather",
    arguments={"location": "San Francisco", "unit": "fahrenheit"},
    result="72 degrees and sunny",
)
```

TokenUsage

Token usage statistics for an agent invocation.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `prompt_tokens` | `int` | `0` | Number of tokens in the prompt (input). |
| `completion_tokens` | `int` | `0` | Number of tokens in the completion (output). |

Properties:

| Property | Type | Description |
| --- | --- | --- |
| `total_tokens` | `int` | Sum of `prompt_tokens` and `completion_tokens`. |

Example:

```python
from agenticassure import TokenUsage

usage = TokenUsage(prompt_tokens=150, completion_tokens=80)
print(usage.total_tokens)  # 230
```

AgentResult

The structured result returned by an agent adapter after processing a scenario.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `output` | `str` | *required* | The agent's text output/response. |
| `tool_calls` | `list[ToolCall]` | `[]` | Tool calls the agent made during execution. |
| `reasoning_trace` | `list[str] \| None` | `None` | Optional step-by-step reasoning trace (e.g., from LangChain intermediate steps). |
| `latency_ms` | `float` | `0.0` | Time taken for the agent to respond, in milliseconds. |
| `token_usage` | `TokenUsage \| None` | `None` | Token usage statistics, if reported by the LLM provider. |
| `raw_response` | `Any \| None` | `None` | The raw, unprocessed response from the underlying LLM or framework. |

Example:

```python
from agenticassure import AgentResult, ToolCall, TokenUsage

result = AgentResult(
    output="The weather in London is 15 degrees Celsius and cloudy.",
    tool_calls=[
        ToolCall(name="get_weather", arguments={"location": "London"}),
    ],
    latency_ms=1250.5,
    token_usage=TokenUsage(prompt_tokens=45, completion_tokens=30),
)
```
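An adapter is expected to populate `latency_ms` itself. A minimal sketch of one way to do that with the standard library (`fake_agent_call` is a hypothetical placeholder, not an AgenticAssure API):

```python
import time

def fake_agent_call(prompt: str) -> str:
    # Placeholder for a real LLM/agent invocation.
    return "Hi there!"

start = time.perf_counter()
output = fake_agent_call("Hello")
latency_ms = (time.perf_counter() - start) * 1000.0  # seconds -> milliseconds
```

`time.perf_counter()` is preferred over `time.time()` here because it is monotonic and high-resolution, so short latencies are measured reliably.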

ScoreResult

The result of evaluating an agent’s response with a single scorer.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `scenario_id` | `str` | *required* | The ID of the scenario that was scored. |
| `scorer_name` | `str` | *required* | The name of the scorer that produced this result (e.g., `"passfail"`, `"exact"`). |
| `score` | `float` | *required* | Numeric score between 0.0 and 1.0 inclusive. Constrained by `ge=0.0, le=1.0`. |
| `passed` | `bool` | *required* | Whether the scenario passed according to this scorer. |
| `explanation` | `str` | `""` | Human-readable explanation of the scoring decision. |
| `details` | `dict[str, Any] \| None` | `None` | Optional structured details (e.g., regex match groups, similarity scores). |

Example:

```python
from agenticassure import ScoreResult

score = ScoreResult(
    scenario_id="abc-123",
    scorer_name="similarity",
    score=0.85,
    passed=True,
    explanation="Cosine similarity: 0.850 (threshold: 0.7)",
    details={"cosine_similarity": 0.85, "threshold": 0.7},
)
```
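Because `score` is constrained with `ge=0.0, le=1.0`, out-of-range values are rejected at model construction time. A plain-Python sketch of the equivalent check, for illustration only (`validate_score` is a hypothetical helper, not a library function; the real model raises Pydantic's validation error rather than a bare `ValueError`):

```python
def validate_score(score: float) -> float:
    # Mirrors ScoreResult's ge=0.0, le=1.0 bound on `score`.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    return score
```

For example, `validate_score(0.85)` returns the value unchanged, while `validate_score(1.5)` raises.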

ScenarioRunResult

Complete results for a single scenario execution, including the agent’s response, all scoring results, and execution metadata.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `scenario` | `Scenario` | *required* | The scenario that was executed. |
| `agent_result` | `AgentResult` | *required* | The agent's response to the scenario input. |
| `scores` | `list[ScoreResult]` | `[]` | Results from each configured scorer. |
| `passed` | `bool` | `False` | `True` only if all scorers passed. |
| `duration_ms` | `float` | `0.0` | Wall-clock execution time in milliseconds. |
| `error` | `str \| None` | `None` | Error message if the scenario failed due to an exception. |
| `retry_count` | `int` | `0` | Number of retry attempts before arriving at this result. `0` means it succeeded on the first try. |

Example:

```python
from agenticassure import ScenarioRunResult, Scenario, AgentResult, ScoreResult

result = ScenarioRunResult(
    scenario=Scenario(name="test", input="Hello"),
    agent_result=AgentResult(output="Hi there!"),
    scores=[
        ScoreResult(
            scenario_id="abc",
            scorer_name="passfail",
            score=1.0,
            passed=True,
            explanation="Agent produced output",
        ),
    ],
    passed=True,
    duration_ms=450.0,
)
```

RunResult

Aggregated results for an entire test suite run. Contains all individual scenario results plus computed summary metrics.

Module: agenticassure.results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `run_id` | `str` | Auto-generated UUID | Unique identifier for this run. |
| `timestamp` | `datetime` | Current UTC time | When the run was initiated. |
| `suite_name` | `str` | *required* | Name of the suite that was executed. |
| `scenario_results` | `list[ScenarioRunResult]` | `[]` | Results for each scenario in the suite. |
| `aggregate_score` | `float` | `0.0` | Mean score across all scenarios (each scenario's score is the mean of its scorer scores). |
| `pass_rate` | `float` | `0.0` | Fraction of scenarios that passed (0.0 to 1.0). |
| `total_duration_ms` | `float` | `0.0` | Total wall-clock time for the entire run in milliseconds. |
| `model_info` | `dict[str, Any] \| None` | `None` | Optional metadata about the model used (e.g., model name, temperature). |

Methods:

`compute_aggregates() -> None`

Recomputes aggregate_score, pass_rate, and total_duration_ms from the current scenario_results. Called automatically by the Runner after executing a suite. You may call it manually if you modify scenario_results after the run.

- `aggregate_score` is the mean of each scenario's average scorer score.
- `pass_rate` is the number of passed scenarios divided by total scenarios.
- `total_duration_ms` is the sum of all scenario durations.
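These three formulas can be sketched in plain Python. The sample data below is made up for illustration; this is not the library's internal implementation:

```python
# One inner list of scorer scores per scenario, plus pass flags and durations.
scenario_scores = [[1.0, 0.8], [0.5], [1.0]]
passed_flags = [True, False, True]
durations_ms = [450.0, 1200.0, 300.0]

# aggregate_score: mean of each scenario's average scorer score.
aggregate_score = sum(sum(s) / len(s) for s in scenario_scores) / len(scenario_scores)

# pass_rate: passed scenarios divided by total scenarios.
pass_rate = sum(passed_flags) / len(passed_flags)

# total_duration_ms: sum of all scenario durations.
total_duration_ms = sum(durations_ms)
```

With this data, `aggregate_score` is (0.9 + 0.5 + 1.0) / 3 = 0.8, `pass_rate` is 2/3, and `total_duration_ms` is 1950.0.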

Example:

```python
from agenticassure import RunResult

result = RunResult(suite_name="my-suite")

# After populating scenario_results:
result.compute_aggregates()
print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Aggregate score: {result.aggregate_score:.2f}")
print(f"Total time: {result.total_duration_ms:.0f}ms")
```