Quickstart
This guide walks you through the complete workflow: installing AgenticAssure, creating a test project, writing an agent adapter, defining test scenarios, running them, and viewing results. By the end, you will have a working test suite for an AI agent.
Prerequisites
- Python 3.10 or higher
- pip (included with Python)
- A terminal / command line
Step 1: Install AgenticAssure
Create a virtual environment and install the package:
```shell
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
pip install agenticassure
```

Verify the installation:

```shell
agenticassure --version
```

Step 2: Scaffold a Project
Use the init command to create a project structure with example scenarios:
```shell
mkdir my-agent-tests
cd my-agent-tests
agenticassure init .
```

This creates:

```
my-agent-tests/
  scenarios/
    example_scenarios.yaml
```

You should see:

```
Initialized AgenticAssure project in /path/to/my-agent-tests
Created: scenarios/example_scenarios.yaml
Next steps:
  1. Edit scenarios in the 'scenarios/' directory
  2. Create an adapter for your agent
  3. Run: agenticassure run scenarios/
```

The generated example_scenarios.yaml contains two sample scenarios to get you started.
Step 3: Write an Agent Adapter
An adapter is a Python class that wraps your AI agent so AgenticAssure can call it. The adapter must implement a single method: `run(input, context) -> AgentResult`.
Create a file called my_agent.py in your project directory:
```python
# my_agent.py
from agenticassure.results import AgentResult, ToolCall


class MyAgent:
    """Adapter that wraps our AI agent for testing."""

    def run(self, input: str, context=None) -> AgentResult:
        """
        This is where you call your real agent.
        For this quickstart, we return a mock response.
        """
        # In a real adapter, you would call your LLM/agent here:
        #   response = my_real_agent.invoke(input)
        #   return AgentResult(output=response.text, ...)

        # Mock: echo the input back and simulate a tool call
        if "weather" in input.lower():
            return AgentResult(
                output=f"The weather looks great! You asked: {input}",
                tool_calls=[
                    ToolCall(
                        name="get_weather",
                        arguments={"location": "San Francisco"},
                        result="72F, sunny",
                    )
                ],
                latency_ms=120.0,
            )
        return AgentResult(
            output=f"Hello! You said: {input}",
            latency_ms=50.0,
        )
```

The key points:
- Your class does not need to inherit from anything. It just needs to have a `run` method with the correct signature.
- The `run` method receives the scenario's `input` string and an optional `context` dictionary.
- It must return an `AgentResult` object containing at minimum an `output` string.
- You can optionally include `tool_calls`, `latency_ms`, `token_usage`, and `reasoning_trace`.
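If you want to exercise the adapter logic before wiring up AgenticAssure itself, you can do so in plain Python. The sketch below uses hypothetical stand-in dataclasses, not the real `agenticassure.results` classes (which may carry more fields), purely to check that the `run` contract behaves as expected:

```python
# Hypothetical stand-ins for AgentResult / ToolCall, defined locally so the
# adapter logic can be exercised without installing anything.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict
    result: str = ""


@dataclass
class AgentResult:
    output: str
    tool_calls: list = field(default_factory=list)
    latency_ms: float = 0.0


class MyAgent:
    """Same adapter shape as above: a run(input, context) method."""

    def run(self, input: str, context=None) -> AgentResult:
        if "weather" in input.lower():
            return AgentResult(
                output=f"The weather looks great! You asked: {input}",
                tool_calls=[ToolCall("get_weather", {"location": "San Francisco"})],
                latency_ms=120.0,
            )
        return AgentResult(output=f"Hello! You said: {input}", latency_ms=50.0)


agent = MyAgent()
print(agent.run("What's the weather?").tool_calls[0].name)  # get_weather
```

Once the logic looks right, swap the stand-ins back for the real imports from `agenticassure.results`.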
Step 4: Write Test Scenarios
Replace the contents of scenarios/example_scenarios.yaml with scenarios tailored to your agent:
```yaml
suite:
  name: my-first-tests
  description: Quickstart test scenarios
  config:
    default_timeout: 30
    retries: 1
  scenarios:
    - name: basic_greeting
      input: "Hello, how are you?"
      expected_output: "hello"
      scorers:
        - passfail
      tags:
        - basic
    - name: weather_tool_usage
      input: "What is the weather in San Francisco?"
      expected_tools:
        - get_weather
      expected_tool_args:
        get_weather:
          location: "San Francisco"
      scorers:
        - passfail
      tags:
        - tools
        - weather
    - name: output_contains_keyword
      input: "Tell me about the weather in NYC"
      expected_output: "weather"
      scorers:
        - regex
      tags:
        - basic
```

Each scenario defines:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | A unique name for the scenario. |
| `input` | Yes | The prompt sent to your agent. |
| `expected_output` | No | The expected response (interpretation depends on the scorer). |
| `expected_tools` | No | List of tool names the agent should call. |
| `expected_tool_args` | No | Expected arguments for each tool call. |
| `scorers` | No | List of scorers to evaluate the response. Defaults to `["passfail"]`. |
| `tags` | No | Tags for filtering and organization. |
| `timeout_seconds` | No | Per-scenario timeout override. Defaults to 30. |
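The field rules above can be captured in a few lines of Python. This is an illustration of the documented schema and defaults, not AgenticAssure's actual validator:

```python
# Pre-flight check mirroring the scenario field rules: name and input are
# required; scorers and timeout_seconds fall back to the documented defaults.
REQUIRED = ("name", "input")


def check_scenario(scenario: dict) -> dict:
    """Validate required fields and apply the documented defaults."""
    for key in REQUIRED:
        if key not in scenario:
            raise ValueError(f"scenario missing required field: {key}")
    scenario.setdefault("scorers", ["passfail"])  # default scorer
    scenario.setdefault("timeout_seconds", 30)    # default timeout
    return scenario


checked = check_scenario({"name": "basic_greeting", "input": "Hello, how are you?"})
print(checked["scorers"], checked["timeout_seconds"])  # ['passfail'] 30
```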
Step 5: Run Your Tests
Run all scenarios in the scenarios/ directory, specifying your adapter:
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent
```

Make sure your current working directory contains `my_agent.py` so Python can import it. If your module is in a package, use the full dotted path (e.g., `src.agents.my_agent.MyAgent`).
You should see output like:

```
Loaded 3 scenario(s) from 1 suite(s)
Using adapter: my_agent.MyAgent

Suite: my-first-tests
+-------------------------+--------+-------+----------+
| Scenario                | Passed | Score | Duration |
+-------------------------+--------+-------+----------+
| basic_greeting          | PASS   | 1.00  | 50.2ms   |
| weather_tool_usage      | PASS   | 1.00  | 121.5ms  |
| output_contains_keyword | PASS   | 1.00  | 48.8ms   |
+-------------------------+--------+-------+----------+

Pass rate: 100.0% (3/3)
```

The process exits with code 0 if all scenarios pass, or 1 if any fail. This makes it suitable for CI/CD pipelines.
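The dotted adapter path passed to `--adapter` resolves the way dotted paths usually do in Python: everything before the last dot names the module, the last segment names the class. A sketch of that general mechanism (not AgenticAssure's actual loader), demonstrated with a stdlib class since `my_agent.py` only exists in your project:

```python
# Resolve a dotted path like "my_agent.MyAgent" to a class object.
import importlib


def load_class(dotted_path: str):
    """Split 'pkg.module.ClassName' into module + attribute and import it."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is why the working directory (or package root) must be importable: `importlib.import_module("my_agent")` has to find `my_agent.py` on `sys.path`.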
Dry Run
To validate and list your scenarios without actually running them (no adapter needed):
```shell
agenticassure run scenarios/ --dry-run
```

Filter by Tag
Run only scenarios with a specific tag:
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --tag tools
```

You can specify `--tag` multiple times to include scenarios matching any of the given tags.
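"Match any" means a scenario is selected if it shares at least one tag with the requested set. A sketch of that filtering rule in plain Python (an illustration of the semantics, not the CLI's code):

```python
# A scenario passes the filter if its tag set intersects the requested set.
def filter_by_tags(scenarios, requested):
    requested = set(requested)
    return [s for s in scenarios if requested & set(s.get("tags", []))]


scenarios = [
    {"name": "basic_greeting", "tags": ["basic"]},
    {"name": "weather_tool_usage", "tags": ["tools", "weather"]},
    {"name": "output_contains_keyword", "tags": ["basic"]},
]
selected = filter_by_tags(scenarios, ["tools"])
print([s["name"] for s in selected])  # ['weather_tool_usage']
```

Requesting `["basic", "weather"]` would select all three scenarios, since each matches at least one of the two tags.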
Validate Without Running
Check that your YAML files are structurally correct:
```shell
agenticassure validate scenarios/
```

```
OK scenarios/example_scenarios.yaml
All 1 file(s) valid
```

Step 6: View Results in Different Formats
AgenticAssure supports three output formats via the --output (or -o) flag.
CLI Output (Default)
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output cli
```

Prints a formatted table to your terminal using Rich. This is the default when `--output` is not specified.
JSON Output
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output json
```

Writes a structured JSON file named `results_<run_id>.json` containing the full test run data:

```json
{
  "run_id": "a1b2c3d4-...",
  "timestamp": "2026-03-10T12:00:00Z",
  "suite_name": "my-first-tests",
  "scenario_results": [
    {
      "scenario": {
        "name": "basic_greeting",
        "input": "Hello, how are you?",
        "expected_output": "hello"
      },
      "agent_result": {
        "output": "Hello! You said: Hello, how are you?",
        "tool_calls": [],
        "latency_ms": 50.2
      },
      "scores": [
        {
          "scorer_name": "passfail",
          "score": 1.0,
          "passed": true,
          "explanation": "..."
        }
      ],
      "passed": true,
      "duration_ms": 50.2
    }
  ],
  "pass_rate": 1.0,
  "aggregate_score": 1.0
}
```

JSON output is useful for programmatic consumption, dashboards, and CI artifact storage.
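Consuming the report needs nothing beyond the stdlib. The snippet below summarizes a run from an embedded sample; the field names follow the example above, so treat them as illustrative:

```python
# Summarize a results_<run_id>.json report: pass rate plus failed scenario names.
import json

sample = """{
  "suite_name": "my-first-tests",
  "pass_rate": 1.0,
  "scenario_results": [
    {"scenario": {"name": "basic_greeting"}, "passed": true, "duration_ms": 50.2}
  ]
}"""

run = json.loads(sample)  # in practice: json.load(open("results_<run_id>.json"))
failed = [r["scenario"]["name"] for r in run["scenario_results"] if not r["passed"]]
print(f"{run['suite_name']}: pass_rate={run['pass_rate']}, failed={failed}")
```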
HTML Output
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output html
```

Writes a self-contained HTML file named `report_<run_id>.html` that you can open in any browser. The report includes:
- Suite-level summary with pass rate and aggregate scores
- Per-scenario details with scorer results
- Color-coded pass/fail indicators
- Timing information
This format is ideal for sharing results with stakeholders or archiving test run reports.
Complete Project Structure
After following this quickstart, your project looks like this:
```
my-agent-tests/
  my_agent.py                # Your agent adapter
  scenarios/
    example_scenarios.yaml   # Test scenarios
  results_<run_id>.json      # Generated JSON report (if --output json)
  report_<run_id>.html       # Generated HTML report (if --output html)
```

Next Steps
- Project Setup — Config files, environment variables, and organizing larger test suites.
- Scorers — Deep dive into each scoring strategy.
- Adapters — Writing adapters for OpenAI, LangChain, and custom agents.
- CLI Reference — Full documentation of all CLI commands and options.