Quickstart
This guide walks you through the complete workflow: installing AgenticAssure, creating a test project, writing an agent adapter, defining test scenarios, running them, and viewing results. By the end, you will have a working test suite for an AI agent.
Prerequisites
- Python 3.10 or higher
- pip (included with Python)
- A terminal / command line
Step 1: Install AgenticAssure
Create a virtual environment and install the package:
```shell
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
pip install agenticassure
```

Verify the installation:

```shell
agenticassure --version
```

Step 2: Scaffold a Project
Use the init command to create a project structure with example scenarios:
```shell
mkdir my-agent-tests
cd my-agent-tests
agenticassure init .
```

This creates:

```
my-agent-tests/
  scenarios/
    example_scenarios.yaml
```

You should see:

```
Initialized AgenticAssure project in /path/to/my-agent-tests
Created: scenarios/example_scenarios.yaml
Next steps:
  1. Edit scenarios in the 'scenarios/' directory
  2. Create an adapter for your agent
  3. Run: agenticassure run scenarios/
```

The generated example_scenarios.yaml contains two sample scenarios to get you started.
Step 3: Write an Agent Adapter
An adapter is a Python class that wraps your AI agent so AgenticAssure can call it. The adapter must implement a single method: `run(input, context) -> AgentResult`.
Create a file called my_agent.py in your project directory:
```python
# my_agent.py
from agenticassure.results import AgentResult, ToolCall


class MyAgent:
    """Adapter that wraps our AI agent for testing."""

    def run(self, input: str, context=None) -> AgentResult:
        """
        This is where you call your real agent.
        For this quickstart, we return a mock response.
        """
        # In a real adapter, you would call your LLM/agent here:
        #   response = my_real_agent.invoke(input)
        #   return AgentResult(output=response.text, ...)

        # Mock: echo the input back and simulate a tool call
        if "weather" in input.lower():
            return AgentResult(
                output=f"The weather looks great! You asked: {input}",
                tool_calls=[
                    ToolCall(
                        name="get_weather",
                        arguments={"location": "San Francisco"},
                        result="72F, sunny",
                    )
                ],
                latency_ms=120.0,
            )
        return AgentResult(
            output=f"Hello! You said: {input}",
            latency_ms=50.0,
        )
```

The key points:
- Your class does not need to inherit from anything. It just needs to have a `run` method with the correct signature.
- The `run` method receives the scenario's `input` string and an optional `context` dictionary.
- It must return an `AgentResult` object containing at minimum an `output` string.
- You can optionally include `tool_calls`, `latency_ms`, `token_usage`, and `reasoning_trace`.
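If you want to exercise the adapter logic before wiring up AgenticAssure itself, you can do so in plain Python. The sketch below uses hypothetical stand-in dataclasses, not the real `agenticassure.results` classes (which may carry more fields), purely to check that the `run` contract behaves as expected:

```python
# Hypothetical stand-ins for AgentResult / ToolCall, defined locally so the
# adapter logic can be exercised without installing anything.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict
    result: str = ""


@dataclass
class AgentResult:
    output: str
    tool_calls: list = field(default_factory=list)
    latency_ms: float = 0.0


class MyAgent:
    """Same adapter shape as above: a run(input, context) method."""

    def run(self, input: str, context=None) -> AgentResult:
        if "weather" in input.lower():
            return AgentResult(
                output=f"The weather looks great! You asked: {input}",
                tool_calls=[ToolCall("get_weather", {"location": "San Francisco"})],
                latency_ms=120.0,
            )
        return AgentResult(output=f"Hello! You said: {input}", latency_ms=50.0)


agent = MyAgent()
print(agent.run("What's the weather?").tool_calls[0].name)  # get_weather
```

Once the logic looks right, swap the stand-ins back for the real imports from `agenticassure.results`.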
Step 4: Write Test Scenarios
Replace the contents of scenarios/example_scenarios.yaml with scenarios tailored to your agent:
```yaml
suite:
  name: my-first-tests
  description: Quickstart test scenarios
  config:
    default_timeout: 30
    retries: 1
  scenarios:
    - name: basic_greeting
      input: "Hello, how are you?"
      expected_output: "hello"
      scorers:
        - passfail
      tags:
        - basic
    - name: weather_tool_usage
      input: "What is the weather in San Francisco?"
      expected_tools:
        - get_weather
      expected_tool_args:
        get_weather:
          location: "San Francisco"
      scorers:
        - passfail
      tags:
        - tools
        - weather
    - name: output_contains_keyword
      input: "Tell me about the weather in NYC"
      expected_output: "weather"
      scorers:
        - regex
      tags:
        - basic
```

Each scenario defines:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | A unique name for the scenario. |
| `input` | Yes | The prompt sent to your agent. |
| `expected_output` | No | The expected response (interpretation depends on the scorer). |
| `expected_tools` | No | List of tool names the agent should call. |
| `expected_tool_args` | No | Expected arguments for each tool call. |
| `scorers` | No | List of scorers to evaluate the response. Defaults to `["passfail"]`. |
| `tags` | No | Tags for filtering and organization. |
| `timeout_seconds` | No | Per-scenario timeout override. Defaults to 30. |
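The field rules above can be captured in a few lines of Python. This is an illustration of the documented schema and defaults, not AgenticAssure's actual validator:

```python
# Pre-flight check mirroring the scenario field rules: name and input are
# required; scorers and timeout_seconds fall back to the documented defaults.
REQUIRED = ("name", "input")


def check_scenario(scenario: dict) -> dict:
    """Validate required fields and apply the documented defaults."""
    for key in REQUIRED:
        if key not in scenario:
            raise ValueError(f"scenario missing required field: {key}")
    scenario.setdefault("scorers", ["passfail"])  # default scorer
    scenario.setdefault("timeout_seconds", 30)    # default timeout
    return scenario


checked = check_scenario({"name": "basic_greeting", "input": "Hello, how are you?"})
print(checked["scorers"], checked["timeout_seconds"])  # ['passfail'] 30
```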
Step 5: Run Your Tests
Run all scenarios in the scenarios/ directory, specifying your adapter:
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent
```

Make sure your current working directory contains `my_agent.py` so Python can import it. If your module is in a package, use the full dotted path (e.g., `src.agents.my_agent.MyAgent`).
You should see output like:

```
Loaded 3 scenario(s) from 1 suite(s)
Using adapter: my_agent.MyAgent

Suite: my-first-tests
+-------------------------+--------+-------+----------+
| Scenario                | Passed | Score | Duration |
+-------------------------+--------+-------+----------+
| basic_greeting          | PASS   | 1.00  | 50.2ms   |
| weather_tool_usage      | PASS   | 1.00  | 121.5ms  |
| output_contains_keyword | PASS   | 1.00  | 48.8ms   |
+-------------------------+--------+-------+----------+

Pass rate: 100.0% (3/3)
```

The process exits with code 0 if all scenarios pass, or 1 if any fail. This makes it suitable for CI/CD pipelines.
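The dotted adapter path passed to `--adapter` resolves the way dotted paths usually do in Python: everything before the last dot names the module, the last segment names the class. A sketch of that general mechanism (not AgenticAssure's actual loader), demonstrated with a stdlib class since `my_agent.py` only exists in your project:

```python
# Resolve a dotted path like "my_agent.MyAgent" to a class object.
import importlib


def load_class(dotted_path: str):
    """Split 'pkg.module.ClassName' into module + attribute and import it."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is why the working directory (or package root) must be importable: `importlib.import_module("my_agent")` has to find `my_agent.py` on `sys.path`.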
Dry Run
To validate and list your scenarios without actually running them (no adapter needed):
```shell
agenticassure run scenarios/ --dry-run
```

Filter by Tag
Run only scenarios with a specific tag:
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --tag tools
```

You can specify `--tag` multiple times to include scenarios matching any of the given tags.
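"Match any" means a scenario is selected if it shares at least one tag with the requested set. A sketch of that filtering rule in plain Python (an illustration of the semantics, not the CLI's code):

```python
# A scenario passes the filter if its tag set intersects the requested set.
def filter_by_tags(scenarios, requested):
    requested = set(requested)
    return [s for s in scenarios if requested & set(s.get("tags", []))]


scenarios = [
    {"name": "basic_greeting", "tags": ["basic"]},
    {"name": "weather_tool_usage", "tags": ["tools", "weather"]},
    {"name": "output_contains_keyword", "tags": ["basic"]},
]
selected = filter_by_tags(scenarios, ["tools"])
print([s["name"] for s in selected])  # ['weather_tool_usage']
```

Requesting `["basic", "weather"]` would select all three scenarios, since each matches at least one of the two tags.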
Validate Without Running
Check that your YAML files are structurally correct:
```shell
agenticassure validate scenarios/
```

```
OK scenarios/example_scenarios.yaml
All 1 file(s) valid
```

Step 6: View Results in Different Formats
AgenticAssure supports three output formats via the --output (or -o) flag.
CLI Output (Default)
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output cli
```

Prints a formatted table to your terminal using Rich. This is the default when `--output` is not specified.
JSON Output
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output json
```

Writes a structured JSON file named `results_<run_id>.json` containing the full test run data:

```json
{
  "run_id": "a1b2c3d4-...",
  "timestamp": "2026-03-10T12:00:00Z",
  "suite_name": "my-first-tests",
  "scenario_results": [
    {
      "scenario": {
        "name": "basic_greeting",
        "input": "Hello, how are you?",
        "expected_output": "hello"
      },
      "agent_result": {
        "output": "Hello! You said: Hello, how are you?",
        "tool_calls": [],
        "latency_ms": 50.2
      },
      "scores": [
        {
          "scorer_name": "passfail",
          "score": 1.0,
          "passed": true,
          "explanation": "..."
        }
      ],
      "passed": true,
      "duration_ms": 50.2
    }
  ],
  "pass_rate": 1.0,
  "aggregate_score": 1.0
}
```

JSON output is useful for programmatic consumption, dashboards, and CI artifact storage.
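Consuming the report needs nothing beyond the stdlib. The snippet below summarizes a run from an embedded sample; the field names follow the example above, so treat them as illustrative:

```python
# Summarize a results_<run_id>.json report: pass rate plus failed scenario names.
import json

sample = """{
  "suite_name": "my-first-tests",
  "pass_rate": 1.0,
  "scenario_results": [
    {"scenario": {"name": "basic_greeting"}, "passed": true, "duration_ms": 50.2}
  ]
}"""

run = json.loads(sample)  # in practice: json.load(open("results_<run_id>.json"))
failed = [r["scenario"]["name"] for r in run["scenario_results"] if not r["passed"]]
print(f"{run['suite_name']}: pass_rate={run['pass_rate']}, failed={failed}")
```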
HTML Output
```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output html
```

Writes a self-contained HTML file named `report_<run_id>.html` that you can open in any browser. The report includes:
- Suite-level summary with pass rate and aggregate scores
- Per-scenario details with scorer results
- Color-coded pass/fail indicators
- Timing information
This format is ideal for sharing results with stakeholders or archiving test run reports.
Complete Project Structure
After following this quickstart, your project looks like this:
```
my-agent-tests/
  my_agent.py                # Your agent adapter
  scenarios/
    example_scenarios.yaml   # Test scenarios
  results_<run_id>.json      # Generated JSON report (if --output json)
  report_<run_id>.html       # Generated HTML report (if --output html)
```

Next Steps
- Project Setup — Config files, environment variables, and organizing larger test suites.
- Scorers — Deep dive into each scoring strategy.
- Adapters — Writing adapters for OpenAI, LangChain, and custom agents.
- CLI Reference — Full documentation of all CLI commands and options.