
Quickstart

This guide walks you through the complete workflow: installing AgenticAssure, creating a test project, writing an agent adapter, defining test scenarios, running them, and viewing results. By the end, you will have a working test suite for an AI agent.

Prerequisites

  • Python 3.10 or higher
  • pip (included with Python)
  • A terminal / command line

Step 1: Install AgenticAssure

Create a virtual environment and install the package:

python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows
pip install agenticassure

Verify the installation:

agenticassure --version

Step 2: Scaffold a Project

Use the init command to create a project structure with example scenarios:

mkdir my-agent-tests
cd my-agent-tests
agenticassure init .

This creates:

my-agent-tests/
  scenarios/
    example_scenarios.yaml

You should see:

Initialized AgenticAssure project in /path/to/my-agent-tests
Created: scenarios/example_scenarios.yaml

Next steps:
  1. Edit scenarios in the 'scenarios/' directory
  2. Create an adapter for your agent
  3. Run: agenticassure run scenarios/

The generated example_scenarios.yaml contains two sample scenarios to get you started.

Step 3: Write an Agent Adapter

An adapter is a Python class that wraps your AI agent so AgenticAssure can call it. The adapter must implement a single method: run(input, context) -> AgentResult.

Create a file called my_agent.py in your project directory:

# my_agent.py
from agenticassure.results import AgentResult, ToolCall


class MyAgent:
    """Adapter that wraps our AI agent for testing."""

    def run(self, input: str, context=None) -> AgentResult:
        """
        This is where you call your real agent.
        For this quickstart, we return a mock response.
        """
        # In a real adapter, you would call your LLM/agent here:
        #   response = my_real_agent.invoke(input)
        #   return AgentResult(output=response.text, ...)

        # Mock: echo the input back and simulate a tool call
        if "weather" in input.lower():
            return AgentResult(
                output=f"The weather looks great! You asked: {input}",
                tool_calls=[
                    ToolCall(
                        name="get_weather",
                        arguments={"location": "San Francisco"},
                        result="72F, sunny",
                    )
                ],
                latency_ms=120.0,
            )
        return AgentResult(
            output=f"Hello! You said: {input}",
            latency_ms=50.0,
        )

The key points:

  • Your class does not need to inherit from anything. It just needs to have a run method with the correct signature.
  • The run method receives the scenario’s input string and an optional context dictionary.
  • It must return an AgentResult object containing at minimum an output string.
  • You can optionally include tool_calls, latency_ms, token_usage, and reasoning_trace.
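Because the adapter is plain duck-typed Python, you can smoke-test the run contract in isolation before wiring anything up. The sketch below uses a simplified stand-in dataclass in place of the library's real AgentResult (an assumption for local testing only):

```python
from dataclasses import dataclass, field


# Simplified stand-in for agenticassure.results.AgentResult,
# used here only so the adapter can be exercised without the library.
@dataclass
class AgentResult:
    output: str
    tool_calls: list = field(default_factory=list)
    latency_ms: float = 0.0


class MyAgent:
    """Minimal adapter: no base class, just a run() with the right signature."""

    def run(self, input: str, context=None) -> AgentResult:
        return AgentResult(output=f"Hello! You said: {input}", latency_ms=50.0)


result = MyAgent().run("Hello, how are you?")
print(result.output)  # -> Hello! You said: Hello, how are you?
```

Swapping the stand-in for the real import is the only change needed once agenticassure is installed.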

Step 4: Write Test Scenarios

Replace the contents of scenarios/example_scenarios.yaml with scenarios tailored to your agent:

suite:
  name: my-first-tests
  description: Quickstart test scenarios
  config:
    default_timeout: 30
    retries: 1
  scenarios:
    - name: basic_greeting
      input: "Hello, how are you?"
      expected_output: "hello"
      scorers:
        - passfail
      tags:
        - basic

    - name: weather_tool_usage
      input: "What is the weather in San Francisco?"
      expected_tools:
        - get_weather
      expected_tool_args:
        get_weather:
          location: "San Francisco"
      scorers:
        - passfail
      tags:
        - tools
        - weather

    - name: output_contains_keyword
      input: "Tell me about the weather in NYC"
      expected_output: "weather"
      scorers:
        - regex
      tags:
        - basic

Each scenario defines:

Field               Required  Description
name                Yes       A unique name for the scenario.
input               Yes       The prompt sent to your agent.
expected_output     No        The expected response (interpretation depends on the scorer).
expected_tools      No        List of tool names the agent should call.
expected_tool_args  No        Expected arguments for each tool call.
scorers             No        List of scorers to evaluate the response. Defaults to ["passfail"].
tags                No        Tags for filtering and organization.
timeout_seconds     No        Per-scenario timeout override. Defaults to 30.

Step 5: Run Your Tests

Run all scenarios in the scenarios/ directory, specifying your adapter:

agenticassure run scenarios/ --adapter my_agent.MyAgent

Make sure your current working directory contains my_agent.py so Python can import it. If your module is in a package, use the full dotted path (e.g., src.agents.my_agent.MyAgent).
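Dotted adapter paths like my_agent.MyAgent are typically resolved with importlib. The helper below is a sketch of that pattern, not AgenticAssure's actual loader, but it shows why the module must be importable from your working directory:

```python
import importlib


def load_adapter(dotted_path: str):
    """Resolve a dotted path like 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)  # raises ImportError if not on sys.path
    return getattr(module, class_name)            # raises AttributeError if class is missing


# Works with any importable class, e.g. one from the standard library:
cls = load_adapter("collections.OrderedDict")
print(cls.__name__)  # -> OrderedDict
```

If `load_adapter("my_agent.MyAgent")` would fail in a plain Python shell from your project directory, the CLI cannot import it either.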

You should see output like:

Loaded 3 scenario(s) from 1 suite(s)
Using adapter: my_agent.MyAgent

Suite: my-first-tests
+-------------------------+--------+-------+----------+
| Scenario                | Passed | Score | Duration |
+-------------------------+--------+-------+----------+
| basic_greeting          | PASS   | 1.00  | 50.2ms   |
| weather_tool_usage      | PASS   | 1.00  | 121.5ms  |
| output_contains_keyword | PASS   | 1.00  | 48.8ms   |
+-------------------------+--------+-------+----------+

Pass rate: 100.0% (3/3)

The process exits with code 0 if all scenarios pass, or 1 if any fail. This makes it suitable for CI/CD pipelines.

Dry Run

To validate and list your scenarios without actually running them (no adapter needed):

agenticassure run scenarios/ --dry-run

Filter by Tag

Run only scenarios with a specific tag:

agenticassure run scenarios/ --adapter my_agent.MyAgent --tag tools

You can specify --tag multiple times to include scenarios matching any of the given tags.
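To make the any-match semantics concrete, here is a small illustration (not AgenticAssure's internal code) of keeping every scenario that carries at least one of the requested tags:

```python
def filter_by_tags(scenarios, tags):
    """Keep scenarios whose tag list intersects the requested tags (OR semantics)."""
    wanted = set(tags)
    return [s for s in scenarios if wanted & set(s["tags"])]


scenarios = [
    {"name": "basic_greeting", "tags": ["basic"]},
    {"name": "weather_tool_usage", "tags": ["tools", "weather"]},
    {"name": "output_contains_keyword", "tags": ["basic"]},
]

print([s["name"] for s in filter_by_tags(scenarios, ["tools", "weather"])])
# -> ['weather_tool_usage']
```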

Validate Without Running

Check that your YAML files are structurally correct:

agenticassure validate scenarios/

You should see:

OK scenarios/example_scenarios.yaml
All 1 file(s) valid

Step 6: View Results in Different Formats

AgenticAssure supports three output formats via the --output (or -o) flag.

CLI Output (Default)

agenticassure run scenarios/ --adapter my_agent.MyAgent --output cli

Prints a formatted table to your terminal using Rich. This is the default when --output is not specified.

JSON Output

agenticassure run scenarios/ --adapter my_agent.MyAgent --output json

Writes a structured JSON file named results_<run_id>.json containing the full test run data:

{
  "run_id": "a1b2c3d4-...",
  "timestamp": "2026-03-10T12:00:00Z",
  "suite_name": "my-first-tests",
  "scenario_results": [
    {
      "scenario": {
        "name": "basic_greeting",
        "input": "Hello, how are you?",
        "expected_output": "hello"
      },
      "agent_result": {
        "output": "Hello! You said: Hello, how are you?",
        "tool_calls": [],
        "latency_ms": 50.2
      },
      "scores": [
        {
          "scorer_name": "passfail",
          "score": 1.0,
          "passed": true,
          "explanation": "..."
        }
      ],
      "passed": true,
      "duration_ms": 50.2
    }
  ],
  "pass_rate": 1.0,
  "aggregate_score": 1.0
}

JSON output is useful for programmatic consumption, dashboards, and CI artifact storage.
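For instance, a CI step might load the report and list any failing scenarios. A minimal sketch, assuming only the field names shown in the sample above (the file path is hypothetical):

```python
import json
from pathlib import Path


def summarize(report_path: str):
    """Return (pass_rate, names of failed scenarios) from a results JSON file."""
    data = json.loads(Path(report_path).read_text())
    failed = [
        r["scenario"]["name"]
        for r in data["scenario_results"]
        if not r["passed"]
    ]
    return data["pass_rate"], failed


# Usage (path is hypothetical):
# rate, failed = summarize("results_a1b2c3d4.json")
# print(f"pass rate {rate:.0%}, failed: {failed}")
```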

HTML Output

agenticassure run scenarios/ --adapter my_agent.MyAgent --output html

Writes a self-contained HTML file named report_<run_id>.html that you can open in any browser. The report includes:

  • Suite-level summary with pass rate and aggregate scores
  • Per-scenario details with scorer results
  • Color-coded pass/fail indicators
  • Timing information

This format is ideal for sharing results with stakeholders or archiving test run reports.

Complete Project Structure

After following this quickstart, your project looks like this:

my-agent-tests/
  my_agent.py                  # Your agent adapter
  scenarios/
    example_scenarios.yaml     # Test scenarios
  results_<run_id>.json        # Generated JSON report (if --output json)
  report_<run_id>.html         # Generated HTML report (if --output html)

Next Steps

  • Project Setup — Config files, environment variables, and organizing larger test suites.
  • Scorers — Deep dive into each scoring strategy.
  • Adapters — Writing adapters for OpenAI, LangChain, and custom agents.
  • CLI Reference — Full documentation of all CLI commands and options.