Troubleshooting

This guide covers common errors, debugging strategies, YAML pitfalls, and frequently asked questions when working with AgenticAssure.


Common Errors and Solutions

“Unknown scorer ‘X’. Available: […]”

Full error:

KeyError: "Unknown scorer 'similarity'. Available: ['passfail', 'exact', 'regex']"

Cause: The scorer name referenced in your YAML or code is not registered. This happens when:

  1. The scorer name is misspelled in your YAML file.
  2. The scorer requires an optional dependency that is not installed.
  3. A custom scorer was not registered before running tests.

Solutions:

  • Misspelling: Check the scorer name. Built-in scorer names are: passfail, exact, regex, similarity.
  • Missing dependency (similarity scorer): The similarity scorer requires sentence-transformers. Install it:
    pip install agenticassure[similarity]
  • Custom scorer not registered: Make sure your custom scorer module is imported before the runner executes. Register it with register_scorer():
    from agenticassure.scorers.base import register_scorer

    register_scorer(MyCustomScorer())
  • Verify available scorers:
    from agenticassure.scorers.base import list_scorers

    print(list_scorers())

“Additional properties are not allowed (‘X’ was unexpected)”

Full error:

ValueError: Schema validation failed for scenarios/test.yaml: Schema: scenarios.0: Additional properties are not allowed ('scorer' was unexpected)

Cause: The YAML file contains a property name that is not in the JSON Schema. The schema uses additionalProperties: false at every level, so only known fields are accepted.

Common variations:

| Mistake | Correction |
| --- | --- |
| scorer: passfail | scorers: [passfail] (must be a list, plural) |
| timeout: 30 | timeout_seconds: 30 |
| expected: "hello" | expected_output: "hello" |
| tool_args: {...} | expected_tool_args: {...} |
| tools: [...] | expected_tools: [...] |
| Any custom field at root | Move it inside metadata: {...} |

Solution: Check the exact field names against the schema reference. If you need custom data, put it inside the metadata field which accepts arbitrary key-value pairs.
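For example, suite-specific fields such as a ticket reference belong under metadata rather than at the scenario root (the field names inside metadata here are illustrative — it accepts any keys):

```yaml
scenarios:
  - name: refund_flow
    input: "Process a refund for order 42"
    metadata:
      ticket: "OPS-1234"   # arbitrary custom data is allowed here
      owner: "qa-team"
    scorers:
      - passfail
```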


Schema Validation Errors

Full error:

ValueError: Schema validation failed for test.yaml:
  Schema: scenarios.0.timeout_seconds: 0 is not valid under any of the given schemas
  Schema: scenarios: [] should be non-empty

Common causes:

| Error message | Cause | Fix |
| --- | --- | --- |
| 'name' is a required property | Scenario missing name field | Add a name to every scenario |
| 'input' is a required property | Scenario missing input field | Add an input to every scenario |
| 'scenarios' is a required property | Missing top-level scenarios key | Add scenarios: at the root level |
| [] should be non-empty | Empty scenarios list | Add at least one scenario |
| X is not of type 'string' | Wrong type for a field | Check field types (e.g., input must be a string, not a number) |
| X is not of type 'array' | A list field has wrong type | Fields like scorers, tags, expected_tools must be YAML lists |
| 0 is not valid under... | timeout_seconds is zero or negative | Use a positive value |
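Putting the required pieces together, a minimal file that satisfies these rules looks like this (the name and input values are illustrative; timeout_seconds is optional but must be positive when present):

```yaml
scenarios:
  - name: minimal_example    # required
    input: "Say hello"       # required, must be a string
    timeout_seconds: 30      # optional, must be a positive value
```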

“Could not import module ‘X’”

Full error:

Error: Could not import module 'mymodule': No module named 'mymodule'
Make sure the module is installed or on your PYTHONPATH.

Cause: The adapter path provided via --adapter or config file points to a Python module that cannot be imported.

Solutions:

  1. Module not installed: Make sure the package containing your adapter is installed in the current Python environment:

    pip install -e .
  2. Wrong PYTHONPATH: If your adapter is in a local file, make sure the current working directory is on PYTHONPATH:

    # On Linux/macOS
    export PYTHONPATH="$PYTHONPATH:$(pwd)"

    # On Windows (PowerShell)
    $env:PYTHONPATH = "$env:PYTHONPATH;$(Get-Location)"
  3. Wrong dotted path: The adapter path must be in the format module.ClassName. For example, if your class MyAgent is in my_project/agent.py, the path is my_project.agent.MyAgent.

  4. Virtual environment not activated: Make sure you are running in the correct virtual environment where your dependencies are installed.


“does not implement the AgentAdapter protocol”

Full error:

Error: 'mymodule.MyAgent' does not implement the AgentAdapter protocol. It must have a run(input, context=None) -> AgentResult method.

Cause: Your adapter class exists and can be imported, but it does not have the correct run() method signature.

Requirements for the AgentAdapter protocol:

from agenticassure.results import AgentResult
from typing import Any

class MyAgent:
    def run(self, input: str, context: dict[str, Any] | None = None) -> AgentResult:
        ...

Common mistakes:

  • Method named something other than run (e.g., execute, invoke).
  • Missing the context parameter.
  • Returning a plain string instead of an AgentResult object.
  • Method is a @staticmethod or @classmethod instead of a regular method.

ImportError for sentence-transformers

Full error:

ImportError: sentence-transformers is required for SimilarityScorer. Install it with: pip install agenticassure[similarity]

Cause: The similarity scorer was referenced but the sentence-transformers package is not installed. This is an optional dependency to keep the base package lightweight.

Solution:

pip install agenticassure[similarity]

This installs sentence-transformers and its dependencies (including PyTorch). Note that this is a large dependency tree.

If you do not want to install sentence-transformers, remove similarity from your scenario scorers and use other scorers instead (e.g., exact, regex, or passfail).


“No ‘regex_pattern’ found in scenario metadata”

Full error (in ScoreResult explanation):

No 'regex_pattern' found in scenario metadata

Cause: A scenario uses the regex scorer but does not have a regex_pattern key in its metadata.

Solution: Add the pattern to your scenario’s metadata:

scenarios:
  - name: pattern_test
    input: "Generate a code"
    metadata:
      regex_pattern: "[A-Z]{3}-\\d{4}"
    scorers:
      - regex

Note the double backslash in YAML for regex escapes. See YAML Gotchas below.


Empty YAML File Errors

Full error:

ValueError: Empty YAML file: scenarios/empty.yaml

Cause: The YAML file is empty, contains only whitespace, or contains only comments.

Solution: Add at least a scenarios list with one scenario:

scenarios:
  - name: example
    input: "Hello"

YAML Parse Errors

Full error:

YAML parse error: while parsing a block mapping
  in "test.yaml", line 3, column 3
expected <block end>, but found '<scalar>'
  in "test.yaml", line 4, column 5

Cause: The YAML syntax is malformed. Common causes include:

  • Incorrect indentation (YAML uses spaces, not tabs).
  • Missing colons after keys.
  • Unquoted strings that contain special characters (:, #, {, }, [, ]).
  • Mixing indentation levels within the same block.

Solution: Validate your YAML with a linter or the built-in validate command:

agenticassure validate scenarios/test.yaml

See YAML Gotchas for common YAML pitfalls.


Timeout Errors

Symptoms: Scenarios take a long time and eventually fail, or the process hangs.

Note: The current version of AgenticAssure does not enforce timeouts at the runner level (the timeout_seconds field is available for future use and for adapters to read). Long-running scenarios will block until the adapter’s run() method returns or the underlying HTTP client times out.

Solutions:

  1. Set timeouts in your adapter or LLM client:

    import openai

    client = openai.OpenAI(timeout=30.0)
  2. Use shorter timeout values in your LLM provider configuration.

  3. Add retry logic via the retries setting to recover from transient timeouts:

    suite:
      name: tests
      config:
        retries: 2

Connection Errors to LLM APIs

Symptoms:

ConnectionError: Error communicating with the OpenAI API

or

openai.APIConnectionError: Connection error.

Solutions:

  1. Check your API key is set correctly:

    echo $OPENAI_API_KEY    # Linux/macOS
    echo %OPENAI_API_KEY%   # Windows CMD
  2. Check network connectivity — ensure you can reach the API endpoint.

  3. Check rate limits — if you are running many scenarios, you may hit rate limits. Add retries:

    agenticassure run scenarios/ --adapter mymodule.MyAgent --retry 2
  4. Use a proxy if you are behind a corporate firewall. Configure it via environment variables:

    export HTTPS_PROXY=http://proxy.example.com:8080

HF Hub Rate Limit Warnings

Symptoms:

huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests

or frequent warnings about rate limiting when using the similarity scorer.

Solution: Set the HF_TOKEN environment variable with a Hugging Face access token to get higher rate limits:

export HF_TOKEN=hf_your_token_here

You can create a token at https://huggingface.co/settings/tokens.


Debugging Tips

Use --dry-run to Validate Without Running

The --dry-run flag loads and validates your scenario files, then displays a summary table without executing any scenarios. This is useful for catching YAML errors and verifying tag filters.

agenticassure run scenarios/ --dry-run
agenticassure run scenarios/ --dry-run --tag smoke

If no adapter is configured, the run command automatically falls back to dry-run behavior.


Use the validate Command

The validate command checks YAML files for structural and semantic issues:

# Validate a single file
agenticassure validate scenarios/test.yaml

# Validate all YAML files in a directory
agenticassure validate scenarios/

Output shows OK for valid files and FAIL with specific issues for invalid ones.


Check Scorer Registration with list_scorers()

If you are unsure which scorers are available in your environment, check at runtime:

from agenticassure.scorers.base import list_scorers

print(list_scorers())

Expected output with all dependencies installed:

['passfail', 'exact', 'regex', 'similarity']

If similarity is missing, install sentence-transformers:

pip install agenticassure[similarity]

Test Your Adapter Independently

Before using your adapter with AgenticAssure, test it in isolation:

from agenticassure import AgentResult
from my_module import MyAgent

agent = MyAgent()
result = agent.run("Hello, how are you?")

# Verify it returns an AgentResult
assert isinstance(result, AgentResult), f"Expected AgentResult, got {type(result)}"
print(f"Output: {result.output}")
print(f"Tool calls: {result.tool_calls}")

Inspect Results Programmatically

For deeper debugging, use the Python API instead of the CLI:

from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

suite = load_scenarios("scenarios/test.yaml")
runner = Runner(adapter=my_adapter)
result = runner.run_suite(suite)

for sr in result.scenario_results:
    print(f"\n--- {sr.scenario.name} ---")
    print(f"Passed: {sr.passed}")
    print(f"Duration: {sr.duration_ms:.0f}ms")
    print(f"Agent output: {sr.agent_result.output[:200]}")
    if sr.error:
        print(f"Error: {sr.error}")
    for score in sr.scores:
        print(f"  Scorer '{score.scorer_name}': score={score.score}, passed={score.passed}")
        print(f"  Explanation: {score.explanation}")
        if score.details:
            print(f"  Details: {score.details}")

Use JSON Output for CI Integration

The JSON output format provides machine-readable results for CI pipelines:

agenticassure run scenarios/ --adapter mymodule.MyAgent --output json

This writes a results_<run_id>.json file that can be parsed by downstream tools.


YAML Gotchas

Backslash Escaping in Regex Patterns

YAML interprets backslashes in double-quoted strings. When writing regex patterns, you need to double-escape:

# WRONG -- YAML interprets \d as an escape sequence
metadata:
  regex_pattern: "\d{3}-\d{4}"

# CORRECT -- double backslash in double-quoted strings
metadata:
  regex_pattern: "\\d{3}-\\d{4}"

# ALSO CORRECT -- single-quoted strings do not process escapes
metadata:
  regex_pattern: '\d{3}-\d{4}'

# ALSO CORRECT -- unquoted (works for simple patterns)
metadata:
  regex_pattern: \d{3}-\d{4}

For complex regex patterns, single-quoted strings are recommended as they preserve backslashes literally.
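You can verify this behavior directly with PyYAML (assuming pyyaml is installed; the key name p is arbitrary):

```python
import yaml

# Single quotes preserve the backslash literally
print(yaml.safe_load(r"p: '\d{3}'"))    # {'p': '\\d{3}'}

# A doubled backslash inside double quotes also yields \d{3}
print(yaml.safe_load('p: "\\\\d{3}"'))  # {'p': '\\d{3}'}

# A single backslash inside double quotes is an invalid YAML escape
try:
    yaml.safe_load('p: "\\d{3}"')
except yaml.YAMLError:
    print("parse error: unknown escape character")
```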


String Quoting

YAML has nuanced rules about when strings need quoting:

# These are fine unquoted
input: Hello world
input: What is the weather?

# These NEED quoting (special characters)
input: "What is the status of order #123?"  # '#' starts a comment
input: "key: value"                         # ':' followed by space is a mapping
input: "Use [brackets] carefully"           # '[' starts a flow sequence
input: "{braces} too"                       # '{' starts a flow mapping
input: "yes"   # Without quotes, YAML reads this as boolean true
input: "3.14"  # Without quotes, YAML reads this as a float
input: "null"  # Without quotes, YAML reads this as null/None

When in doubt, use double quotes around your strings.
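A quick PyYAML check (assuming pyyaml is available) makes the implicit typing concrete:

```python
import yaml

print(yaml.safe_load("input: yes"))    # {'input': True}  -- parsed as a boolean
print(yaml.safe_load('input: "yes"'))  # {'input': 'yes'} -- quoting keeps the string
print(yaml.safe_load("input: 3.14"))   # {'input': 3.14}  -- parsed as a float
print(yaml.safe_load("input: null"))   # {'input': None}  -- parsed as null
```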


Indentation

YAML uses spaces for indentation (tabs are not allowed). Inconsistent indentation causes parse errors.

# CORRECT -- consistent 2-space indentation
scenarios:
  - name: test
    input: "Hello"
    scorers:
      - passfail

# WRONG -- tab indentation (invisible but breaks YAML)
scenarios:
	- name: test
	  input: "Hello"

# WRONG -- inconsistent indentation
scenarios:
  - name: test
   input: "Hello"   # 3 spaces instead of 4
    scorers:
      - passfail

scorers (List) vs scorer (Invalid)

A common mistake is using the singular form:

# WRONG -- "scorer" is not a recognized field
scenarios:
  - name: test
    input: "Hello"
    scorer: passfail

# CORRECT -- must be "scorers" (plural) with list syntax
scenarios:
  - name: test
    input: "Hello"
    scorers:
      - passfail

# ALSO CORRECT -- inline list syntax
scenarios:
  - name: test
    input: "Hello"
    scorers: [passfail, exact]

Using scorer will produce the error: Additional properties are not allowed ('scorer' was unexpected).


Multiline Strings

YAML supports multiline strings with | (literal block) and > (folded block):

scenarios:
  - name: long_prompt
    input: |
      You are a helpful assistant.
      The user wants to know about photosynthesis.
      Please explain it in simple terms.
    expected_output: "photosynthesis"

The | preserves newlines. The > folds newlines into spaces (useful for long paragraphs).
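The difference is easy to confirm with PyYAML (assuming pyyaml is installed):

```python
import yaml

literal = yaml.safe_load("text: |\n  line one\n  line two\n")
folded = yaml.safe_load("text: >\n  line one\n  line two\n")

print(repr(literal["text"]))  # 'line one\nline two\n' -- newlines preserved
print(repr(folded["text"]))   # 'line one line two\n'  -- newlines folded to spaces
```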


FAQ

Can I use multiple scorers on a single scenario?

Yes. List all desired scorer names in the scorers field. A scenario passes only if all scorers pass.

scenarios:
  - name: multi_scored
    input: "What is the capital of France?"
    expected_output: "Paris"
    metadata:
      regex_pattern: "Paris"
    scorers:
      - passfail
      - exact
      - regex

How do I test without an LLM? (Mock adapter)

Create a simple adapter that returns canned responses:

from agenticassure import AgentResult

class MockAgent:
    """Returns predefined responses for testing."""

    def __init__(self, responses: dict[str, str] | None = None):
        self.responses = responses or {}
        self.default_response = "This is a mock response."

    def run(self, input: str, context=None) -> AgentResult:
        output = self.responses.get(input, self.default_response)
        return AgentResult(output=output)

# Usage
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

agent = MockAgent(responses={
    "Hello": "Hi there! How can I help?",
    "What is 2+2?": "4",
})
runner = Runner(adapter=agent)
suite = load_scenarios("scenarios/test.yaml")
result = runner.run_suite(suite)

This is useful for testing your scenario definitions and scorer configurations without incurring LLM API costs.


How do I skip slow tests?

Use tags to categorize scenarios and filter them at runtime:

scenarios:
  - name: fast_test
    input: "Quick check"
    tags: [smoke]
    scorers: [passfail]

  - name: slow_integration
    input: "Complex multi-step task"
    tags: [integration, slow]
    scorers: [passfail, similarity]

Then run only the fast tests:

agenticassure run scenarios/ --adapter mymodule.MyAgent --tag smoke

Or run specific tag combinations programmatically:

result = runner.run_suite(suite, tags=["smoke"])

Can I run scenarios in parallel?

Not currently. The Runner executes scenarios sequentially. Parallel execution may be added in a future release.

If you need parallel execution now, you can split your suites into separate files and run them in parallel processes:

# Run multiple suites in parallel (bash)
agenticassure run scenarios/suite1.yaml --adapter mymodule.MyAgent &
agenticassure run scenarios/suite2.yaml --adapter mymodule.MyAgent &
wait

Or use Python’s concurrent.futures with the programmatic API:

from concurrent.futures import ThreadPoolExecutor

from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios_from_dir

suites = load_scenarios_from_dir("scenarios/")

def run_suite(suite):
    # MyAgent is your adapter class
    runner = Runner(adapter=MyAgent())
    return runner.run_suite(suite)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(run_suite, suites))

Note: Make sure your adapter is thread-safe if using this approach.
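If your adapter is not thread-safe, one option is to serialize calls through a lock. This wrapper is a hypothetical sketch (ThreadSafeAdapter is not part of the library) that still satisfies the run(input, context=None) protocol:

```python
import threading

class ThreadSafeAdapter:
    """Serializes calls to a wrapped adapter with a lock."""

    def __init__(self, inner):
        self.inner = inner
        self._lock = threading.Lock()

    def run(self, input, context=None):
        # Only one thread at a time reaches the wrapped adapter
        with self._lock:
            return self.inner.run(input, context=context)
```

Note that serializing every call removes most of the benefit of parallel scenarios; prefer a genuinely thread-safe adapter where possible.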


What Python versions are supported?

AgenticAssure requires Python 3.10 or later. It uses features introduced in Python 3.10 such as the X | Y union type syntax.


How do I pass extra context to my adapter?

Use the context parameter on run_suite or run_scenario:

result = runner.run_suite(suite, context={
    "user_id": "test-user",
    "session_id": "abc123",
    "temperature": 0.0,
})

Your adapter receives this context in its run() method:

from agenticassure import AgentResult

class MyAgent:
    def run(self, input: str, context=None) -> AgentResult:
        user_id = context.get("user_id") if context else None
        # Use context in your agent logic
        ...

How do I use the similarity scorer with a different model?

Programmatically, create a custom SimilarityScorer instance:

from agenticassure.scorers.similarity import SimilarityScorer
from agenticassure.scorers.base import register_scorer

# Override with a different model
custom_scorer = SimilarityScorer(
    model_name="all-mpnet-base-v2",
    threshold=0.8,
)
register_scorer(custom_scorer)  # Replaces the default "similarity" scorer

Per-scenario, you can override the threshold via metadata:

scenarios:
  - name: strict_similarity
    input: "Explain gravity"
    expected_output: "Gravity is a fundamental force..."
    metadata:
      similarity_threshold: 0.9
    scorers:
      - similarity

Why does my scenario fail even though the output looks correct?

Check which scorers are configured and what they are checking:

  1. passfail with expected_output does a case-insensitive substring match. The expected text must appear somewhere in the output.
  2. exact compares the entire output (normalized by default). Extra text causes a mismatch.
  3. regex requires a pattern in metadata. Without it, the scorer always fails.
  4. similarity computes semantic similarity. Low scores may indicate the model’s embedding does not consider the texts similar.

Use the programmatic API to inspect individual scorer results:

for score in scenario_result.scores:
    print(f"{score.scorer_name}: passed={score.passed}, explanation={score.explanation}")

Can I generate reports in multiple formats at once?

The CLI supports one output format per run (--output cli|json|html). To generate multiple formats, run the command multiple times or use the Python API:

from agenticassure.reports import CLIReporter, HTMLReporter, JSONReporter

# Generate all three reports from the same result
CLIReporter().report(result)
HTMLReporter().report(result, output_path="report.html")
JSONReporter().report(result, output_path="results.json")