Troubleshooting

This guide covers common errors, debugging strategies, YAML pitfalls, and frequently asked questions when working with AgenticAssure.


Common Errors and Solutions

“Unknown scorer ‘X’. Available: […]”

Full error:

KeyError: "Unknown scorer 'similarity'. Available: ['passfail', 'exact', 'regex']"

Cause: The scorer name referenced in your YAML or code is not registered. This happens when:

  1. The scorer name is misspelled in your YAML file.
  2. The scorer requires an optional dependency that is not installed.
  3. A custom scorer was not registered before running tests.

Solutions:

  • Misspelling: Check the scorer name. Built-in scorer names are: passfail, exact, regex, similarity.
  • Missing dependency (similarity scorer): The similarity scorer requires sentence-transformers. Install it:
    pip install agenticassure[similarity]
  • Custom scorer not registered: Make sure your custom scorer module is imported before the runner executes. Register it with register_scorer():
    from agenticassure.scorers.base import register_scorer

    register_scorer(MyCustomScorer())
  • Verify available scorers:
    from agenticassure.scorers.base import list_scorers

    print(list_scorers())

“Additional properties are not allowed (‘X’ was unexpected)”

Full error:

ValueError: Schema validation failed for scenarios/test.yaml: Schema: scenarios.0: Additional properties are not allowed ('scorer' was unexpected)

Cause: The YAML file contains a property name that is not in the JSON Schema. The schema uses additionalProperties: false at every level, so only known fields are accepted.

Common variations:

| Mistake | Correction |
| --- | --- |
| scorer: passfail | scorers: [passfail] (must be a list, plural) |
| timeout: 30 | timeout_seconds: 30 |
| expected: "hello" | expected_output: "hello" |
| tool_args: {...} | expected_tool_args: {...} |
| tools: [...] | expected_tools: [...] |
| Any custom field at root | Move it inside metadata: {...} |

Solution: Check the exact field names against the schema reference. If you need custom data, put it inside the metadata field which accepts arbitrary key-value pairs.
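For example, suite-specific fields such as a ticket reference belong under metadata rather than at the scenario root (the field names inside metadata here are illustrative — it accepts any keys):

```yaml
scenarios:
  - name: refund_flow
    input: "Process a refund for order 42"
    metadata:
      ticket: "OPS-1234"   # arbitrary custom data is allowed here
      owner: "qa-team"
    scorers:
      - passfail
```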


Schema Validation Errors

Full error:

ValueError: Schema validation failed for test.yaml:
  Schema: scenarios.0.timeout_seconds: 0 is not valid under any of the given schemas
  Schema: scenarios: [] should be non-empty

Common causes:

| Error message | Cause | Fix |
| --- | --- | --- |
| 'name' is a required property | Scenario missing name field | Add a name to every scenario |
| 'input' is a required property | Scenario missing input field | Add an input to every scenario |
| 'scenarios' is a required property | Missing top-level scenarios key | Add scenarios: at the root level |
| [] should be non-empty | Empty scenarios list | Add at least one scenario |
| X is not of type 'string' | Wrong type for a field | Check field types (e.g., input must be a string, not a number) |
| X is not of type 'array' | A list field has wrong type | Fields like scorers, tags, expected_tools must be YAML lists |
| 0 is not valid under... | timeout_seconds is zero or negative | Use a positive value |
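Putting the required pieces together, a minimal file that satisfies these rules looks like this (the name and input values are illustrative; timeout_seconds is optional but must be positive when present):

```yaml
scenarios:
  - name: minimal_example    # required
    input: "Say hello"       # required, must be a string
    timeout_seconds: 30      # optional, must be a positive value
```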

“Could not import module ‘X’”

Full error:

Error: Could not import module 'mymodule': No module named 'mymodule'
Make sure the module is installed or on your PYTHONPATH.

Cause: The adapter path provided via --adapter or config file points to a Python module that cannot be imported.

Solutions:

  1. Module not installed: Make sure the package containing your adapter is installed in the current Python environment:

    pip install -e .
  2. Wrong PYTHONPATH: If your adapter is in a local file, make sure the current working directory is on PYTHONPATH:

    # On Linux/macOS
    export PYTHONPATH="$PYTHONPATH:$(pwd)"

    # On Windows (PowerShell)
    $env:PYTHONPATH = "$env:PYTHONPATH;$(Get-Location)"
  3. Wrong dotted path: The adapter path must be in the format module.ClassName. For example, if your class MyAgent is in my_project/agent.py, the path is my_project.agent.MyAgent.

  4. Virtual environment not activated: Make sure you are running in the correct virtual environment where your dependencies are installed.


“does not implement the AgentAdapter protocol”

Full error:

Error: 'mymodule.MyAgent' does not implement the AgentAdapter protocol. It must have a run(input, context=None) -> AgentResult method.

Cause: Your adapter class exists and can be imported, but it does not have the correct run() method signature.

Requirements for the AgentAdapter protocol:

from agenticassure.results import AgentResult
from typing import Any

class MyAgent:
    def run(self, input: str, context: dict[str, Any] | None = None) -> AgentResult:
        ...

Common mistakes:

  • Method named something other than run (e.g., execute, invoke).
  • Missing the context parameter.
  • Returning a plain string instead of an AgentResult object.
  • Method is a @staticmethod or @classmethod instead of a regular method.

ImportError for sentence-transformers

Full error:

ImportError: sentence-transformers is required for SimilarityScorer. Install it with: pip install agenticassure[similarity]

Cause: The similarity scorer was referenced but the sentence-transformers package is not installed. This is an optional dependency to keep the base package lightweight.

Solution:

pip install agenticassure[similarity]

This installs sentence-transformers and its dependencies (including PyTorch). Note that this is a large dependency tree.

If you do not want to install sentence-transformers, remove similarity from your scenario scorers and use other scorers instead (e.g., exact, regex, or passfail).


“No ‘regex_pattern’ found in scenario metadata”

Full error (in ScoreResult explanation):

No 'regex_pattern' found in scenario metadata

Cause: A scenario uses the regex scorer but does not have a regex_pattern key in its metadata.

Solution: Add the pattern to your scenario’s metadata:

scenarios:
  - name: pattern_test
    input: "Generate a code"
    metadata:
      regex_pattern: "[A-Z]{3}-\\d{4}"
    scorers:
      - regex

Note the double backslash in YAML for regex escapes. See YAML Gotchas below.


Empty YAML File Errors

Full error:

ValueError: Empty YAML file: scenarios/empty.yaml

Cause: The YAML file is empty, contains only whitespace, or contains only comments.

Solution: Add at least a scenarios list with one scenario:

scenarios:
  - name: example
    input: "Hello"

YAML Parse Errors

Full error:

YAML parse error: while parsing a block mapping
  in "test.yaml", line 3, column 3
expected <block end>, but found '<scalar>'
  in "test.yaml", line 4, column 5

Cause: The YAML syntax is malformed. Common causes include:

  • Incorrect indentation (YAML uses spaces, not tabs).
  • Missing colons after keys.
  • Unquoted strings that contain special characters (:, #, {, }, [, ]).
  • Mixing indentation levels within the same block.

Solution: Validate your YAML with a linter or the built-in validate command:

agenticassure validate scenarios/test.yaml

See YAML Gotchas for common YAML pitfalls.


Timeout Errors

Symptoms: Scenarios take a long time and eventually fail, or the process hangs.

Note: The current version of AgenticAssure does not enforce timeouts at the runner level (the timeout_seconds field is available for future use and for adapters to read). Long-running scenarios will block until the adapter’s run() method returns or the underlying HTTP client times out.

Solutions:

  1. Set timeouts in your adapter or LLM client:

    import openai

    client = openai.OpenAI(timeout=30.0)
  2. Use shorter timeout values in your LLM provider configuration.

  3. Add retry logic via the retries setting to recover from transient timeouts:

    suite:
      name: tests
      config:
        retries: 2

Connection Errors to LLM APIs

Symptoms:

ConnectionError: Error communicating with the OpenAI API

or

openai.APIConnectionError: Connection error.

Solutions:

  1. Check your API key is set correctly:

    echo $OPENAI_API_KEY    # Linux/macOS
    echo %OPENAI_API_KEY%   # Windows CMD
  2. Check network connectivity — ensure you can reach the API endpoint.

  3. Check rate limits — if you are running many scenarios, you may hit rate limits. Add retries:

    agenticassure run scenarios/ --adapter mymodule.MyAgent --retry 2
  4. Use a proxy if you are behind a corporate firewall. Configure it via environment variables:

    export HTTPS_PROXY=http://proxy.example.com:8080

HF Hub Rate Limit Warnings

Symptoms:

huggingface_hub.utils._errors.HfHubHTTPError: 429 Client Error: Too Many Requests

or frequent warnings about rate limiting when using the similarity scorer.

Solution: Set the HF_TOKEN environment variable with a Hugging Face access token to get higher rate limits:

export HF_TOKEN=hf_your_token_here

You can create a token at https://huggingface.co/settings/tokens.


Debugging Tips

Use --dry-run to Validate Without Running

The --dry-run flag loads and validates your scenario files, then displays a summary table without executing any scenarios. This is useful for catching YAML errors and verifying tag filters.

agenticassure run scenarios/ --dry-run
agenticassure run scenarios/ --dry-run --tag smoke

If no adapter is configured, the run command automatically falls back to dry-run behavior.


Use the validate Command

The validate command checks YAML files for structural and semantic issues:

# Validate a single file
agenticassure validate scenarios/test.yaml

# Validate all YAML files in a directory
agenticassure validate scenarios/

Output shows OK for valid files and FAIL with specific issues for invalid ones.


Check Scorer Registration with list_scorers()

If you are unsure which scorers are available in your environment, check at runtime:

from agenticassure.scorers.base import list_scorers

print(list_scorers())

Expected output with all dependencies installed:

['passfail', 'exact', 'regex', 'similarity']

If similarity is missing, install sentence-transformers:

pip install agenticassure[similarity]

Test Your Adapter Independently

Before using your adapter with AgenticAssure, test it in isolation:

from agenticassure import AgentResult
from my_module import MyAgent

agent = MyAgent()
result = agent.run("Hello, how are you?")

# Verify it returns an AgentResult
assert isinstance(result, AgentResult), f"Expected AgentResult, got {type(result)}"
print(f"Output: {result.output}")
print(f"Tool calls: {result.tool_calls}")

Inspect Results Programmatically

For deeper debugging, use the Python API instead of the CLI:

from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

suite = load_scenarios("scenarios/test.yaml")
runner = Runner(adapter=my_adapter)
result = runner.run_suite(suite)

for sr in result.scenario_results:
    print(f"\n--- {sr.scenario.name} ---")
    print(f"Passed: {sr.passed}")
    print(f"Duration: {sr.duration_ms:.0f}ms")
    print(f"Agent output: {sr.agent_result.output[:200]}")
    if sr.error:
        print(f"Error: {sr.error}")
    for score in sr.scores:
        print(f"  Scorer '{score.scorer_name}': score={score.score}, passed={score.passed}")
        print(f"  Explanation: {score.explanation}")
        if score.details:
            print(f"  Details: {score.details}")

Use JSON Output for CI Integration

The JSON output format provides machine-readable results for CI pipelines:

agenticassure run scenarios/ --adapter mymodule.MyAgent --output json

This writes a results_<run_id>.json file that can be parsed by downstream tools.


YAML Gotchas

Backslash Escaping in Regex Patterns

YAML interprets backslashes in double-quoted strings. When writing regex patterns, you need to double-escape:

# WRONG -- YAML interprets \d as an escape sequence
metadata:
  regex_pattern: "\d{3}-\d{4}"

# CORRECT -- double backslash in double-quoted strings
metadata:
  regex_pattern: "\\d{3}-\\d{4}"

# ALSO CORRECT -- single-quoted strings do not process escapes
metadata:
  regex_pattern: '\d{3}-\d{4}'

# ALSO CORRECT -- unquoted (works for simple patterns)
metadata:
  regex_pattern: \d{3}-\d{4}

For complex regex patterns, single-quoted strings are recommended as they preserve backslashes literally.
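You can verify this behavior directly with PyYAML (assuming pyyaml is installed; the key name p is arbitrary):

```python
import yaml

# Single quotes preserve the backslash literally
print(yaml.safe_load(r"p: '\d{3}'"))    # {'p': '\\d{3}'}

# A doubled backslash inside double quotes also yields \d{3}
print(yaml.safe_load('p: "\\\\d{3}"'))  # {'p': '\\d{3}'}

# A single backslash inside double quotes is an invalid YAML escape
try:
    yaml.safe_load('p: "\\d{3}"')
except yaml.YAMLError:
    print("parse error: unknown escape character")
```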


String Quoting

YAML has nuanced rules about when strings need quoting:

# These are fine unquoted
input: Hello world
input: What is the weather?

# These NEED quoting (special characters)
input: "What is the status of order #123?"  # '#' starts a comment
input: "key: value"                         # ':' followed by space is a mapping
input: "Use [brackets] carefully"           # '[' starts a flow sequence
input: "{braces} too"                       # '{' starts a flow mapping
input: "yes"   # Without quotes, YAML reads this as boolean true
input: "3.14"  # Without quotes, YAML reads this as a float
input: "null"  # Without quotes, YAML reads this as null/None

When in doubt, use double quotes around your strings.
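A quick PyYAML check (assuming pyyaml is available) makes the implicit typing concrete:

```python
import yaml

print(yaml.safe_load("input: yes"))    # {'input': True}  -- parsed as a boolean
print(yaml.safe_load('input: "yes"'))  # {'input': 'yes'} -- quoting keeps the string
print(yaml.safe_load("input: 3.14"))   # {'input': 3.14}  -- parsed as a float
print(yaml.safe_load("input: null"))   # {'input': None}  -- parsed as null
```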


Indentation

YAML uses spaces for indentation (tabs are not allowed). Inconsistent indentation causes parse errors.

# CORRECT -- consistent 2-space indentation
scenarios:
  - name: test
    input: "Hello"
    scorers:
      - passfail

# WRONG -- tab indentation (invisible but breaks YAML)
scenarios:
	- name: test
	  input: "Hello"

# WRONG -- inconsistent indentation
scenarios:
  - name: test
   input: "Hello"   # 3 spaces instead of 4
    scorers:
      - passfail

scorers (List) vs scorer (Invalid)

A common mistake is using the singular form:

# WRONG -- "scorer" is not a recognized field
scenarios:
  - name: test
    input: "Hello"
    scorer: passfail

# CORRECT -- must be "scorers" (plural) with list syntax
scenarios:
  - name: test
    input: "Hello"
    scorers:
      - passfail

# ALSO CORRECT -- inline list syntax
scenarios:
  - name: test
    input: "Hello"
    scorers: [passfail, exact]

Using scorer will produce the error: Additional properties are not allowed ('scorer' was unexpected).


Multiline Strings

YAML supports multiline strings with | (literal block) and > (folded block):

scenarios:
  - name: long_prompt
    input: |
      You are a helpful assistant.
      The user wants to know about photosynthesis.
      Please explain it in simple terms.
    expected_output: "photosynthesis"

The | preserves newlines. The > folds newlines into spaces (useful for long paragraphs).
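The difference is easy to confirm with PyYAML (assuming pyyaml is installed):

```python
import yaml

literal = yaml.safe_load("text: |\n  line one\n  line two\n")
folded = yaml.safe_load("text: >\n  line one\n  line two\n")

print(repr(literal["text"]))  # 'line one\nline two\n' -- newlines preserved
print(repr(folded["text"]))   # 'line one line two\n'  -- newlines folded to spaces
```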


FAQ

Can I use multiple scorers on a single scenario?

Yes. List all desired scorer names in the scorers field. A scenario passes only if all scorers pass.

scenarios:
  - name: multi_scored
    input: "What is the capital of France?"
    expected_output: "Paris"
    metadata:
      regex_pattern: "Paris"
    scorers:
      - passfail
      - exact
      - regex

How do I test without an LLM? (Mock adapter)

Create a simple adapter that returns canned responses:

from agenticassure import AgentResult

class MockAgent:
    """Returns predefined responses for testing."""

    def __init__(self, responses: dict[str, str] | None = None):
        self.responses = responses or {}
        self.default_response = "This is a mock response."

    def run(self, input: str, context=None) -> AgentResult:
        output = self.responses.get(input, self.default_response)
        return AgentResult(output=output)

# Usage
from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios

agent = MockAgent(responses={
    "Hello": "Hi there! How can I help?",
    "What is 2+2?": "4",
})
runner = Runner(adapter=agent)
suite = load_scenarios("scenarios/test.yaml")
result = runner.run_suite(suite)

This is useful for testing your scenario definitions and scorer configurations without incurring LLM API costs.


How do I skip slow tests?

Use tags to categorize scenarios and filter them at runtime:

scenarios:
  - name: fast_test
    input: "Quick check"
    tags: [smoke]
    scorers: [passfail]

  - name: slow_integration
    input: "Complex multi-step task"
    tags: [integration, slow]
    scorers: [passfail, similarity]

Then run only the fast tests:

agenticassure run scenarios/ --adapter mymodule.MyAgent --tag smoke

Or run specific tag combinations programmatically:

result = runner.run_suite(suite, tags=["smoke"])

Can I run scenarios in parallel?

Not currently. The Runner executes scenarios sequentially. Parallel execution may be added in a future release.

If you need parallel execution now, you can split your suites into separate files and run them in parallel processes:

# Run multiple suites in parallel (bash)
agenticassure run scenarios/suite1.yaml --adapter mymodule.MyAgent &
agenticassure run scenarios/suite2.yaml --adapter mymodule.MyAgent &
wait

Or use Python’s concurrent.futures with the programmatic API:

from concurrent.futures import ThreadPoolExecutor

from agenticassure.runner import Runner
from agenticassure.loader import load_scenarios_from_dir

suites = load_scenarios_from_dir("scenarios/")

def run_suite(suite):
    # MyAgent is your adapter class
    runner = Runner(adapter=MyAgent())
    return runner.run_suite(suite)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(run_suite, suites))

Note: Make sure your adapter is thread-safe if using this approach.
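If your adapter is not thread-safe, one option is to serialize calls through a lock. This wrapper is a hypothetical sketch (ThreadSafeAdapter is not part of the library) that still satisfies the run(input, context=None) protocol:

```python
import threading

class ThreadSafeAdapter:
    """Serializes calls to a wrapped adapter with a lock."""

    def __init__(self, inner):
        self.inner = inner
        self._lock = threading.Lock()

    def run(self, input, context=None):
        # Only one thread at a time reaches the wrapped adapter
        with self._lock:
            return self.inner.run(input, context=context)
```

Note that serializing every call removes most of the benefit of parallel scenarios; prefer a genuinely thread-safe adapter where possible.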


What Python versions are supported?

AgenticAssure requires Python 3.10 or later. It uses features introduced in Python 3.10 such as the X | Y union type syntax.


How do I pass extra context to my adapter?

Use the context parameter on run_suite or run_scenario:

result = runner.run_suite(suite, context={
    "user_id": "test-user",
    "session_id": "abc123",
    "temperature": 0.0,
})

Your adapter receives this context in its run() method:

from agenticassure import AgentResult

class MyAgent:
    def run(self, input: str, context=None) -> AgentResult:
        user_id = context.get("user_id") if context else None
        # Use context in your agent logic
        ...

How do I use the similarity scorer with a different model?

Programmatically, create a custom SimilarityScorer instance:

from agenticassure.scorers.similarity import SimilarityScorer
from agenticassure.scorers.base import register_scorer

# Override with a different model
custom_scorer = SimilarityScorer(
    model_name="all-mpnet-base-v2",
    threshold=0.8,
)
register_scorer(custom_scorer)  # Replaces the default "similarity" scorer

Per-scenario, you can override the threshold via metadata:

scenarios:
  - name: strict_similarity
    input: "Explain gravity"
    expected_output: "Gravity is a fundamental force..."
    metadata:
      similarity_threshold: 0.9
    scorers:
      - similarity

Why does my scenario fail even though the output looks correct?

Check which scorers are configured and what they are checking:

  1. passfail with expected_output does a case-insensitive substring match. The expected text must appear somewhere in the output.
  2. exact compares the entire output (normalized by default). Extra text causes a mismatch.
  3. regex requires a pattern in metadata. Without it, the scorer always fails.
  4. similarity computes semantic similarity. Low scores may indicate the model’s embedding does not consider the texts similar.

Use the programmatic API to inspect individual scorer results:

for score in scenario_result.scores:
    print(f"{score.scorer_name}: passed={score.passed}, explanation={score.explanation}")

Can I generate reports in multiple formats at once?

The CLI supports one output format per run (--output cli|json|html). To generate multiple formats, run the command multiple times or use the Python API:

from agenticassure.reports import CLIReporter, HTMLReporter, JSONReporter

# Generate all three reports from the same result
CLIReporter().report(result)
HTMLReporter().report(result, output_path="report.html")
JSONReporter().report(result, output_path="results.json")