Scorers API Reference
Scorers evaluate an agent’s response against a scenario’s expectations. AgenticAssure includes four built-in scorers and provides a registry system for custom scorers.
Module: agenticassure.scorers.base
from agenticassure.scorers.base import (
    Scorer,  # Protocol
    register_scorer,
    get_scorer,
    list_scorers,
)

The Scorer Protocol
@runtime_checkable
class Scorer(Protocol):
    name: str

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        ...

All scorers must satisfy this protocol. It requires:
- A name attribute (string), used as the lookup key in the registry.
- A score method that takes a Scenario and an AgentResult and returns a ScoreResult.
Any class with these two members is a valid scorer. There is no need to inherit from a base class.
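The structural check can be tried without installing the library. The sketch below redefines the protocol locally with simplified stand-in types (the real Scenario, AgentResult and ScoreResult have more fields), and AlwaysPass and MissingScore are hypothetical classes used only for illustration.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Stand-in types for illustration only; the real classes live in agenticassure.
@dataclass
class Scenario:
    id: str


@dataclass
class AgentResult:
    output: str


@dataclass
class ScoreResult:
    scorer_name: str
    score: float
    passed: bool


@runtime_checkable
class Scorer(Protocol):
    name: str

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult: ...


class AlwaysPass:
    """Hypothetical scorer: no base class, just the two required members."""

    name = "always_pass"

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        return ScoreResult(scorer_name=self.name, score=1.0, passed=True)


class MissingScore:
    """Has a name but no score method, so it does not satisfy the protocol."""

    name = "broken"


is_valid = isinstance(AlwaysPass(), Scorer)      # both members present
is_invalid = isinstance(MissingScore(), Scorer)  # score method is missing
```

Note that isinstance against a runtime-checkable protocol only verifies that the members exist. It does not check the signature or return type of score, so a static type checker is still the better guard for those.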
Example custom scorer:
from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import register_scorer


class LengthScorer:
    """Passes if the agent output is at least min_length characters."""

    name: str = "length"

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        # min_length is read from scenario metadata and defaults to 10
        min_length = scenario.metadata.get("min_length", 10)
        output_length = len(result.output)
        passed = output_length >= min_length
        return ScoreResult(
            scenario_id=scenario.id,
            scorer_name=self.name,
            # Partial credit for short outputs; guard against division by zero
            score=min(1.0, output_length / min_length) if min_length > 0 else 1.0,
            passed=passed,
            explanation=f"Output length {output_length} {'>=' if passed else '<'} {min_length}",
        )


# Register so it can be referenced by name in YAML
register_scorer(LengthScorer())

Registry Functions
register_scorer
def register_scorer(scorer: Scorer) -> None

Register a scorer instance in the global registry under its name attribute.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| scorer | Scorer | An object satisfying the Scorer protocol. |
Behavior:
- If a scorer with the same name already exists, it is silently replaced. This allows overriding built-in scorers with custom implementations.
- Registration is global and persists for the lifetime of the Python process.
Example:
from agenticassure.scorers.base import register_scorer

# LengthScorer and ToneScorer are assumed to be defined elsewhere
register_scorer(LengthScorer())
register_scorer(ToneScorer())

get_scorer
def get_scorer(name: str) -> Scorer

Look up a registered scorer by name.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| name | str | The name of the scorer to retrieve. |
Returns: Scorer — The registered scorer instance.
Raises:
| Exception | Condition |
|---|---|
| KeyError | No scorer with the given name has been registered. The error message includes the list of available scorer names: "Unknown scorer 'X'. Available: ['passfail', 'exact', ...]" |
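When surfacing this error to users, keep in mind that Python's KeyError formats its message with repr, so str(exc) adds an extra layer of quotes. The sketch below uses a stand-in get_scorer with a hard-coded registry (both hypothetical) that raises the documented message, and reads the raw text from exc.args[0].

```python
def get_scorer(name: str):
    """Stand-in lookup that mimics the documented error message."""
    registry = {"passfail": object(), "exact": object()}  # illustrative only
    if name not in registry:
        raise KeyError(f"Unknown scorer '{name}'. Available: {list(registry)}")
    return registry[name]


try:
    get_scorer("similarty")  # misspelled on purpose
except KeyError as exc:
    # args[0] holds the raw message; str(exc) would return its repr instead.
    message = exc.args[0]
    quoted = str(exc)
```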
Example:
from agenticassure.scorers.base import get_scorer

scorer = get_scorer("passfail")
result = scorer.score(scenario, agent_result)

list_scorers
def list_scorers() -> list[str]

Return the names of all currently registered scorers.
Returns: list[str] — Scorer names in registration order.
Example:
from agenticassure.scorers.base import list_scorers

print(list_scorers())
# ['passfail', 'exact', 'regex', 'similarity']

Built-in Scorers
PassFailScorer (name: "passfail")
Module: agenticassure.scorers.passfail
The default scorer. Performs multiple checks and passes only if all checks succeed.
Checks performed (in order):
- Non-empty output: The agent must produce non-whitespace output.
- Expected tools: If scenario.expected_tools is set, verifies that all listed tools were called (by name).
- Expected tool arguments: If scenario.expected_tool_args is set, verifies that each tool was called with the correct arguments (exact key-value match on each expected key).
- Expected output: If scenario.expected_output is set, verifies that the expected string appears as a case-insensitive substring of the agent’s output.
Score: 1.0 if all checks pass, 0.0 otherwise.
Explanation: A semicolon-separated list of all check results.
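The four checks above can be sketched as a standalone function. This is an illustration under stated assumptions rather than the scorer's real code: passfail_checks is a hypothetical name, tool calls are assumed to be dictionaries with "name" and "arguments" keys, and the explanation wording is invented.

```python
from typing import Any, Optional


def passfail_checks(
    output: str,
    tool_calls: list[dict[str, Any]],
    expected_tools: Optional[list[str]] = None,
    expected_tool_args: Optional[dict[str, dict[str, Any]]] = None,
    expected_output: Optional[str] = None,
) -> tuple[bool, str]:
    """Run the four checks in order; return (passed, explanation)."""
    results: list[tuple[bool, str]] = []

    # 1. Non-empty output: whitespace alone does not count.
    results.append((bool(output.strip()), "non-empty output"))

    # 2. Expected tools: every listed tool name must appear among the calls.
    called = {call["name"] for call in tool_calls}
    for tool in expected_tools or []:
        results.append((tool in called, f"tool '{tool}' called"))

    # 3. Expected tool arguments: each expected key must match exactly;
    #    extra arguments on the call are ignored.
    for tool, expected in (expected_tool_args or {}).items():
        matched = any(
            call["name"] == tool
            and all(
                key in call.get("arguments", {})
                and call["arguments"][key] == value
                for key, value in expected.items()
            )
            for call in tool_calls
        )
        results.append((matched, f"tool '{tool}' arguments match"))

    # 4. Expected output: case-insensitive substring match.
    if expected_output is not None:
        found = expected_output.lower() in output.lower()
        results.append((found, "expected output found"))

    passed = all(ok for ok, _ in results)
    explanation = "; ".join(
        f"{label}: {'ok' if ok else 'FAILED'}" for ok, label in results
    )
    return passed, explanation


calls = [{"name": "lookup_order", "arguments": {"order_id": "123", "verbose": True}}]
passed, explanation = passfail_checks(
    output="Your ORDER STATUS is: shipped",
    tool_calls=calls,
    expected_tools=["lookup_order"],
    expected_tool_args={"lookup_order": {"order_id": "123"}},
    expected_output="order status",
)
```

Every check is recorded even after an earlier one fails, which matches the description of the explanation as a list of all check results.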
Example YAML:
scenarios:
  - name: tool_check
    input: "Look up order #123"
    expected_tools:
      - lookup_order
    expected_tool_args:
      lookup_order:
        order_id: "123"
    expected_output: "order status"
    scorers:
      - passfail

ExactMatchScorer (name: "exact")
Module: agenticassure.scorers.exact
Compares the agent’s output to expected_output for exact equality.
Behavior:
- Requires scenario.expected_output to be set. Returns score 0.0 with a failure explanation if it is None.
- By default, both strings are normalized (stripped of whitespace and lowercased) before comparison.
- Normalization can be disabled by setting exact_normalize: false in scenario metadata.
Score: 1.0 on match, 0.0 otherwise.
Metadata options:
| Key | Type | Default | Description |
|---|---|---|---|
| exact_normalize | bool | true | Whether to strip and lowercase both strings before comparing. |
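The effect of exact_normalize is easy to see in isolation. In this sketch, exact_match is a hypothetical helper, and "stripped of whitespace" is assumed to mean trimming leading and trailing whitespace only, as str.strip() does.

```python
def exact_match(output: str, expected: str, normalize: bool = True) -> bool:
    """Compare two strings, optionally after strip() and lower()."""
    if normalize:
        output = output.strip().lower()
        expected = expected.strip().lower()
    return output == expected


# Default behavior: surrounding whitespace and case are ignored.
default_match = exact_match("  paris\n", "Paris")

# With normalization off, a case difference is a mismatch.
strict_match = exact_match("hello world", "Hello World", normalize=False)
```

Under this assumption, whitespace inside the string is still significant in both modes, so "New  York" and "New York" would not match.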
Example YAML:
scenarios:
  - name: exact_response
    input: "What is the capital of France?"
    expected_output: "Paris"
    scorers:
      - exact
  - name: case_sensitive
    input: "Echo back: Hello World"
    expected_output: "Hello World"
    metadata:
      exact_normalize: false
    scorers:
      - exact

RegexScorer (name: "regex")
Module: agenticassure.scorers.regex
Tests the agent’s output against a regular expression pattern.
Behavior:
- The regex pattern must be specified in scenario.metadata["regex_pattern"]. If missing, the scorer returns 0.0 with the explanation "No 'regex_pattern' found in scenario metadata".
- Uses re.search (not re.match), so the pattern can match anywhere in the output.
- If the pattern is invalid, returns 0.0 with an error explanation.
Score: 1.0 if the pattern matches, 0.0 otherwise.
Details dict (on match):
| Key | Type | Description |
|---|---|---|
| pattern | str | The regex pattern used. |
| match | str or None | The matched text, or None if no match. |
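The three outcomes described above (missing pattern, invalid pattern, search anywhere in the output) can be sketched with the standard re module. regex_check is a hypothetical helper, and apart from the documented missing-pattern message the explanation strings are invented for illustration.

```python
import re
from typing import Optional


def regex_check(output: str, pattern: Optional[str]) -> tuple[float, str, dict]:
    """Return (score, explanation, details) for a regex test on output."""
    if pattern is None:
        return 0.0, "No 'regex_pattern' found in scenario metadata", {}
    try:
        # re.search scans the whole string; re.match only tries position 0.
        match = re.search(pattern, output)
    except re.error as exc:
        return 0.0, f"Invalid regex pattern: {exc}", {}
    details = {"pattern": pattern, "match": match.group(0) if match else None}
    if match:
        return 1.0, "Pattern matched", details
    return 0.0, "Pattern did not match", details


text = "Your code is ABC-1234."
score, explanation, details = regex_check(text, r"[A-Z]{3}-\d{4}")

# The same pattern fails with re.match because the code is not at the start.
anchored = re.match(r"[A-Z]{3}-\d{4}", text)

# An unterminated character set is reported instead of raising.
bad_score, bad_explanation, _ = regex_check(text, "[A-Z")
```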
Example YAML:
scenarios:
  - name: code_format
    input: "Generate a confirmation code"
    metadata:
      regex_pattern: "[A-Z]{3}-\\d{4}"
    scorers:
      - regex

SimilarityScorer (name: "similarity")
Module: agenticassure.scorers.similarity
Computes semantic similarity between the agent’s output and the expected output using sentence embeddings.
Requirements:
- Requires the sentence-transformers package. Install with: pip install agenticassure[similarity]
- If sentence-transformers is not installed, the scorer will not be registered and referencing it by name will raise a KeyError.
Behavior:
- Requires scenario.expected_output to be set. Returns 0.0 if missing.
- Encodes both strings using a sentence-transformer model and computes cosine similarity.
- The model is loaded lazily on first use.
- Default model: all-MiniLM-L6-v2
- Default threshold: 0.7
- The threshold can be overridden per-scenario via scenario.metadata["similarity_threshold"].
Score: The cosine similarity value, clamped to [0.0, 1.0].
Details dict:
| Key | Type | Description |
|---|---|---|
| cosine_similarity | float | The raw cosine similarity value. |
| threshold | float | The threshold used for the pass/fail decision. |
Constructor parameters (for programmatic use):
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | "all-MiniLM-L6-v2" | Sentence-transformer model to use. |
| threshold | float | 0.7 | Minimum cosine similarity required to pass. |
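The scoring arithmetic (cosine similarity, clamping, threshold) can be illustrated without loading a model. The sketch below uses tiny hand-written vectors in place of real sentence embeddings; cosine_similarity and similarity_decision are hypothetical helpers, and comparing with >= is an assumption based on the threshold being described as a minimum.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def similarity_decision(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.7
) -> tuple[float, bool]:
    """Return (clamped score, passed) for two embedding vectors."""
    raw = cosine_similarity(a, b)
    score = max(0.0, min(1.0, raw))  # clamp to [0.0, 1.0]
    return score, score >= threshold


# Toy 2-D vectors standing in for embeddings.
same_score, same_passed = similarity_decision([1.0, 0.0], [1.0, 0.0])
opposite_score, opposite_passed = similarity_decision([1.0, 0.0], [-1.0, 0.0])
near_score, near_default = similarity_decision([1.0, 0.0], [1.0, 1.0])
_, near_strict = similarity_decision([1.0, 0.0], [1.0, 1.0], threshold=0.8)
```

The last pair has a cosine similarity of about 0.707, so it passes at the default threshold of 0.7 and fails once the threshold is raised to 0.8, which is the kind of change similarity_threshold makes per scenario.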
Example YAML:
scenarios:
  - name: semantic_check
    input: "Explain photosynthesis"
    expected_output: "Photosynthesis is the process by which plants convert sunlight into energy"
    metadata:
      similarity_threshold: 0.8
    scorers:
      - similarity

Using Scorers Programmatically
Scoring a result manually
from agenticassure import Scenario, AgentResult
from agenticassure.scorers.base import get_scorer

scenario = Scenario(
    name="test",
    input="What is 2+2?",
    expected_output="4",
)
agent_result = AgentResult(output="The answer is 4.")

scorer = get_scorer("passfail")
score_result = scorer.score(scenario, agent_result)

print(f"Passed: {score_result.passed}")
print(f"Score: {score_result.score}")
print(f"Explanation: {score_result.explanation}")

Registering and using a custom scorer
import json

from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import register_scorer, list_scorers


class JSONOutputScorer:
    """Passes if the agent output is valid JSON."""

    name: str = "json_valid"

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        try:
            json.loads(result.output)
            return ScoreResult(
                scenario_id=scenario.id,
                scorer_name=self.name,
                score=1.0,
                passed=True,
                explanation="Output is valid JSON",
            )
        except json.JSONDecodeError as e:
            return ScoreResult(
                scenario_id=scenario.id,
                scorer_name=self.name,
                score=0.0,
                passed=False,
                explanation=f"Invalid JSON: {e}",
            )


# Register the custom scorer
register_scorer(JSONOutputScorer())

# Verify it is available
print(list_scorers())
# ['passfail', 'exact', 'regex', 'similarity', 'json_valid']

Then reference it in your YAML:
scenarios:
  - name: json_response
    input: "Return user data as JSON"
    scorers:
      - json_valid

Overriding a built-in scorer
from agenticassure.scorers.base import register_scorer


class StrictPassFailScorer:
    """A stricter version of passfail that also checks output length."""

    name: str = "passfail"  # Same name replaces the built-in

    def score(self, scenario, result):
        # Your custom logic
        ...


register_scorer(StrictPassFailScorer())

Checking available scorers at runtime
from agenticassure.scorers.base import list_scorers

available = list_scorers()
print(f"Registered scorers: {available}")

# Useful for verifying optional scorers are available
if "similarity" in available:
    print("Similarity scorer is available (sentence-transformers installed)")
else:
    print("Similarity scorer not available -- install sentence-transformers")