Scorers API Reference

Scorers evaluate an agent’s response against a scenario’s expectations. AgenticAssure includes four built-in scorers and provides a registry system for custom scorers.

Module: agenticassure.scorers.base


from agenticassure.scorers.base import (
    Scorer,          # Protocol
    register_scorer,
    get_scorer,
    list_scorers,
)

The `Scorer` Protocol


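from typing import Protocol, runtime_checkable

from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario


@runtime_checkable
class Scorer(Protocol):
    name: str

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        ...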

All scorers must satisfy this protocol. It requires:

A name attribute (string) — used as the lookup key in the registry.
A score method that takes a Scenario and an AgentResult and returns a ScoreResult.

Any class with these two members is a valid scorer. There is no need to inherit from a base class.
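Because the protocol is `@runtime_checkable`, structural conformance can be checked with `isinstance`. The sketch below uses a hypothetical stand-in protocol, `ScorerLike`, with the same two members so it runs without AgenticAssure installed. `LengthScorer` and `NamedOnly` here are illustrative classes.

```python
from typing import Any, Protocol, runtime_checkable


# Hypothetical stand-in with the same two members as the documented Scorer protocol
@runtime_checkable
class ScorerLike(Protocol):
    name: str

    def score(self, scenario: Any, result: Any) -> Any: ...


class LengthScorer:
    # No base class: a `name` attribute and a `score` method are enough
    name: str = "length"

    def score(self, scenario: Any, result: Any) -> Any:
        return None  # body is irrelevant to the structural check


class NamedOnly:
    name: str = "broken"  # has `name` but no `score` method


print(isinstance(LengthScorer(), ScorerLike))  # True
print(isinstance(NamedOnly(), ScorerLike))     # False
```

Note that `isinstance` against a runtime-checkable protocol only checks that the members exist. It does not verify parameter lists or return types, so a `score` method with the wrong signature still passes the check.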

Example custom scorer:


from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import register_scorer
 
 
class LengthScorer:
    """Passes if the agent output is at least min_length characters."""
 
    name: str = "length"
 
    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        min_length = scenario.metadata.get("min_length", 10)
        output_length = len(result.output)
        passed = output_length >= min_length
 
        return ScoreResult(
            scenario_id=scenario.id,
            scorer_name=self.name,
            score=min(1.0, output_length / min_length) if min_length > 0 else 1.0,
            passed=passed,
            explanation=f"Output length {output_length} {'>=' if passed else '<'} {min_length}",
        )
 
 
# Register so it can be referenced by name in YAML
register_scorer(LengthScorer())

Registry Functions

`register_scorer`


def register_scorer(scorer: Scorer) -> None

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `scorer` | `Scorer` | An object satisfying the `Scorer` protocol. |

Behavior:

If a scorer with the same name already exists, it is silently replaced. This allows overriding built-in scorers with custom implementations.
Registration is global and persists for the lifetime of the Python process.

Example:


from agenticassure.scorers.base import register_scorer
 
register_scorer(LengthScorer())  # LengthScorer as defined above
register_scorer(ToneScorer())    # ToneScorer: any other custom scorer you have defined

`get_scorer`


def get_scorer(name: str) -> Scorer

Look up a registered scorer by name.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the scorer to retrieve. |

Returns: Scorer — The registered scorer instance.

Raises:

| Exception | Condition |
| --- | --- |
| `KeyError` | No scorer with the given name has been registered. The error message includes the list of available scorer names: `"Unknown scorer 'X'. Available: ['passfail', 'exact', ...]"` |

Example:


from agenticassure.scorers.base import get_scorer

# scenario and agent_result are a Scenario and an AgentResult you have already built
scorer = get_scorer("passfail")
score_result = scorer.score(scenario, agent_result)

`list_scorers`


def list_scorers() -> list[str]

Return the names of all currently registered scorers.

Returns: list[str] — Scorer names in registration order.

Example:


from agenticassure.scorers.base import list_scorers
 
print(list_scorers())
# ['passfail', 'exact', 'regex', 'similarity']  ('similarity' appears only if sentence-transformers is installed)
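The three registry functions can be pictured as operations on one global dictionary keyed by scorer name. The sketch below is a hypothetical illustration of the documented behavior (silent replacement, `KeyError` listing the available names, names returned in order), not AgenticAssure's source; `_REGISTRY` and `TaggedScorer` are invented for the example, and the functions only mirror the real names.

```python
from typing import Any

# Hypothetical registry: a module-level dict keyed by scorer name.
# This illustrates the documented behavior; it is not AgenticAssure's source.
_REGISTRY: dict[str, Any] = {}


def register_scorer(scorer: Any) -> None:
    _REGISTRY[scorer.name] = scorer  # an existing name is silently replaced


def get_scorer(name: str) -> Any:
    if name not in _REGISTRY:
        raise KeyError(f"Unknown scorer '{name}'. Available: {list(_REGISTRY)}")
    return _REGISTRY[name]


def list_scorers() -> list[str]:
    return list(_REGISTRY)  # dicts keep insertion order


class TaggedScorer:
    """Hypothetical scorer that records where it came from."""

    def __init__(self, name: str, tag: str) -> None:
        self.name = name
        self.tag = tag

    def score(self, scenario: Any, result: Any) -> Any:
        return None  # scoring is irrelevant to the registry demo


register_scorer(TaggedScorer("passfail", "built-in"))
register_scorer(TaggedScorer("exact", "built-in"))
register_scorer(TaggedScorer("passfail", "custom"))  # override by reusing the name

print(list_scorers())              # ['passfail', 'exact']
print(get_scorer("passfail").tag)  # custom
try:
    get_scorer("nope")
except KeyError as exc:
    print(exc.args[0])  # Unknown scorer 'nope'. Available: ['passfail', 'exact']
```

In this dict-based sketch, an overridden scorer keeps its original position in `list_scorers()`, because reassigning an existing key does not move it. Whether AgenticAssure's registry orders overrides the same way is not stated in this reference.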

Built-in Scorers

`PassFailScorer` (name: `"passfail"`)

Module: agenticassure.scorers.passfail

The default scorer. Performs multiple checks and passes only if all checks succeed.

Checks performed (in order):

Non-empty output — The agent must produce non-whitespace output.
Expected tools — If scenario.expected_tools is set, verifies that all listed tools were called (by name).
Expected tool arguments — If scenario.expected_tool_args is set, verifies that each tool was called with the correct arguments (exact key-value match on each expected key).
Expected output — If scenario.expected_output is set, verifies that the expected string appears as a case-insensitive substring of the agent’s output.

Score: 1.0 if all checks pass, 0.0 otherwise.

Explanation: A semicolon-separated list of all check results.
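The four checks can be sketched as a standalone function over plain values. This is a hypothetical illustration, not the library's implementation. It assumes tool calls are available as `(tool_name, args_dict)` pairs, and it assumes that for `expected_tool_args` any single call to the tool with matching values for every expected key is enough. The explanation strings are invented.

```python
from typing import Any, Optional

ToolCall = tuple[str, dict[str, Any]]  # hypothetical shape: (tool name, arguments)


def passfail(
    output: str,
    tool_calls: list[ToolCall],
    expected_tools: Optional[list[str]] = None,
    expected_tool_args: Optional[dict[str, dict[str, Any]]] = None,
    expected_output: Optional[str] = None,
) -> tuple[float, bool, str]:
    checks: list[tuple[bool, str]] = []

    # 1. Output must contain something other than whitespace
    ok = bool(output.strip())
    checks.append((ok, "output non-empty" if ok else "output empty"))

    # 2. Every expected tool was called at least once (matched by name)
    if expected_tools:
        called = {name for name, _ in tool_calls}
        missing = [t for t in expected_tools if t not in called]
        checks.append((not missing, f"missing tools: {missing}" if missing else "expected tools called"))

    # 3. Some call to each tool has the expected value for every expected key
    if expected_tool_args:
        for tool, want in expected_tool_args.items():
            ok = any(
                name == tool and all(k in args and args[k] == v for k, v in want.items())
                for name, args in tool_calls
            )
            checks.append((ok, f"{tool} args {'match' if ok else 'mismatch'}"))

    # 4. Expected text appears as a case-insensitive substring
    if expected_output is not None:
        ok = expected_output.lower() in output.lower()
        checks.append((ok, f"expected output {'found' if ok else 'not found'}"))

    passed = all(ok for ok, _ in checks)
    return (1.0 if passed else 0.0), passed, "; ".join(msg for _, msg in checks)


score, passed, why = passfail(
    "Your order status is: shipped",
    [("lookup_order", {"order_id": "123"})],
    expected_tools=["lookup_order"],
    expected_tool_args={"lookup_order": {"order_id": "123"}},
    expected_output="order status",
)
print(score, passed)  # 1.0 True
print(why)  # output non-empty; expected tools called; lookup_order args match; expected output found
```

Because only the keys listed in `expected_tool_args` are compared, extra arguments on the actual call do not cause a failure in this sketch.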

Example YAML:


scenarios:
  - name: tool_check
    input: "Look up order #123"
    expected_tools:
      - lookup_order
    expected_tool_args:
      lookup_order:
        order_id: "123"
    expected_output: "order status"
    scorers:
      - passfail

`ExactMatchScorer` (name: `"exact"`)

Module: agenticassure.scorers.exact

Compares the agent’s output to expected_output for exact equality.

Behavior:

Requires scenario.expected_output to be set. Returns score 0.0 with a failure explanation if it is None.
By default, both strings are normalized (stripped of whitespace and lowercased) before comparison.
Normalization can be disabled by setting exact_normalize: false in scenario metadata.

Score: 1.0 on match, 0.0 otherwise.

Metadata options:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `exact_normalize` | `bool` | `true` | Whether to strip and lowercase both strings before comparing. |
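A minimal sketch of the comparison rules, written as a hypothetical standalone function rather than the library's class. The `normalize` flag plays the role of `exact_normalize`, and the explanation strings are invented.

```python
from typing import Optional


def exact_match(output: str, expected: Optional[str], normalize: bool = True) -> tuple[float, bool, str]:
    # Sketch of the documented ExactMatchScorer rules, not the library source
    if expected is None:
        return 0.0, False, "expected_output is not set"
    if normalize:  # corresponds to exact_normalize: true (the default)
        output, expected = output.strip().lower(), expected.strip().lower()
    passed = output == expected
    return (1.0 if passed else 0.0), passed, "exact match" if passed else "outputs differ"


print(exact_match("  Paris\n", "Paris"))                           # (1.0, True, 'exact match')
print(exact_match("hello world", "Hello World", normalize=False))  # (0.0, False, 'outputs differ')
print(exact_match("The capital is Paris.", "Paris"))              # (0.0, False, 'outputs differ')
```

The last call shows the difference from `passfail`. A full sentence containing "Paris" passes the case-insensitive substring check but fails exact equality.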

Example YAML:


scenarios:
  - name: exact_response
    input: "What is the capital of France?"
    expected_output: "Paris"
    scorers:
      - exact
 
  - name: case_sensitive
    input: "Echo back: Hello World"
    expected_output: "Hello World"
    metadata:
      exact_normalize: false
    scorers:
      - exact

`RegexScorer` (name: `"regex"`)

Module: agenticassure.scorers.regex

Tests the agent’s output against a regular expression pattern.

Behavior:

The regex pattern must be specified in scenario.metadata["regex_pattern"]. If missing, the scorer returns 0.0 with the explanation "No 'regex_pattern' found in scenario metadata".
Uses re.search (not re.match), so the pattern can match anywhere in the output.
If the pattern is invalid, returns 0.0 with an error explanation.

Score: 1.0 if the pattern matches, 0.0 otherwise.

Details dict:

| Key | Type | Description |
| --- | --- | --- |
| `pattern` | `str` | The regex pattern used. |
| `match` | `str \| None` | The matched text, or `None` if no match. |
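The sketch below reproduces the documented rules in a hypothetical standalone function: a missing pattern, `re.search` rather than `re.match`, and invalid patterns reported as failures. Only the "No 'regex_pattern' ..." message comes from this reference; the other explanation strings are invented.

```python
import re
from typing import Any, Optional


def regex_score(output: str, metadata: dict[str, Any]) -> tuple[float, bool, str, dict[str, Any]]:
    # Sketch of the documented RegexScorer rules, not the library source
    pattern: Optional[str] = metadata.get("regex_pattern")
    if pattern is None:
        return 0.0, False, "No 'regex_pattern' found in scenario metadata", {}
    try:
        m = re.search(pattern, output)  # search, so the match may start anywhere
    except re.error as exc:
        return 0.0, False, f"Invalid regex pattern: {exc}", {}
    details = {"pattern": pattern, "match": m.group(0) if m else None}
    return (1.0 if m else 0.0), m is not None, "matched" if m else "no match", details


code = r"[A-Z]{3}-\d{4}"  # same pattern as the YAML "[A-Z]{3}-\\d{4}" in the example below
print(regex_score("Your code is ABC-1234.", {"regex_pattern": code}))
# (1.0, True, 'matched', {'pattern': '[A-Z]{3}-\\d{4}', 'match': 'ABC-1234'})
print(re.match(code, "Your code is ABC-1234."))  # None: re.match anchors at the start
print(regex_score("x", {"regex_pattern": "[unclosed"})[:2])  # (0.0, False)
```

In a double-quoted YAML string the backslash must be doubled (`\\d`), whereas a Python raw string writes it once (`r"\d"`). Both produce the same pattern.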

Example YAML:


scenarios:
  - name: code_format
    input: "Generate a confirmation code"
    metadata:
      regex_pattern: "[A-Z]{3}-\\d{4}"
    scorers:
      - regex

`SimilarityScorer` (name: `"similarity"`)

Module: agenticassure.scorers.similarity

Computes semantic similarity between the agent’s output and the expected output using sentence embeddings.

Requirements:

Requires the sentence-transformers package. Install with: pip install "agenticassure[similarity]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).
If sentence-transformers is not installed, the scorer will not be registered and referencing it by name will raise a KeyError.

Behavior:

Requires scenario.expected_output to be set. Returns 0.0 if missing.
Encodes both strings using a sentence-transformer model and computes cosine similarity.
The model is loaded lazily on first use.
Default model: all-MiniLM-L6-v2
Default threshold: 0.7
The threshold can be overridden per-scenario via scenario.metadata["similarity_threshold"].

Score: The cosine similarity value, clamped to [0.0, 1.0].

Details dict:

| Key | Type | Description |
| --- | --- | --- |
| `cosine_similarity` | `float` | The raw cosine similarity value. |
| `threshold` | `float` | The threshold used for the pass/fail decision. |
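The scoring arithmetic can be separated from the embedding model. In this sketch, small hand-written vectors stand in for sentence embeddings; in the real scorer they would come from the sentence-transformer model. The function names are hypothetical. Two behaviors are assumptions, not documented: a score passes when it is `>=` the threshold, and a zero vector yields 0.0.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # assumption: a zero vector counts as no similarity


def similarity_score(a: Sequence[float], b: Sequence[float], threshold: float = 0.7):
    raw = cosine_similarity(a, b)
    score = min(1.0, max(0.0, raw))  # clamp to [0.0, 1.0]
    passed = score >= threshold  # assumption: pass when the score reaches the threshold
    return score, passed, {"cosine_similarity": raw, "threshold": threshold}


print(similarity_score([1.0, 0.0], [-1.0, 0.0]))  # (0.0, False, {'cosine_similarity': -1.0, 'threshold': 0.7})
print(similarity_score([3.0, 4.0], [4.0, 3.0])[:2])  # (0.96, True)
```

Raw cosine similarity ranges over [-1.0, 1.0], which is why the score is clamped while the details dict keeps the raw value. For any threshold in (0, 1], comparing the raw or the clamped value gives the same pass/fail result.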

Constructor parameters (for programmatic use):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_name` | `str` | `"all-MiniLM-L6-v2"` | Sentence-transformer model to use. |
| `threshold` | `float` | `0.7` | Minimum cosine similarity required to pass. |

Example YAML:


scenarios:
  - name: semantic_check
    input: "Explain photosynthesis"
    expected_output: "Photosynthesis is the process by which plants convert sunlight into energy"
    metadata:
      similarity_threshold: 0.8
    scorers:
      - similarity

Using Scorers Programmatically

Scoring a result manually


from agenticassure.results import AgentResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import get_scorer
 
scenario = Scenario(
    name="test",
    input="What is 2+2?",
    expected_output="4",
)
 
agent_result = AgentResult(output="The answer is 4.")
 
scorer = get_scorer("passfail")
score_result = scorer.score(scenario, agent_result)
 
print(f"Passed: {score_result.passed}")
print(f"Score: {score_result.score}")
print(f"Explanation: {score_result.explanation}")

Registering and using a custom scorer


import json

from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import register_scorer, list_scorers


class JSONOutputScorer:
    """Passes if the agent output is valid JSON."""

    name: str = "json_valid"

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        try:
            json.loads(result.output)
            return ScoreResult(
                scenario_id=scenario.id,
                scorer_name=self.name,
                score=1.0,
                passed=True,
                explanation="Output is valid JSON",
            )
        except json.JSONDecodeError as e:
            return ScoreResult(
                scenario_id=scenario.id,
                scorer_name=self.name,
                score=0.0,
                passed=False,
                explanation=f"Invalid JSON: {e}",
            )
 
 
# Register the custom scorer
register_scorer(JSONOutputScorer())
 
# Verify it is available
print(list_scorers())
# ['passfail', 'exact', 'regex', 'similarity', 'json_valid']  ('similarity' appears only if sentence-transformers is installed)

Then reference it in your YAML:


scenarios:
  - name: json_response
    input: "Return user data as JSON"
    scorers:
      - json_valid

Overriding a built-in scorer


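from agenticassure.results import AgentResult, ScoreResult
from agenticassure.scenario import Scenario
from agenticassure.scorers.base import get_scorer, register_scorer

# Keep the built-in so the override can reuse its checks
_builtin_passfail = get_scorer("passfail")


class StrictPassFailScorer:
    """A stricter version of passfail that also checks output length."""

    name: str = "passfail"  # Same name replaces the built-in
    min_length: int = 20    # illustrative threshold, choose your own

    def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
        base = _builtin_passfail.score(scenario, result)
        long_enough = len(result.output.strip()) >= self.min_length
        passed = base.passed and long_enough
        return ScoreResult(
            scenario_id=scenario.id,
            scorer_name=self.name,
            score=1.0 if passed else 0.0,
            passed=passed,
            explanation=f"{base.explanation}; length >= {self.min_length}: {long_enough}",
        )


register_scorer(StrictPassFailScorer())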

Checking available scorers at runtime


from agenticassure.scorers.base import list_scorers
 
available = list_scorers()
print(f"Registered scorers: {available}")
 
# Useful for verifying optional scorers are available
if "similarity" in available:
    print("Similarity scorer is available (sentence-transformers installed)")
else:
    print("Similarity scorer not available -- install sentence-transformers")
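The similarity scorer is registered only when sentence-transformers is installed. If one of your own scorers depends on an optional package, you can apply a similar guard. This sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it. The dict `registry` and the class `EmbeddingScorer` are hypothetical stand-ins; with AgenticAssure you would call `register_scorer` instead.

```python
import importlib.util
from typing import Any

registry: dict[str, Any] = {}  # hypothetical stand-in for AgenticAssure's registry


def has_module(module_name: str) -> bool:
    # find_spec locates a top-level module without importing it; None means not installed
    return importlib.util.find_spec(module_name) is not None


class EmbeddingScorer:
    """Hypothetical scorer that would need sentence-transformers at scoring time."""

    name = "embedding"

    def score(self, scenario: Any, result: Any) -> Any:
        raise NotImplementedError  # real logic omitted from this sketch


# Register only when the optional dependency is importable
if has_module("sentence_transformers"):
    registry[EmbeddingScorer.name] = EmbeddingScorer()

print(sorted(registry))  # ['embedding'] or [] depending on the environment
```

The import name (`sentence_transformers`) differs from the pip package name (`sentence-transformers`), so the guard must use the former.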
