
Scorers API Reference

Scorers evaluate an agent’s response against a scenario’s expectations. AgenticAssure includes four built-in scorers and provides a registry system for custom scorers.

Module: agenticassure.scorers.base

    from agenticassure.scorers.base import (
        Scorer,  # Protocol
        register_scorer,
        get_scorer,
        list_scorers,
    )

The Scorer Protocol

    @runtime_checkable
    class Scorer(Protocol):
        name: str

        def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult: ...

All scorers must satisfy this protocol. It requires:

  • A name attribute (string) — used as the lookup key in the registry.
  • A score method that takes a Scenario and an AgentResult and returns a ScoreResult.

Any class with these two members is a valid scorer. There is no need to inherit from a base class.

Example custom scorer:

    from agenticassure.results import AgentResult, ScoreResult
    from agenticassure.scenario import Scenario
    from agenticassure.scorers.base import register_scorer


    class LengthScorer:
        """Passes if the agent output is at least min_length characters."""

        name: str = "length"

        def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
            min_length = scenario.metadata.get("min_length", 10)
            output_length = len(result.output)
            passed = output_length >= min_length
            return ScoreResult(
                scenario_id=scenario.id,
                scorer_name=self.name,
                # Partial credit proportional to length, capped at 1.0
                score=min(1.0, output_length / min_length) if min_length > 0 else 1.0,
                passed=passed,
                explanation=f"Output length {output_length} {'>=' if passed else '<'} {min_length}",
            )


    # Register so it can be referenced by name in YAML
    register_scorer(LengthScorer())
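Because the protocol is declared with @runtime_checkable, conformance can also be checked with isinstance. The sketch below redeclares a local stand-in for Scorer (with Any-typed arguments) so that it runs without AgenticAssure installed; WordCountScorer and NotAScorer are hypothetical classes used only for illustration. Note that isinstance on a runtime-checkable protocol verifies only that the members exist, not that their signatures match.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Scorer(Protocol):
    """Local stand-in mirroring the shape of the documented protocol."""

    name: str

    def score(self, scenario: Any, result: Any) -> Any: ...


class WordCountScorer:
    """Hypothetical scorer: has both `name` and `score`, no base class."""

    name: str = "word_count"

    def score(self, scenario: Any, result: Any) -> Any:
        return len(str(result).split())


class NotAScorer:
    """Has a `name` but no `score` method, so it does not conform."""

    name: str = "broken"


conforms = isinstance(WordCountScorer(), Scorer)
rejected = isinstance(NotAScorer(), Scorer)
print(conforms, rejected)
```

A check like this can be useful before calling register_scorer, since an object missing either member would otherwise fail only when it is first used.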

Registry Functions

register_scorer

def register_scorer(scorer: Scorer) -> None

Register a scorer instance in the global registry under its name attribute.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scorer | Scorer | An object satisfying the Scorer protocol. |

Behavior:

  • If a scorer with the same name already exists, it is silently replaced. This allows overriding built-in scorers with custom implementations.
  • Registration is global and persists for the lifetime of the Python process.

Example:

    from agenticassure.scorers.base import register_scorer

    # LengthScorer and ToneScorer stand for your own scorer classes
    register_scorer(LengthScorer())
    register_scorer(ToneScorer())
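The replace-on-collision behavior described above is what a plain dictionary keyed by name provides. The following is a minimal sketch of such a registry, not AgenticAssure's actual implementation: _REGISTRY, EchoScorer, and the local register_scorer and list_scorers are all stand-ins defined here so the example is self-contained.

```python
from typing import Any

# Hypothetical module-level registry; a dict preserves insertion order.
_REGISTRY: dict[str, Any] = {}


def register_scorer(scorer: Any) -> None:
    """Store the scorer under its name, replacing any previous entry."""
    _REGISTRY[scorer.name] = scorer


def list_scorers() -> list[str]:
    """Return registered names in the order they were first registered."""
    return list(_REGISTRY)


class EchoScorer:
    """Hypothetical placeholder that only carries a name and a label."""

    def __init__(self, name: str, label: str) -> None:
        self.name = name
        self.label = label


register_scorer(EchoScorer("passfail", "built-in"))
register_scorer(EchoScorer("length", "custom"))
register_scorer(EchoScorer("passfail", "override"))  # silently replaces the first

names = list_scorers()
active_label = _REGISTRY["passfail"].label
print(names, active_label)
```

In this sketch a replaced name keeps its original position in the listing; whether the real registry behaves the same way on override is not stated in this reference.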

get_scorer

def get_scorer(name: str) -> Scorer

Look up a registered scorer by name.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | The name of the scorer to retrieve. |

Returns: Scorer — The registered scorer instance.

Raises:

| Exception | Condition |
| --- | --- |
| KeyError | No scorer with the given name has been registered. The error message includes the list of available scorer names: "Unknown scorer 'X'. Available: ['passfail', 'exact', ...]" |

Example:

    from agenticassure.scorers.base import get_scorer

    # scenario and agent_result are an existing Scenario and AgentResult
    scorer = get_scorer("passfail")
    result = scorer.score(scenario, agent_result)

list_scorers

def list_scorers() -> list[str]

Return the names of all currently registered scorers.

Returns: list[str] — Scorer names in registration order.

Example:

    from agenticassure.scorers.base import list_scorers

    print(list_scorers())
    # ['passfail', 'exact', 'regex', 'similarity']

Built-in Scorers

PassFailScorer (name: "passfail")

Module: agenticassure.scorers.passfail

The default scorer. Performs multiple checks and passes only if all checks succeed.

Checks performed (in order):

  1. Non-empty output — The agent must produce non-whitespace output.
  2. Expected tools — If scenario.expected_tools is set, verifies that all listed tools were called (by name).
  3. Expected tool arguments — If scenario.expected_tool_args is set, verifies that each tool was called with the correct arguments (exact key-value match on each expected key).
  4. Expected output — If scenario.expected_output is set, verifies that the expected string appears as a case-insensitive substring of the agent’s output.

Score: 1.0 if all checks pass, 0.0 otherwise.

Explanation: A semicolon-separated list of all check results.

Example YAML:

    scenarios:
      - name: tool_check
        input: "Look up order #123"
        expected_tools:
          - lookup_order
        expected_tool_args:
          lookup_order:
            order_id: "123"
        expected_output: "order status"
        scorers:
          - passfail
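The four checks can be sketched in plain Python as follows. This is an illustration of the documented behavior, not the library's code: Scenario and AgentResult are reduced stand-ins, and ToolCall, the tool_calls attribute, and the wording of the explanation strings are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:  # hypothetical shape for a recorded tool call
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class Scenario:  # stand-in with only the fields the checks read
    expected_tools: Optional[list[str]] = None
    expected_tool_args: Optional[dict[str, dict[str, Any]]] = None
    expected_output: Optional[str] = None


@dataclass
class AgentResult:  # stand-in; `tool_calls` is an assumed attribute name
    output: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)


def passfail(scenario: Scenario, result: AgentResult) -> tuple[bool, str]:
    """Run the four checks in order; pass only if every check succeeds."""
    checks: list[tuple[bool, str]] = []

    # 1. Non-empty output (whitespace-only output counts as empty)
    checks.append((bool(result.output.strip()), "non-empty output"))

    # 2. Every expected tool was called, matched by name
    called = {call.name for call in result.tool_calls}
    for tool in scenario.expected_tools or []:
        checks.append((tool in called, f"tool '{tool}' called"))

    # 3. Some call to the tool matches every expected key-value pair exactly
    for tool, expected in (scenario.expected_tool_args or {}).items():
        ok = any(
            call.name == tool
            and all(k in call.arguments and call.arguments[k] == v
                    for k, v in expected.items())
            for call in result.tool_calls
        )
        checks.append((ok, f"args for '{tool}'"))

    # 4. Expected output appears as a case-insensitive substring
    if scenario.expected_output is not None:
        found = scenario.expected_output.lower() in result.output.lower()
        checks.append((found, "expected output present"))

    passed = all(ok for ok, _ in checks)
    explanation = "; ".join(
        f"{label}: {'ok' if ok else 'FAILED'}" for ok, label in checks
    )
    return passed, explanation


scenario = Scenario(
    expected_tools=["lookup_order"],
    expected_tool_args={"lookup_order": {"order_id": "123"}},
    expected_output="order status",
)
good = AgentResult("Order status: shipped", [ToolCall("lookup_order", {"order_id": "123"})])
bad = AgentResult("Order status: shipped", [ToolCall("lookup_order", {"order_id": "999"})])

good_passed, good_explanation = passfail(scenario, good)
bad_passed, bad_explanation = passfail(scenario, bad)
print(good_passed, bad_passed)
print(bad_explanation)
```

The sketch evaluates every check rather than stopping at the first failure, which matches the description of the explanation as a list of all check results.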

ExactMatchScorer (name: "exact")

Module: agenticassure.scorers.exact

Compares the agent’s output to expected_output for exact equality.

Behavior:

  • Requires scenario.expected_output to be set. Returns score 0.0 with a failure explanation if it is None.
  • By default, both strings are normalized (stripped of whitespace and lowercased) before comparison.
  • Normalization can be disabled by setting exact_normalize: false in scenario metadata.

Score: 1.0 on match, 0.0 otherwise.

Metadata options:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| exact_normalize | bool | true | Whether to strip and lowercase both strings before comparing. |

Example YAML:

    scenarios:
      - name: exact_response
        input: "What is the capital of France?"
        expected_output: "Paris"
        scorers:
          - exact

      - name: case_sensitive
        input: "Echo back: Hello World"
        expected_output: "Hello World"
        metadata:
          exact_normalize: false
        scorers:
          - exact
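The effect of normalization can be illustrated with a small stand-alone function. This is a sketch of the documented comparison, assuming normalization means str.strip() followed by str.lower(); exact_match is a hypothetical name, not a function exported by the library.

```python
def exact_match(output: str, expected: str, normalize: bool = True) -> float:
    """Return 1.0 on equality, 0.0 otherwise; optionally strip and lowercase first."""
    if normalize:
        output, expected = output.strip().lower(), expected.strip().lower()
    return 1.0 if output == expected else 0.0


normalized = exact_match("  PARIS\n", "Paris")                 # case and whitespace ignored
strict = exact_match("  PARIS\n", "Paris", normalize=False)    # compared as-is
sentence = exact_match("The capital is Paris.", "Paris")       # extra words never match
print(normalized, strict, sentence)
```

The last case shows the main difference from passfail: a substring hit is not enough, so this scorer suits agents that are expected to return a bare value.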

RegexScorer (name: "regex")

Module: agenticassure.scorers.regex

Tests the agent’s output against a regular expression pattern.

Behavior:

  • The regex pattern must be specified in scenario.metadata["regex_pattern"]. If missing, the scorer returns 0.0 with the explanation "No 'regex_pattern' found in scenario metadata".
  • Uses re.search (not re.match), so the pattern can match anywhere in the output.
  • If the pattern is invalid, returns 0.0 with an error explanation.

Score: 1.0 if the pattern matches, 0.0 otherwise.

Details dict (on match):

| Key | Type | Description |
| --- | --- | --- |
| pattern | str | The regex pattern used. |
| match | str \| None | The matched text, or None if no match. |

Example YAML:

    scenarios:
      - name: code_format
        input: "Generate a confirmation code"
        metadata:
          regex_pattern: "[A-Z]{3}-\\d{4}"
        scorers:
          - regex
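The difference between re.search and re.match, and the handling of an invalid pattern, can be seen in the following sketch. It uses only the standard re module; regex_score is a hypothetical helper that mirrors the documented behavior rather than the scorer's actual code.

```python
import re
from typing import Optional


def regex_score(pattern: str, output: str) -> tuple[float, Optional[str]]:
    """Return (score, matched_text); invalid patterns score 0.0 instead of raising."""
    try:
        match = re.search(pattern, output)
    except re.error:
        return 0.0, None
    return (1.0, match.group(0)) if match else (0.0, None)


text = "Your confirmation code is ABC-1234."
pattern = r"[A-Z]{3}-\d{4}"

search_result = regex_score(pattern, text)   # finds the code mid-sentence
match_at_start = re.match(pattern, text)     # anchored at position 0, so no match here
invalid = regex_score("[A-Z", text)          # unterminated character set
print(search_result, match_at_start, invalid)
```

To require that the whole output matches, anchor the pattern explicitly with ^ and $ in regex_pattern.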

SimilarityScorer (name: "similarity")

Module: agenticassure.scorers.similarity

Computes semantic similarity between the agent’s output and the expected output using sentence embeddings.

Requirements:

  • Requires the sentence-transformers package. Install with: pip install agenticassure[similarity]
  • If sentence-transformers is not installed, the scorer will not be registered and referencing it by name will raise a KeyError.

Behavior:

  • Requires scenario.expected_output to be set. Returns 0.0 if missing.
  • Encodes both strings using a sentence-transformer model and computes cosine similarity.
  • The model is loaded lazily on first use.
  • Default model: all-MiniLM-L6-v2
  • Default threshold: 0.7
  • The threshold can be overridden per-scenario via scenario.metadata["similarity_threshold"].

Score: The cosine similarity value, clamped to [0.0, 1.0].

Details dict:

| Key | Type | Description |
| --- | --- | --- |
| cosine_similarity | float | The raw cosine similarity value. |
| threshold | float | The threshold used for the pass/fail decision. |

Constructor parameters (for programmatic use):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | "all-MiniLM-L6-v2" | Sentence-transformer model to use. |
| threshold | float | 0.7 | Minimum cosine similarity required to pass. |

Example YAML:

    scenarios:
      - name: semantic_check
        input: "Explain photosynthesis"
        expected_output: "Photosynthesis is the process by which plants convert sunlight into energy"
        metadata:
          similarity_threshold: 0.8
        scorers:
          - similarity
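The scoring arithmetic (cosine similarity, clamping to [0.0, 1.0], and the threshold comparison) can be sketched without loading a model. The example below substitutes small hand-written vectors for sentence embeddings; similarity_result is a hypothetical helper, and comparing the clamped score against the threshold with >= is an assumption based on the "minimum cosine similarity required to pass" wording above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_result(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.7
) -> dict:
    raw = cosine_similarity(a, b)
    score = max(0.0, min(1.0, raw))  # clamp to [0.0, 1.0]
    return {
        "score": score,
        "passed": score >= threshold,
        "cosine_similarity": raw,  # unclamped value, as in the details dict
        "threshold": threshold,
    }


# Toy 2-D "embeddings": 45 degrees apart gives a similarity of about 0.707
close = similarity_result([1.0, 0.0], [1.0, 1.0])
strict = similarity_result([1.0, 0.0], [1.0, 1.0], threshold=0.8)
opposite = similarity_result([1.0, 0.0], [-1.0, 0.0])
print(close["passed"], strict["passed"], opposite["score"])
```

The same pair of vectors passes at the default threshold and fails at 0.8, which is the effect of setting similarity_threshold in scenario metadata.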

Using Scorers Programmatically

Scoring a result manually

    from agenticassure import Scenario, AgentResult
    from agenticassure.scorers.base import get_scorer

    scenario = Scenario(
        name="test",
        input="What is 2+2?",
        expected_output="4",
    )
    agent_result = AgentResult(output="The answer is 4.")

    scorer = get_scorer("passfail")
    score_result = scorer.score(scenario, agent_result)

    print(f"Passed: {score_result.passed}")
    print(f"Score: {score_result.score}")
    print(f"Explanation: {score_result.explanation}")

Registering and using a custom scorer

    import json

    from agenticassure.results import AgentResult, ScoreResult
    from agenticassure.scenario import Scenario
    from agenticassure.scorers.base import register_scorer, list_scorers


    class JSONOutputScorer:
        """Passes if the agent output is valid JSON."""

        name: str = "json_valid"

        def score(self, scenario: Scenario, result: AgentResult) -> ScoreResult:
            try:
                json.loads(result.output)
                return ScoreResult(
                    scenario_id=scenario.id,
                    scorer_name=self.name,
                    score=1.0,
                    passed=True,
                    explanation="Output is valid JSON",
                )
            except json.JSONDecodeError as e:
                return ScoreResult(
                    scenario_id=scenario.id,
                    scorer_name=self.name,
                    score=0.0,
                    passed=False,
                    explanation=f"Invalid JSON: {e}",
                )


    # Register the custom scorer
    register_scorer(JSONOutputScorer())

    # Verify it is available
    print(list_scorers())
    # ['passfail', 'exact', 'regex', 'similarity', 'json_valid']
    # ('similarity' is listed only if sentence-transformers is installed)

Then reference it in your YAML:

    scenarios:
      - name: json_response
        input: "Return user data as JSON"
        scorers:
          - json_valid

Overriding a built-in scorer

    from agenticassure.scorers.base import register_scorer


    class StrictPassFailScorer:
        """A stricter version of passfail that also checks output length."""

        name: str = "passfail"  # Same name replaces the built-in

        def score(self, scenario, result):
            # Your custom logic
            ...


    register_scorer(StrictPassFailScorer())

Checking available scorers at runtime

    from agenticassure.scorers.base import list_scorers

    available = list_scorers()
    print(f"Registered scorers: {available}")

    # Useful for verifying optional scorers are available
    if "similarity" in available:
        print("Similarity scorer is available (sentence-transformers installed)")
    else:
        print("Similarity scorer not available -- install sentence-transformers")