
JSON Report

The JSON report outputs the complete test results as a structured JSON file. It contains every field available in the result models, making it the most detailed output format and the best choice for programmatic consumption, archival, and integration with external tools.

Generating a JSON Report

Use the --output json flag with the run command:

```shell
agenticassure run scenarios/ --adapter my_agent.MyAgent --output json
```

After the run completes, AgenticAssure writes the report and prints the filename:

JSON report written to results_a1b2c3d4-e5f6-7890-abcd-ef1234567890.json

File Naming

JSON reports are named using the pattern:

results_{run_id}.json

The run_id is a UUID generated for each run. Files are written to the current working directory.
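Because the run_id is random, scripts cannot predict the filename in advance. One option, sketched below, is to pick the most recently written results_*.json by modification time (load_latest_report is an illustrative helper, not part of AgenticAssure):

```python
import json
from pathlib import Path

# Illustrative helper: load the most recently written results_*.json file.
def load_latest_report(directory: str = ".") -> dict:
    candidates = sorted(
        Path(directory).glob("results_*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    if not candidates:
        raise FileNotFoundError("no results_*.json files found")
    return json.loads(candidates[-1].read_text())
```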

JSON Structure

The JSON report is a serialization of the RunResult Pydantic model. Below is the full schema with descriptions of every field.

Top-Level Object

```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "timestamp": "2026-03-10T14:30:00+00:00",
  "suite_name": "search-agent-tests",
  "scenario_results": [ ... ],
  "aggregate_score": 0.85,
  "pass_rate": 0.75,
  "total_duration_ms": 1245.7,
  "model_info": null
}
```

| Field | Type | Description |
| --- | --- | --- |
| run_id | string | UUID identifying this run |
| timestamp | string (ISO 8601) | UTC timestamp of when the run started |
| suite_name | string | Name of the test suite |
| scenario_results | array | List of ScenarioRunResult objects (see below) |
| aggregate_score | float | Average score across all scenarios (0.0 to 1.0) |
| pass_rate | float | Fraction of scenarios that passed (0.0 to 1.0) |
| total_duration_ms | float | Total wall-clock duration in milliseconds |
| model_info | object or null | Optional metadata about the model used |

ScenarioRunResult

Each element of scenario_results has this structure:

```json
{
  "scenario": { ... },
  "agent_result": { ... },
  "scores": [ ... ],
  "passed": true,
  "duration_ms": 245.3,
  "error": null,
  "retry_count": 0
}
```

| Field | Type | Description |
| --- | --- | --- |
| scenario | object | The original scenario definition (see below) |
| agent_result | object | The agent's response (see below) |
| scores | array | List of ScoreResult objects from each scorer |
| passed | boolean | Whether the scenario passed overall |
| duration_ms | float | Execution time in milliseconds |
| error | string or null | Error message if an exception occurred, otherwise null |
| retry_count | integer | Number of retries attempted |
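The error and retry_count fields make it easy to separate genuine failures from infrastructure problems. As a minimal sketch, the helper below (errored_scenarios is illustrative, not part of AgenticAssure) lists scenarios that raised an exception, together with how many retries were attempted:

```python
# Illustrative helper: list scenarios that errored, with their retry counts.
# `report` is assumed to be a parsed JSON report (a dict), as shown above.
def errored_scenarios(report: dict) -> list[tuple[str, str, int]]:
    return [
        (sr["scenario"]["name"], sr["error"], sr["retry_count"])
        for sr in report["scenario_results"]
        if sr["error"] is not None
    ]
```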

Scenario

The scenario object as defined in the YAML file:

```json
{
  "name": "weather_query",
  "input": "What is the weather in San Francisco?",
  "expected_output": null,
  "expected_tools": ["get_weather"],
  "expected_tool_args": {
    "get_weather": { "location": "San Francisco" }
  },
  "scorers": ["passfail"],
  "tags": ["tools", "weather"],
  "context": null,
  "metadata": null
}
```

AgentResult

The result returned by your adapter’s run() method:

```json
{
  "output": "The weather in San Francisco is 65F and sunny.",
  "tool_calls": [
    {
      "name": "get_weather",
      "arguments": { "location": "San Francisco" },
      "result": { "temp": 65, "condition": "sunny" }
    }
  ],
  "reasoning_trace": null,
  "latency_ms": 230.5,
  "token_usage": { "prompt_tokens": 150, "completion_tokens": 45 },
  "raw_response": null
}
```

| Field | Type | Description |
| --- | --- | --- |
| output | string | The agent's text response |
| tool_calls | array | List of tool calls made by the agent |
| reasoning_trace | array or null | Optional list of reasoning steps |
| latency_ms | float | Agent-reported latency in milliseconds |
| token_usage | object or null | Token usage breakdown (prompt and completion) |
| raw_response | any or null | Optional raw response from the underlying API |

ToolCall

Each entry in tool_calls:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Name of the tool that was called |
| arguments | object | Arguments passed to the tool |
| result | any or null | The result returned by the tool |

ScoreResult

Each entry in scores:

```json
{
  "scenario_id": "weather_query",
  "scorer_name": "passfail",
  "score": 1.0,
  "passed": true,
  "explanation": "Output contains expected text; Tool 'get_weather' called correctly",
  "details": null
}
```

| Field | Type | Description |
| --- | --- | --- |
| scenario_id | string | Name of the scenario being scored |
| scorer_name | string | Name of the scorer that produced this result |
| score | float | Score from 0.0 to 1.0 |
| passed | boolean | Whether this scorer considers the scenario passed |
| explanation | string | Human-readable explanation of the scoring decision |
| details | object or null | Optional additional structured details |
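Because each scenario can carry results from several scorers, it is often useful to aggregate per scorer rather than per scenario. A minimal sketch (average_by_scorer is an illustrative helper, not part of AgenticAssure):

```python
# Illustrative helper: average the score of each scorer across all scenarios.
# `report` is assumed to be a parsed JSON report (a dict).
def average_by_scorer(report: dict) -> dict[str, float]:
    buckets: dict[str, list[float]] = {}
    for sr in report["scenario_results"]:
        for score in sr["scores"]:
            buckets.setdefault(score["scorer_name"], []).append(score["score"])
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}
```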

TokenUsage

| Field | Type | Description |
| --- | --- | --- |
| prompt_tokens | integer | Number of tokens in the prompt |
| completion_tokens | integer | Number of tokens in the completion |
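Since token_usage may be null when an adapter does not report it, code that sums usage across scenarios should handle the missing case. A minimal sketch (total_tokens is an illustrative helper, not part of AgenticAssure):

```python
# Illustrative helper: total tokens used across all scenarios,
# skipping scenarios whose adapter reported no token_usage.
def total_tokens(report: dict) -> int:
    total = 0
    for sr in report["scenario_results"]:
        usage = sr["agent_result"].get("token_usage")
        if usage is not None:
            total += usage["prompt_tokens"] + usage["completion_tokens"]
    return total
```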

Consuming the JSON Report Programmatically

Python

```python
import json
from pathlib import Path

report = json.loads(Path("results_a1b2c3d4.json").read_text())

print(f"Pass rate: {report['pass_rate']:.0%}")
print(f"Avg score: {report['aggregate_score']:.2f}")

for sr in report["scenario_results"]:
    status = "PASS" if sr["passed"] else "FAIL"
    print(f"  {sr['scenario']['name']}: {status}")
```

jq (command line)

```bash
# Get pass rate
jq '.pass_rate' results_a1b2c3d4.json

# List failed scenarios
jq '.scenario_results[] | select(.passed == false) | .scenario.name' results_a1b2c3d4.json

# Get total token usage across all scenarios
jq '[.scenario_results[].agent_result.token_usage
     // {prompt_tokens: 0, completion_tokens: 0}
     | .prompt_tokens + .completion_tokens] | add' results_a1b2c3d4.json

# Extract all tool calls
jq '[.scenario_results[].agent_result.tool_calls[].name] | unique' results_a1b2c3d4.json
```

JavaScript / Node.js

```javascript
const fs = require("fs");

const report = JSON.parse(fs.readFileSync("results_a1b2c3d4.json", "utf-8"));
const failed = report.scenario_results.filter((sr) => !sr.passed);
console.log(`${failed.length} scenario(s) failed`);
```

Integration with Other Tools

CI/CD Pipelines

Use the JSON report to make pass/fail decisions or extract metrics in CI:

```yaml
# GitHub Actions
- name: Run tests
  run: agenticassure run scenarios/ --adapter my_agent.MyAgent --output json

- name: Check results
  run: |
    PASS_RATE=$(jq '.pass_rate' results_*.json)
    echo "Pass rate: $PASS_RATE"
    if (( $(echo "$PASS_RATE < 0.8" | bc -l) )); then
      echo "Pass rate below threshold"
      exit 1
    fi
```

Trend Tracking

Store JSON reports over time to build a history of agent performance. Each report contains a run_id and timestamp, making it straightforward to track metrics like pass rate, average score, and duration across builds.

```python
import glob
import json

reports = []
for path in sorted(glob.glob("results_*.json")):
    with open(path) as f:
        reports.append(json.load(f))

for r in reports:
    print(f"{r['timestamp']}: {r['pass_rate']:.0%} pass rate, "
          f"{r['aggregate_score']:.2f} avg score")
```

Custom Dashboards

Ingest the JSON into tools like Grafana, Datadog, or a custom web dashboard. The structured format maps directly to time-series metrics:

  • pass_rate and aggregate_score as gauge metrics.
  • total_duration_ms for performance tracking.
  • Per-scenario duration_ms for identifying slow scenarios.
  • token_usage for cost monitoring.
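Most metrics backends accept points as (metric name, value, timestamp) tuples, so one way to feed a dashboard is to flatten the report into that shape first. A minimal sketch, with illustrative metric names (report_metrics and the "agent.*" names are assumptions, not part of AgenticAssure); shipping the points to Grafana, Datadog, etc. is then a job for that backend's client library:

```python
# Illustrative helper: flatten a parsed report into (name, value, timestamp)
# metric points, including a per-scenario duration series.
def report_metrics(report: dict) -> list[tuple[str, float, str]]:
    ts = report["timestamp"]
    points = [
        ("agent.pass_rate", report["pass_rate"], ts),
        ("agent.aggregate_score", report["aggregate_score"], ts),
        ("agent.total_duration_ms", report["total_duration_ms"], ts),
    ]
    for sr in report["scenario_results"]:
        name = sr["scenario"]["name"]
        points.append((f"agent.scenario_duration_ms.{name}", sr["duration_ms"], ts))
    return points
```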

Comparison Between Runs

Compare two JSON reports to detect regressions:

```python
import json

with open("results_baseline.json") as f:
    baseline = json.load(f)
with open("results_current.json") as f:
    current = json.load(f)

# Map scenario name -> passed flag for each run
baseline_passed = {sr["scenario"]["name"]: sr["passed"]
                   for sr in baseline["scenario_results"]}
current_passed = {sr["scenario"]["name"]: sr["passed"]
                  for sr in current["scenario_results"]}

for name, passed in current_passed.items():
    if name in baseline_passed and baseline_passed[name] and not passed:
        print(f"REGRESSION: {name} was passing, now failing")
```

Multi-Suite Runs

When running multiple suites, each suite generates its own JSON file:

```
JSON report written to results_a1b2c3d4.json
JSON report written to results_e5f67890.json
```
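To report a single pass rate across all suites, weight each suite by its scenario count rather than averaging the per-suite pass_rate values, which would overweight small suites. A minimal sketch (combined_pass_rate is an illustrative helper, not part of AgenticAssure):

```python
# Illustrative helper: overall pass rate across several parsed reports,
# weighted by the number of scenarios in each suite.
def combined_pass_rate(reports: list[dict]) -> float:
    passed = sum(
        sum(1 for sr in r["scenario_results"] if sr["passed"]) for r in reports
    )
    total = sum(len(r["scenario_results"]) for r in reports)
    return passed / total if total else 0.0
```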
