
CI/CD Integration

Running AgenticAssure tests in your CI/CD pipeline ensures that changes to your agent, its prompts, its tools, or its underlying model do not introduce regressions. This guide covers how to set up AgenticAssure in continuous integration, with a focus on GitHub Actions.


Why Run Agent Tests in CI

AI agents are affected by changes that traditional test suites do not catch:

  • Prompt changes: A small edit to a system prompt can alter behavior across many scenarios.
  • Model upgrades: Switching from gpt-4 to gpt-4o (or any model version change) may shift outputs, tool-calling behavior, or response style.
  • Tool implementation changes: Modifying a tool’s interface or behavior affects how the agent integrates with it.
  • Dependency updates: Upgrading LangChain, OpenAI SDK, or other libraries can introduce subtle behavioral changes.

By running AgenticAssure in CI, you get an automated safety net that flags regressions before they reach production.


GitHub Actions Example

Below is a complete GitHub Actions workflow that runs AgenticAssure tests on every pull request and every push to main.

```yaml
# .github/workflows/agent-tests.yml
name: Agent Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  agent-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install agenticassure
          pip install -r requirements.txt

      - name: Validate scenarios
        run: agenticassure validate scenarios/

      - name: Run agent tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenticassure run scenarios/ \
            --adapter myproject.agent.MyAgent \
            --output cli

      - name: Generate HTML report
        if: always()
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenticassure run scenarios/ \
            --adapter myproject.agent.MyAgent \
            --output html \
            || true

      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: agent-test-report
          path: report_*.html
          retention-days: 30
```

What This Workflow Does

  1. Validates scenarios: Runs agenticassure validate to catch YAML syntax errors before executing any tests. This step is fast and does not require an LLM API key.
  2. Runs agent tests: Executes all scenarios using the specified adapter. The process exits with code 1 if any scenario fails, which causes the GitHub Actions job to fail.
  3. Generates an HTML report: Produces a detailed report as a CI artifact for review, even if some tests failed (note the || true to prevent the step from failing and the if: always() condition).
  4. Uploads the report: Stores the HTML report as a downloadable artifact.

Setting Up the Adapter in CI

Your adapter class must be importable in the CI environment. There are several approaches:

Option 1: Adapter in Your Package

If your adapter is part of your application code:

```
myproject/
    agent.py          # contains MyAgent class
    ...
scenarios/
    core_tests.yaml
requirements.txt
```

The CLI command references it by its dotted import path:

```shell
agenticassure run scenarios/ --adapter myproject.agent.MyAgent
```

Make sure your package is installed (or at minimum on PYTHONPATH):

```yaml
- name: Install project
  run: pip install -e .
```

Option 2: Config File

Create an agenticassure.yaml in your repository root:

```yaml
adapter: myproject.agent.MyAgent
```

Then you can omit the --adapter flag:

```shell
agenticassure run scenarios/
```

Option 3: Standalone Adapter File

For simpler setups, place a standalone adapter file in your repo:

```python
# tests/adapter.py
from agenticassure.results import AgentResult
from myproject import create_agent

class CITestAgent:
    def __init__(self):
        self.agent = create_agent()

    def run(self, input, context=None):
        response = self.agent.invoke(input)
        return AgentResult(output=response)
```
```shell
agenticassure run scenarios/ --adapter tests.adapter.CITestAgent
```

Managing Secrets

AI agent tests typically require API keys for the underlying LLM provider. Never commit API keys to your repository.

GitHub Actions Secrets

  1. Go to your repository on GitHub.
  2. Navigate to Settings > Secrets and variables > Actions.
  3. Click New repository secret.
  4. Add your key (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY).

Reference secrets in your workflow:

```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Multiple Environments

If your agent uses different API keys for different environments, use GitHub Environments:

```yaml
jobs:
  agent-tests:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: agenticassure run scenarios/
```

Using Exit Codes for Pass/Fail Gating

AgenticAssure exits with code 1 if any scenario fails and code 0 if all scenarios pass. This integrates naturally with CI systems that treat non-zero exit codes as failures.
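The same exit-code gating can be scripted outside CI, for example in a local pre-push hook. A minimal sketch (the `gate` wrapper and the invocation are illustrative, not part of AgenticAssure):

```python
import shutil
import subprocess
import sys

def gate(cmd: list[str]) -> int:
    """Run a command and return its exit code (0 = pass, nonzero = fail)."""
    return subprocess.run(cmd).returncode

if __name__ == "__main__" and shutil.which("agenticassure"):
    # Illustrative invocation; adjust the adapter path to your project.
    sys.exit(gate(["agenticassure", "run", "scenarios/",
                   "--adapter", "myproject.agent.MyAgent"]))
```

Because the wrapper just propagates the exit code, any tool that checks process status (CI, git hooks, make) can consume the result unchanged.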

To use this as a required check on pull requests:

  1. Run your agent tests in a CI job (as shown above).
  2. In GitHub, go to Settings > Branches > Branch protection rules.
  3. Enable Require status checks to pass before merging.
  4. Select the agent test job as a required check.

Now pull requests cannot be merged if any agent test fails.


Generating Reports as CI Artifacts

HTML Reports

```yaml
- name: Generate HTML report
  if: always()
  run: agenticassure run scenarios/ --adapter myproject.agent.MyAgent --output html || true

- name: Upload HTML report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: agent-test-report
    path: report_*.html
```

JSON Reports

JSON reports are useful for downstream processing, dashboards, or trend analysis:

```yaml
- name: Generate JSON report
  if: always()
  run: agenticassure run scenarios/ --adapter myproject.agent.MyAgent --output json || true

- name: Upload JSON report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: agent-test-results
    path: results_*.json
```
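A downstream script can then summarize the JSON report. This sketch assumes a hypothetical schema with a top-level `scenarios` list of `{"name": ..., "passed": ...}` entries; check the structure your AgenticAssure version actually emits before relying on it:

```python
import json
from pathlib import Path

def summarize(report: dict) -> str:
    """Return a one-line pass/fail summary for an assumed report schema."""
    scenarios = report["scenarios"]
    passed = sum(1 for s in scenarios if s["passed"])
    return f"{passed}/{len(scenarios)} scenarios passed"

report_path = Path("results_latest.json")  # hypothetical filename
if report_path.exists():
    print(summarize(json.loads(report_path.read_text())))
```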

Running Specific Tags in CI vs Locally

Use tags to run different subsets of tests in different contexts.

Fast CI Checks

Run only critical, fast scenarios on every PR:

```yaml
- name: Run smoke tests
  run: |
    agenticassure run scenarios/ \
      --adapter myproject.agent.MyAgent \
      --tag critical \
      --tag fast
```

Full Nightly Suite

Run the complete test suite on a schedule:

```yaml
# .github/workflows/nightly-agent-tests.yml
name: Nightly Agent Tests

on:
  schedule:
    - cron: "0 6 * * *"  # 6 AM UTC daily

jobs:
  full-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install agenticassure -r requirements.txt

      - name: Run all tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenticassure run scenarios/ \
            --adapter myproject.agent.MyAgent \
            --output html

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: nightly-report
          path: report_*.html
```

Local Development

Run only scenarios relevant to what you are working on:

```shell
# Run only scenarios tagged "orders"
agenticassure run scenarios/ --adapter myproject.agent.MyAgent -t orders

# Run safety tests before committing
agenticassure run scenarios/ --adapter myproject.agent.MyAgent -t safety
```

Performance Considerations

Timeouts

LLM API calls can be slow. Set appropriate timeouts at the suite level and per-scenario:

```yaml
suite:
  name: ci-tests
  config:
    default_timeout: 60

scenarios:
  - name: quick_lookup
    input: "What is my balance?"
    timeout_seconds: 15
    tags: [fast]

  - name: complex_research
    input: "Analyze this data and provide recommendations"
    timeout_seconds: 120
    tags: [slow]
```

Set a job-level timeout in your CI workflow to prevent runaway jobs:

```yaml
jobs:
  agent-tests:
    timeout-minutes: 15
```

Cost Management

Each scenario run makes at least one LLM API call. For a suite of 50 scenarios using GPT-4, a single CI run can cost several dollars. Strategies to manage costs:

  • Tag and filter: Run only critical scenarios on every PR. Run the full suite nightly or weekly.
  • Use cheaper models in CI: If feasible, test with a less expensive model for basic checks and reserve the production model for nightly runs.
  • Limit retries: Set retries: 0 in CI to avoid multiplying API calls on flaky tests.
  • Cache or mock when possible: For pure integration testing, consider a mock adapter that replays recorded responses.
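A replay adapter along the lines of the last bullet can be sketched as follows. `AgentResult` is defined inline here only to keep the example self-contained; in a real adapter you would import it from `agenticassure.results`, and the recording format (a plain input-to-output mapping) is an assumption:

```python
# Stand-in for agenticassure.results.AgentResult, so the sketch runs alone.
class AgentResult:
    def __init__(self, output):
        self.output = output

class ReplayAdapter:
    """Returns recorded responses keyed by input text — no API calls."""

    def __init__(self, recordings: dict[str, str]):
        self._recordings = recordings

    def run(self, input, context=None):
        # Fall back to an empty output for unrecorded inputs.
        return AgentResult(output=self._recordings.get(input, ""))
```

Pointing `--adapter` at a class like this lets you exercise scenario plumbing and assertions without spending tokens; it does not, of course, test the live model.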

Retries

Use retries cautiously in CI. LLM outputs are non-deterministic, so a scenario that fails once might pass on retry. However, retries also increase cost and run time:

```yaml
suite:
  config:
    retries: 1  # One retry on failure
```

Example: PR Check Workflow

Here is a minimal but complete workflow suitable for gating pull requests:

```yaml
# .github/workflows/pr-agent-check.yml
name: PR Agent Check

on:
  pull_request:
    branches: [main]

jobs:
  check:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install
        run: |
          pip install agenticassure
          pip install -e .

      - name: Validate scenario files
        run: agenticassure validate scenarios/

      - name: Run critical agent tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenticassure run scenarios/ \
            --adapter myproject.agent.MyAgent \
            --tag critical \
            --timeout 30

      - name: Upload report
        if: always()
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenticassure run scenarios/ \
            --adapter myproject.agent.MyAgent \
            --tag critical \
            --output html \
            || true

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: pr-agent-report
          path: report_*.html
```

This workflow validates YAML files first (cheap, no API calls), runs only critical-tagged scenarios to keep costs and time low, and uploads an HTML report for review regardless of pass/fail status.


Other CI Systems

While the examples above use GitHub Actions, AgenticAssure works with any CI system that can run shell commands. The core pattern is the same:

  1. Install Python and dependencies.
  2. Set LLM API keys as environment variables.
  3. Run agenticassure validate for fast validation.
  4. Run agenticassure run with your adapter.
  5. Check the exit code (0 = pass, 1 = fail).
  6. Collect report files as artifacts.

GitLab CI Example

```yaml
agent-tests:
  image: python:3.11
  variables:
    OPENAI_API_KEY: $OPENAI_API_KEY
  script:
    - pip install agenticassure -r requirements.txt
    - agenticassure validate scenarios/
    - agenticassure run scenarios/ --adapter myproject.agent.MyAgent --output html
  artifacts:
    paths:
      - report_*.html
    when: always
    expire_in: 30 days
```