A workflow is code, so it gets tests. nika test <file> runs the workflow under the mock provider (offline, deterministic, zero keys, zero tokens) and compares its typed outputs: against a golden file (<file>.golden.json) committed next to it. It is the same idea as snapshot testing: pin what the workflow produces, and any future edit that changes the contract fails loudly in CI.
nika test triage.nika.yaml

Why the mock makes this possible

nika test reroutes every infer: / agent: call to the mock provider, whatever model: the file declares, so a workflow written for ollama/llama3.2:3b (or mistral/…, or any cloud model) tests offline, unchanged. The mock is not a stub that returns "ok". It is schema-conformant: when a task declares a schema:, the mock synthesizes an instance that validates against it, with required fields present and types respected. Every workflow that declares a schema: therefore runs offline, end to end, through the same runtime, bindings, and typed-outputs validation as a live run.
triage.nika.yaml
nika: v1
workflow: ticket-triage
description: "Classify a support ticket into a typed verdict."

model: mock/echo                       # swap for ollama/llama3.2:3b or any provider

vars:
  ticket:
    type: string
    default: "Payment page returns a 502 after checkout."

tasks:
  - id: classify
    infer:
      prompt: "Classify this ticket: ${{ vars.ticket }}"
      max_tokens: 300
      schema:
        type: object
        required: [category, urgent]
        properties:
          category: { type: string }
          urgent: { type: boolean }

outputs:
  verdict: "${{ tasks.classify.output }}"
The golden pins the workflow’s outputs: block — a workflow without one pins {}. Declare typed outputs: and the golden guards the whole callable contract.
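To make "schema-conformant" concrete, here is a minimal Python sketch of how a mock could synthesize an instance from the classify schema above. It is an illustration, not nika's implementation: synthesize and PLACEHOLDERS are hypothetical names, only the "mock" and false values come from the golden shown later on this page, and the handling of optional fields, arrays, and numbers is assumed.

```python
from typing import Any

# Placeholder per JSON Schema primitive type. "mock" and False mirror the
# golden shown on this page; the remaining values are assumptions.
PLACEHOLDERS: dict[str, Any] = {
    "string": "mock",
    "boolean": False,
    "integer": 0,
    "number": 0.0,
    "null": None,
}


def synthesize(schema: dict[str, Any]) -> Any:
    """Build a deterministic instance for a small subset of JSON Schema."""
    kind = schema.get("type")
    if kind == "object":
        props = schema.get("properties", {})
        # Emit required fields only (an assumption of this sketch). A required
        # field with no declared schema falls back to a string placeholder.
        return {
            name: synthesize(props.get(name, {"type": "string"}))
            for name in schema.get("required", [])
        }
    if kind == "array":
        return []  # an empty array is valid unless the schema sets minItems
    if kind in PLACEHOLDERS:
        return PLACEHOLDERS[kind]
    raise ValueError(f"unsupported schema type: {kind!r}")


# The schema declared by the `classify` task in triage.nika.yaml.
schema = {
    "type": "object",
    "required": ["category", "urgent"],
    "properties": {
        "category": {"type": "string"},
        "urgent": {"type": "boolean"},
    },
}
print(synthesize(schema))  # {'category': 'mock', 'urgent': False}
```

Because the placeholders are fixed, the same schema always yields the same instance, which is the property that makes pinning the result in a golden file meaningful.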

The golden lifecycle

First run — no golden exists yet, so nika test teaches instead of guessing (exit 3):
$ nika test triage.nika.yaml
nika test · no golden yet for triage.nika.yaml

  the golden pins this workflow's typed `outputs:` under the mock
  provider (offline · deterministic) — future runs must match it.

  create it:   nika test triage.nika.yaml --update
  then commit: triage.nika.yaml.golden.json
Create it with --update, review it once, commit it:
$ nika test triage.nika.yaml --update
✔ golden written · triage.nika.yaml.golden.json
  review it once, commit it — `nika test` now guards this workflow
The golden is small, readable JSON — the mock-synthesized, schema-conformant outputs:
triage.nika.yaml.golden.json
{
  "verdict": {
    "category": "mock",
    "urgent": false
  }
}
From now on nika test compares. A match is exit 0; drift renders a per-path diff and exits 1:
$ nika test triage.nika.yaml
✖ outputs drifted from the golden · triage.nika.yaml.golden.json
  + outputs.verdict.queue · not in the golden ("mock")

  intended? re-pin with: nika test triage.nika.yaml --update
If the drift was intended (you changed the schema, added an output), re-pin with --update and commit the new golden — the diff shows up in code review, exactly like a snapshot test.
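The per-path comparison can be pictured with a short Python sketch. This is an assumed algorithm, not nika's source: diff_paths and same_json are hypothetical helpers, and the - and ~ markers for removed and changed paths are invented for illustration (only the + line appears in the output above).

```python
import json
from typing import Any


def same_json(a: Any, b: Any) -> bool:
    """Compare by JSON text so that false/0 and 1/1.0 stay distinct."""
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)


def diff_paths(golden: Any, actual: Any, path: str = "outputs") -> list[str]:
    """Return one line per path where `actual` differs from `golden`."""
    if isinstance(golden, dict) and isinstance(actual, dict):
        lines: list[str] = []
        for key in sorted(golden.keys() | actual.keys()):
            child = f"{path}.{key}"
            if key not in golden:
                lines.append(f"+ {child}")  # present now, not in the golden
            elif key not in actual:
                lines.append(f"- {child}")  # pinned in the golden, now missing
            else:
                lines.extend(diff_paths(golden[key], actual[key], child))
        return lines
    # Leaves (and lists, compared whole in this sketch) must be identical.
    return [] if same_json(golden, actual) else [f"~ {path}"]


golden = {"verdict": {"category": "mock", "urgent": False}}
actual = {"verdict": {"category": "mock", "urgent": False, "queue": "mock"}}

drift = diff_paths(golden, actual)
print(drift)               # ['+ outputs.verdict.queue']
print(1 if drift else 0)   # 1, mirroring the drift exit code
```

Walking both trees by key, rather than comparing whole files, is what lets the report name the exact field that changed.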

Exit codes · the CI contract

Exit  Meaning
0     Outputs match the golden (or --update pinned one)
1     The mock run failed, or outputs drifted from the golden
2     The file has validation findings (nika check would fail; fix those first)
3     No golden yet (create with --update), or environment error
nika test refuses to run a dirty file the same way nika run does: a workflow with check findings exits 2 before anything executes.
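A wrapper script can branch on these codes. The Python sketch below relies only on the nika test <file> invocation and the four exit codes listed above; EXIT_MEANINGS, describe, and run_test are hypothetical names introduced for illustration.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "outputs match the golden",
    1: "mock run failed, or outputs drifted from the golden",
    2: "validation findings; fix what nika check reports first",
    3: "no golden yet (create with --update), or environment error",
}


def describe(code: int) -> str:
    """Translate a `nika test` exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_test(workflow: str) -> int:
    """Run `nika test <workflow>` and return its exit code unchanged."""
    return subprocess.run(["nika", "test", workflow]).returncode


print(describe(3))  # no golden yet (create with --update), or environment error
```

Keeping the four codes distinct lets a pipeline treat a missing golden (3) as a setup problem while treating drift (1) as a review question.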

Wire it into CI

Offline and deterministic means no keys in CI, no flakes, no spend:
.github/workflows/nika-test.yml (fragment)
- name: Golden-test the workflows
  run: |
    for wf in workflows/*.nika.yaml; do
      nika test "$wf"
    done
Commit the *.golden.json files in the same directory as the workflows they guard. A PR that edits a workflow either keeps its golden green or shows the re-pinned golden in the diff — both are reviewable.
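If you would rather see every failing workflow in one CI run, a small driver can collect exit codes instead of stopping at the first non-zero one, which is what the shell loop above should do under the bash -e shell that GitHub Actions uses by default. A minimal Python sketch, assuming the workflows live in a workflows/ directory as in the fragment; nika_test, run_all, and main are hypothetical names:

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable


def nika_test(workflow: Path) -> int:
    """Run `nika test <file>` and return its exit code."""
    return subprocess.run(["nika", "test", str(workflow)]).returncode


def run_all(
    directory: Path, runner: Callable[[Path], int] = nika_test
) -> dict[str, int]:
    """Test every *.nika.yaml in `directory`; return the non-zero exit codes."""
    failures: dict[str, int] = {}
    for workflow in sorted(directory.glob("*.nika.yaml")):
        code = runner(workflow)
        if code != 0:
            failures[workflow.name] = code
    return failures


def main() -> int:
    """Report every failing workflow, then return 1 if any failed."""
    failed = run_all(Path("workflows"))
    for name, code in failed.items():
        print(f"{name}: exit {code}", file=sys.stderr)
    return 1 if failed else 0
```

A script would end with sys.exit(main()). The runner parameter exists only so the loop can be exercised without nika installed, for example by passing a function that returns a fixed code.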