Skip to main content
Nika is local-first: point model: at a local server and your data never leaves the machine. This page is the ladder, from the one-command path to production servers.

1. The one-command path: Ollama

Install Ollama, then:
ollama pull llama3.2:3b
That’s it. In any workflow:
model: ollama/llama3.2:3b
nika doctor confirms the wiring (it detects the running Ollama server and prints the exact fix if something is off).
llama3.2:3b is the showcase default: small enough for most laptops, good enough for structured extraction. Swap the tag for anything in the Ollama library: qwen2.5, mistral, phi4, larger llama variants.

2. Any Hugging Face GGUF, still one command

Ollama pulls directly from the Hugging Face Hub. Any public GGUF repo works, no account needed:
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF
Pick a quantization explicitly with a tag:
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M
Then reference it in the workflow exactly as pulled:
model: ollama/hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M
Rule of thumb for quantizations: Q4_K_M is the sane default (quality per GB), Q8_0 when you have RAM to spare, Q2/Q3 only when memory is tight. The Hub’s GGUF filter lists every compatible repo.

3. LM Studio: the visual browser

Prefer a GUI? LM Studio browses Hugging Face, downloads models with a click, and serves an OpenAI-compatible endpoint. Start its server, then:
model: lmstudio/<the-model-id-shown-in-lm-studio>

4. Servers: llama.cpp and vLLM

For shared machines and production:
  • llama.cpp llama-server serves any GGUF: model: llamacpp/<model>
  • vLLM serves full-precision Hub models at datacenter throughput: model: vllm/<hub-repo-id>
Both speak the same OpenAI-compatible dialect; the prefix picks the provider, nothing else in the file changes.

Checking what’s wired

nika doctor
Doctor reports every provider (local servers detected, cloud keys present) with a fix-form per missing piece. The audit before a run prices local models at $0.000:
nika check my-workflow.nika.yaml

Swapping between local and cloud

The file does not change shape. One line moves the workflow between a laptop and an API:
model: ollama/llama3.2:3b        # local · free · private
# model: mistral/mistral-small   # EU cloud
# model: anthropic/claude-haiku-4-5
The permits boundary applies either way: a permits: block with no net.http entry means the workflow cannot reach the network even if the model could. Local model + closed permits = fully air-gapped AI work.

Providers

The full catalog and how one InferRequest speaks every dialect.

First workflow

Five minutes from install to a checked, runnable file.