# Ollama
Ollama runs open-source models locally. This is ideal for privacy-sensitive workflows or when you want to avoid API costs.
## Prerequisites

- Install Ollama from [ollama.com](https://ollama.com)
- Start the Ollama service:

```bash
ollama serve
```
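A quick sanity check that the service is reachable (the default port is 11434):

```bash
# Should respond with "Ollama is running"
curl http://localhost:11434/

# List the models available locally
ollama list
```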
## Setup

```bash
conductor provider add ollama
```

By default, Conductor connects to `http://localhost:11434`. To use a different address:

```bash
conductor provider add ollama --base-url http://your-server:11434
```
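If Ollama runs on a different machine, the server there must listen on a non-loopback interface. This is Ollama's own `OLLAMA_HOST` setting, not a Conductor option:

```bash
# On the remote host: listen on all interfaces, port 11434
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```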
## Recommended Models

Choose models based on your available resources. Conductor uses three model tiers: fast, balanced, and strategic.
| Tier | Model |
|---|---|
| fast | qwen3:4b |
| balanced | qwen3:8b |
| strategic | qwen3:8b |
```bash
ollama pull qwen3:4b
ollama pull qwen3:8b
```

At this level, balanced and strategic share a model. Good for simple tasks, but complex reasoning is limited.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:32b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:32b
```

The 32B models provide strong reasoning. They may run more slowly, but they are significantly more capable.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:70b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:70b
```

The 70B model delivers reasoning comparable to top commercial models.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:70b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:70b
```

With sufficient VRAM, the 70B model runs at interactive speeds. Consider qwen3:235b if you have multiple GPUs.
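One way to check whether a model fits entirely in VRAM is to load it briefly and inspect how Ollama placed it:

```bash
# Load the model with a throwaway prompt, then show where it is resident
ollama run qwen3:32b "hello" > /dev/null
ollama ps   # the PROCESSOR column shows the GPU/CPU split
```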
## Configure Model Tiers
After pulling models, assign them to tiers:
```bash
conductor model discover ollama
```

Follow the interactive prompts to map your models to the fast, balanced, and strategic tiers.
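When deciding which tier a model belongs in, it can help to inspect its parameter count, quantization, and context length first (qwen3:32b below is just an example):

```bash
# Print details for a locally pulled model
ollama show qwen3:32b
```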
## Verify

```bash
conductor provider test ollama
```
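If the test fails, rule out Ollama itself by running one of the pulled models directly; if this works, the problem is likely on the Conductor side (for example the base URL):

```bash
# One-off generation straight through Ollama, bypassing Conductor
ollama run qwen3:8b "Reply with the word ok"
```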
## Set as Default

To make Ollama your default provider:

```bash
conductor provider add ollama --default
```

Or set it via an environment variable:

```bash
export LLM_DEFAULT_PROVIDER=ollama
```
## Performance Tips

- Keep models loaded: Set `OLLAMA_KEEP_ALIVE=3600` to keep models in memory for an hour (see the example after this list)
- GPU layers: Ollama automatically optimizes the GPU/CPU split based on available VRAM
- Quantization: Model tags containing `q4` use 4-bit quantization for lower memory usage with minimal quality loss
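For example, to keep models resident for an hour, set the variable before starting the server (Ollama also accepts duration strings such as 1h):

```bash
# Keep loaded models in memory for 3600 seconds (one hour)
export OLLAMA_KEEP_ALIVE=3600
ollama serve
```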
## Next Steps
- Learn about model tiers and when to use each
- Continue to the tutorial to build your first workflow