# Ollama
Ollama runs open-source models locally. This is ideal for privacy-sensitive workflows or when you want to avoid API costs.
## Prerequisites

- Install Ollama from [ollama.com](https://ollama.com)
- Start the Ollama service:

```bash
ollama serve
```
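A quick sanity check that the service is reachable (the default port is 11434):

```bash
# Should respond with "Ollama is running"
curl http://localhost:11434/

# List the models available locally
ollama list
```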
## Setup

```bash
conductor provider add ollama
```

By default, Conductor connects to `http://localhost:11434`. To use a different address:

```bash
conductor provider add ollama --base-url http://your-server:11434
```
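If Ollama runs on a different machine, the server there must listen on a non-loopback interface. This is Ollama's own `OLLAMA_HOST` setting, not a Conductor option:

```bash
# On the remote host: listen on all interfaces, port 11434
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```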
## Recommended Models

Choose models based on your available resources. Conductor uses three model tiers: fast, balanced, and strategic.
| Tier | Model |
|---|---|
| fast | qwen3:4b |
| balanced | qwen3:8b |
| strategic | qwen3:8b |
```bash
ollama pull qwen3:4b
ollama pull qwen3:8b
```

At this level, balanced and strategic share a model. Good for simple tasks, but complex reasoning is limited.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:32b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:32b
```

The 32B models provide strong reasoning. They may run more slowly, but they are significantly more capable.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:70b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:70b
```

The 70B model delivers reasoning comparable to top commercial models.
| Tier | Model |
|---|---|
| fast | qwen3:8b |
| balanced | qwen3:32b |
| strategic | deepseek-r1:70b |
```bash
ollama pull qwen3:8b
ollama pull qwen3:32b
ollama pull deepseek-r1:70b
```

With sufficient VRAM, the 70B model runs at interactive speeds. Consider qwen3:235b if you have multiple GPUs.
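One way to check whether a model fits entirely in VRAM is to load it briefly and inspect how Ollama placed it:

```bash
# Load the model with a throwaway prompt, then show where it is resident
ollama run qwen3:32b "hello" > /dev/null
ollama ps   # the PROCESSOR column shows the GPU/CPU split
```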
## Configure Model Tiers
After pulling models, assign them to tiers:
```bash
conductor model discover ollama
```

Follow the interactive prompts to map your models to the fast, balanced, and strategic tiers.
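When deciding which tier a model belongs in, it can help to inspect its parameter count, quantization, and context length first (qwen3:32b below is just an example):

```bash
# Print details for a locally pulled model
ollama show qwen3:32b
```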
## Verify

```bash
conductor provider test ollama
```
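If the test fails, rule out Ollama itself by running one of the pulled models directly; if this works, the problem is likely on the Conductor side (for example the base URL):

```bash
# One-off generation straight through Ollama, bypassing Conductor
ollama run qwen3:8b "Reply with the word ok"
```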
## Set as Default

To make Ollama your default provider:

```bash
conductor provider add ollama --default
```

Or set it via an environment variable:

```bash
export LLM_DEFAULT_PROVIDER=ollama
```
## Performance Tips

- Keep models loaded: Set `OLLAMA_KEEP_ALIVE=3600` to keep models in memory for an hour (see the example after this list)
- GPU layers: Ollama automatically optimizes the GPU/CPU split based on available VRAM
- Quantization: Model tags containing `q4` use 4-bit quantization for lower memory usage with minimal quality loss
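For example, to keep models resident for an hour, set the variable before starting the server (Ollama also accepts duration strings such as 1h):

```bash
# Keep loaded models in memory for 3600 seconds (one hour)
export OLLAMA_KEEP_ALIVE=3600
ollama serve
```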
## Next Steps
- Learn about model tiers and when to use each
- Continue to the tutorial to build your first workflow