Ollama

Ollama runs open-source models locally. This is ideal for privacy-sensitive workflows or when you want to avoid API costs.

Prerequisites

  1. Install Ollama from ollama.com
  2. Start the Ollama service:
    ollama serve
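
To confirm the service is reachable before configuring Conductor, you can query Ollama's HTTP API, which listens on port 11434 by default, for example with curl:

curl http://localhost:11434/api/version

A short JSON response with the installed version means the server is up.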

Setup

conductor provider add ollama

By default, Conductor connects to http://localhost:11434. To use a different address:

conductor provider add ollama --base-url http://your-server:11434
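
If you point Conductor at a remote machine (http://your-server:11434 above is a placeholder for your own address), make sure that host is reachable from where Conductor runs. A quick check is to list the models the remote Ollama instance has available:

curl http://your-server:11434/api/tags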

Choose models based on your available resources. Conductor uses three model tiers: fast, balanced, and strategic.

Tier        Model
fast        qwen3:4b
balanced    qwen3:8b
strategic   qwen3:8b

Pull both models:

ollama pull qwen3:4b
ollama pull qwen3:8b

With this setup, the balanced and strategic tiers share a model. That is fine for simple tasks, but complex reasoning will be limited.
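
Once the pulls finish, you can confirm the models are available locally; ollama list shows each downloaded model along with its size on disk:

ollama list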

Configure Model Tiers

After pulling models, assign them to tiers:

conductor model discover ollama

Follow the interactive prompts to map your models to fast, balanced, and strategic tiers.

Verify

conductor provider test ollama
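
If the test fails, first check that the Ollama service itself is responding. In recent Ollama versions, ollama ps asks the server which models are currently loaded and reports an error if the service is unreachable:

ollama ps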

Set as Default

To make Ollama your default provider:

conductor provider add ollama --default

Or set it via an environment variable:

export LLM_DEFAULT_PROVIDER=ollama
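
The export only lasts for the current shell session. To make it permanent, append it to your shell profile (the path below assumes bash; adjust for your shell):

echo 'export LLM_DEFAULT_PROVIDER=ollama' >> ~/.bashrc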

Performance Tips

  • Keep models loaded: Set OLLAMA_KEEP_ALIVE=3600 so models stay in memory for an hour after their last use (see the example after this list)
  • GPU layers: Ollama automatically decides how many model layers to offload to the GPU based on available VRAM
  • Quantization: Model tags that include q4 (for example, q4_K_M) use 4-bit quantization for lower memory usage with minimal quality loss
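
For example, if you start the server manually, export the variable before launching it; OLLAMA_KEEP_ALIVE is read by the server process, and a bare number is interpreted as seconds:

export OLLAMA_KEEP_ALIVE=3600
ollama serve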

Next Steps

  • Learn about model tiers and when to use each
  • Continue to the tutorial to build your first workflow