
Providers

koto is provider-agnostic. You bring your own LLM and API key. Switch providers by changing one line in your config — translations, contexts, and workflows stay the same.

Choosing a provider

| Provider | Quality | Speed | Cost | Local | Best for |
| --- | --- | --- | --- | --- | --- |
| OpenAI (gpt-4o-mini) | Excellent | Fast | ~$0.15/1M tokens | No | Best balance of quality and cost |
| OpenAI (gpt-4o) | Outstanding | Moderate | ~$2.50/1M tokens | No | Maximum quality for critical content |
| Anthropic (Claude Sonnet) | Outstanding | Moderate | ~$3.00/1M tokens | No | Nuanced, context-sensitive translations |
| Google (Gemini 2.0 Flash) | Very good | Very fast | ~$0.10/1M tokens | No | Large batches where speed matters |
| Ollama (llama3.2) | Good | Varies | Free | Yes | Privacy-sensitive projects, offline use |

OpenAI

Setup

  1. Get an API key from platform.openai.com
  2. Set the environment variable:
export OPENAI_API_KEY=sk-...
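A quick sanity check before running koto — confirm the key is actually exported in the current shell (the value below is a placeholder, not a real key):

```shell
# Placeholder key for illustration; substitute your real key from platform.openai.com
export OPENAI_API_KEY=sk-placeholder

# koto reads the key from the environment, so verify it is visible here
if [ -n "${OPENAI_API_KEY}" ]; then
  echo "OPENAI_API_KEY is set"
else
  echo "OPENAI_API_KEY is missing" >&2
fi
```

Note that keys exported in one terminal session are not visible in others; add the export to your shell profile if you want it to persist.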

Configuration

provider: {
  name: 'openai',
  model: 'gpt-4o-mini', // or 'gpt-4o' for higher quality
}

Available models

| Model | Quality | Speed | Cost |
| --- | --- | --- | --- |
| gpt-4o-mini | Excellent | Fast | Low |
| gpt-4o | Outstanding | Moderate | Higher |
| gpt-4-turbo | Outstanding | Moderate | Higher |

Anthropic (Claude)

Setup

  1. Get an API key from console.anthropic.com
  2. Set the environment variable:
export ANTHROPIC_API_KEY=sk-ant-...

Configuration

provider: {
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}

Available models

| Model | Quality | Speed | Cost |
| --- | --- | --- | --- |
| claude-sonnet-4-20250514 | Outstanding | Moderate | Moderate |
| claude-3-5-haiku-20241022 | Very good | Fast | Low |

Claude models are particularly strong at understanding nuance and context, making them a good fit for context-profile-heavy configurations.


Google Gemini

Setup

  1. Get an API key from aistudio.google.com
  2. Set the environment variable:
export GOOGLE_API_KEY=...

Configuration

provider: {
  name: 'google',
  model: 'gemini-2.0-flash',
}

Available models

| Model | Quality | Speed | Cost |
| --- | --- | --- | --- |
| gemini-2.0-flash | Very good | Very fast | Very low |
| gemini-2.0-pro | Excellent | Moderate | Moderate |

Gemini Flash is the fastest option, ideal for large batches where speed matters more than peak quality.


Ollama (local)

Run LLMs locally. No API key needed, no data leaves your network.

Setup

  1. Install Ollama from ollama.com
  2. Pull a model:
ollama pull llama3.2
  3. Start the server:
ollama serve
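Once the server is running, a quick probe confirms koto will be able to reach it. `/api/tags` is Ollama's endpoint for listing pulled models, which makes it a cheap health check (this assumes the default port, 11434):

```shell
# Probe the local Ollama server on its default port.
# -s silences progress output, -f makes curl exit non-zero on HTTP errors.
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "ollama: reachable"
else
  echo "ollama: not reachable on :11434"
fi
```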

Configuration

provider: {
  name: 'ollama',
  model: 'llama3.2',
  baseUrl: 'http://localhost:11434', // default
}

Available models

| Model | Parameters | Quality | RAM needed |
| --- | --- | --- | --- |
| llama3.2 | 3B | Good | 4 GB |
| llama3.1 | 8B | Very good | 8 GB |
| mistral | 7B | Good | 8 GB |
| mixtral | 47B (MoE) | Very good | 32 GB |

Custom provider (OpenAI-compatible)

Any service that implements the OpenAI chat completions API works out of the box:

provider: {
  name: 'openai',
  model: 'your-model-name',
  baseUrl: 'https://your-provider.com/v1',
}

This works with Azure OpenAI, Together AI, Anyscale, self-hosted vLLM, and more. Set your API key via the OPENAI_API_KEY environment variable.
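For example, a self-hosted vLLM server exposes this API under `/v1` on port 8000 by default, so a provider block for it might look like this (the model name is illustrative — use whichever model your vLLM instance is actually serving):

```js
provider: {
  name: 'openai',
  model: 'meta-llama/Llama-3.1-8B-Instruct', // illustrative; match your served model
  baseUrl: 'http://localhost:8000/v1',
}
```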


Switching providers

Changing providers is a one-line config change. Everything else stays the same:

provider: {
  // Before
  // name: 'openai',
  // model: 'gpt-4o-mini',

  // After
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}