Providers
koto is provider-agnostic. You bring your own LLM and API key. Switch providers by changing one line in your config — translations, contexts, and workflows stay the same.
Choosing a provider
| Provider | Quality | Speed | Cost | Local | Best for |
|---|---|---|---|---|---|
| OpenAI (gpt-4o-mini) | Excellent | Fast | ~$0.15/1M tokens | No | Best balance of quality and cost |
| OpenAI (gpt-4o) | Outstanding | Moderate | ~$2.50/1M tokens | No | Maximum quality for critical content |
| Anthropic (Claude Sonnet) | Outstanding | Moderate | ~$3.00/1M tokens | No | Nuanced, context-sensitive translations |
| Google (Gemini 2.0 Flash) | Very good | Very fast | ~$0.10/1M tokens | No | Large batches where speed matters |
| Ollama (llama3.2) | Good | Varies | Free | Yes | Privacy-sensitive projects, offline use |
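To put the cost column in concrete terms, here is a rough back-of-the-envelope estimate. The tokens-per-word ratio is an assumption (it varies by language and tokenizer), and real usage also includes prompt overhead and output tokens, which some providers bill at a higher rate:

```typescript
// Back-of-the-envelope translation cost from the table above.
// ASSUMPTION: ~1.3 tokens per English word; real counts vary by language
// and tokenizer, and output tokens are often billed at a higher rate.
function estimateCostUSD(words: number, pricePerMillionTokens: number): number {
  const tokens = words * 1.3;
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

// 100,000 words through gpt-4o-mini at ~$0.15/1M tokens:
console.log(estimateCostUSD(100_000, 0.15).toFixed(4)); // ≈ 0.0195
```

Even large projects stay in the cents-to-dollars range on the cheaper tiers, which is why the mini/Flash models are the usual starting point.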
OpenAI
Setup
- Get an API key from platform.openai.com
- Set the environment variable:
```sh
export OPENAI_API_KEY=sk-...
```
Configuration
```js
provider: {
  name: 'openai',
  model: 'gpt-4o-mini', // or 'gpt-4o' for higher quality
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| gpt-4o-mini | Excellent | Fast | Low |
| gpt-4o | Outstanding | Moderate | Higher |
| gpt-4-turbo | Outstanding | Moderate | Higher |
Anthropic (Claude)
Setup
- Get an API key from console.anthropic.com
- Set the environment variable:
```sh
export ANTHROPIC_API_KEY=sk-ant-...
```
Configuration
```js
provider: {
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| claude-sonnet-4-20250514 | Outstanding | Moderate | Moderate |
| claude-3-5-haiku-20241022 | Very good | Fast | Low |
Claude models are particularly strong at understanding nuance and context, making them a good fit for context-profile-heavy configurations.
Google Gemini
Setup
- Get an API key from aistudio.google.com
- Set the environment variable:
```sh
export GOOGLE_API_KEY=...
```
Configuration
```js
provider: {
  name: 'google',
  model: 'gemini-2.0-flash',
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| gemini-2.0-flash | Very good | Very fast | Very low |
| gemini-2.0-pro | Excellent | Moderate | Moderate |
Gemini Flash is the fastest option, ideal for large batches where speed matters more than peak quality.
Ollama (local)
Run LLMs locally. No API key needed, no data leaves your network.
Setup
- Install Ollama from ollama.com
- Pull a model:
```sh
ollama pull llama3.2
```
- Start the server:
```sh
ollama serve
```
Configuration
```js
provider: {
  name: 'ollama',
  model: 'llama3.2',
  baseUrl: 'http://localhost:11434', // default
}
```
Recommended models
| Model | Parameters | Quality | RAM needed |
|---|---|---|---|
| llama3.2 | 3B | Good | 4 GB |
| llama3.1 | 8B | Very good | 8 GB |
| mistral | 7B | Good | 8 GB |
| mixtral | 47B (MoE) | Very good | 32 GB |
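As a rule of thumb drawn from the table, a small helper (purely illustrative, not a koto API) could pick the most capable recommended model that fits in the machine's available RAM:

```typescript
// Illustrative helper (not part of koto): choose the most capable model
// from the table above that fits in the available RAM. Figures are
// approximate; actual memory use depends on quantization and context size.
const recommended = [
  { name: "llama3.2", ramGB: 4 },
  { name: "mistral", ramGB: 8 },
  { name: "llama3.1", ramGB: 8 },
  { name: "mixtral", ramGB: 32 },
]; // ordered from least to most capable

function pickModel(availableRamGB: number): string | undefined {
  const fits = recommended.filter((m) => m.ramGB <= availableRamGB);
  return fits.at(-1)?.name; // most capable entry that still fits
}

console.log(pickModel(16)); // llama3.1
```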
Custom provider (OpenAI-compatible)
Any service that implements the OpenAI chat completions API works out of the box:
```js
provider: {
  name: 'openai',
  model: 'your-model-name',
  baseUrl: 'https://your-provider.com/v1',
}
```
This works with Azure OpenAI, Together AI, Anyscale, self-hosted vLLM, and more. Set your API key via OPENAI_API_KEY.
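Concretely, "OpenAI-compatible" means the service accepts a POST to `/chat/completions` with the standard request body. A minimal sketch of that request shape (the base URL and model name are placeholders):

```typescript
// Sketch of the standard OpenAI chat completions request that an
// OpenAI-compatible provider must accept. baseUrl and model are placeholders.
const baseUrl = "https://your-provider.com/v1";
const url = `${baseUrl}/chat/completions`;

const body = JSON.stringify({
  model: "your-model-name",
  messages: [
    { role: "system", content: "Translate the user's message into French." },
    { role: "user", content: "Hello, world" },
  ],
});

// A real call would send it with the API key as a bearer token:
// await fetch(url, {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//   },
//   body,
// });

console.log(url);
```

If a service can answer this request, pointing `baseUrl` at it is all the configuration koto needs.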
Switching providers
Changing providers is a one-line config change. Everything else stays the same:
```js
provider: {
  // Before
  // name: 'openai',
  // model: 'gpt-4o-mini',

  // After
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}
```