Providers
koto is provider-agnostic. You bring your own LLM and API key. Switch providers by changing one line in your config — translations, contexts, and workflows stay the same.
Choosing a provider
| Provider | Quality | Speed | Cost | Local | Best for |
|---|---|---|---|---|---|
| OpenAI (gpt-4o-mini) | Excellent | Fast | ~$0.15/1M tokens | No | Best balance of quality and cost |
| OpenAI (gpt-4o) | Outstanding | Moderate | ~$2.50/1M tokens | No | Maximum quality for critical content |
| Anthropic (Claude Sonnet) | Outstanding | Moderate | ~$3.00/1M tokens | No | Nuanced, context-sensitive translations |
| Google (Gemini 2.0 Flash) | Very good | Very fast | ~$0.10/1M tokens | No | Large batches where speed matters |
| Ollama (llama3.2) | Good | Varies | Free | Yes | Privacy-sensitive projects, offline use |
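To put the cost column in concrete terms, here is a rough back-of-the-envelope estimate. The tokens-per-word ratio is an assumption (it varies by language and tokenizer), and real usage also includes prompt overhead and output tokens, which some providers bill at a higher rate:

```typescript
// Back-of-the-envelope translation cost from the table above.
// ASSUMPTION: ~1.3 tokens per English word; real counts vary by language
// and tokenizer, and output tokens are often billed at a higher rate.
function estimateCostUSD(words: number, pricePerMillionTokens: number): number {
  const tokens = words * 1.3;
  return (tokens / 1_000_000) * pricePerMillionTokens;
}

// 100,000 words through gpt-4o-mini at ~$0.15/1M tokens:
console.log(estimateCostUSD(100_000, 0.15).toFixed(4)); // ≈ 0.0195
```

Even large projects stay in the cents-to-dollars range on the cheaper tiers, which is why the mini/Flash models are the usual starting point.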
OpenAI
Setup
- Get an API key from platform.openai.com
- Set the environment variable:
```sh
export OPENAI_API_KEY=sk-...
```
Configuration
```js
provider: {
  name: 'openai',
  model: 'gpt-4o-mini', // or 'gpt-4o' for higher quality
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| gpt-4o-mini | Excellent | Fast | Low |
| gpt-4o | Outstanding | Moderate | Higher |
| gpt-4-turbo | Outstanding | Moderate | Higher |
Anthropic (Claude)
Setup
- Get an API key from console.anthropic.com
- Set the environment variable:
```sh
export ANTHROPIC_API_KEY=sk-ant-...
```
Configuration
```js
provider: {
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| claude-sonnet-4-20250514 | Outstanding | Moderate | Moderate |
| claude-3-5-haiku-20241022 | Very good | Fast | Low |
Claude models are particularly strong at understanding nuance and context, making them a good fit for context-profile-heavy configurations.
Google Gemini
Setup
- Get an API key from aistudio.google.com
- Set the environment variable:
```sh
export GOOGLE_API_KEY=...
```
Configuration
```js
provider: {
  name: 'google',
  model: 'gemini-2.0-flash',
}
```
Available models
| Model | Quality | Speed | Cost |
|---|---|---|---|
| gemini-2.0-flash | Very good | Very fast | Very low |
| gemini-2.0-pro | Excellent | Moderate | Moderate |
Gemini Flash is the fastest option, ideal for large batches where speed matters more than peak quality.
Ollama (local)
Run LLMs locally. No API key needed, no data leaves your network.
Setup
- Install Ollama from ollama.com
- Pull a model:
```sh
ollama pull llama3.2
```
- Start the server:
```sh
ollama serve
```
Configuration
```js
provider: {
  name: 'ollama',
  model: 'llama3.2',
  baseUrl: 'http://localhost:11434', // default
}
```
Recommended models
| Model | Parameters | Quality | RAM needed |
|---|---|---|---|
| llama3.2 | 3B | Good | 4 GB |
| llama3.1 | 8B | Very good | 8 GB |
| mistral | 7B | Good | 8 GB |
| mixtral | 47B (MoE) | Very good | 32 GB |
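As a rule of thumb drawn from the table, a small helper (purely illustrative, not a koto API) could pick the most capable recommended model that fits in the machine's available RAM:

```typescript
// Illustrative helper (not part of koto): choose the most capable model
// from the table above that fits in the available RAM. Figures are
// approximate; actual memory use depends on quantization and context size.
const recommended = [
  { name: "llama3.2", ramGB: 4 },
  { name: "mistral", ramGB: 8 },
  { name: "llama3.1", ramGB: 8 },
  { name: "mixtral", ramGB: 32 },
]; // ordered from least to most capable

function pickModel(availableRamGB: number): string | undefined {
  const fits = recommended.filter((m) => m.ramGB <= availableRamGB);
  return fits.at(-1)?.name; // most capable entry that still fits
}

console.log(pickModel(16)); // llama3.1
```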
Custom provider (OpenAI-compatible)
Any service that implements the OpenAI chat completions API works out of the box:
```js
provider: {
  name: 'openai',
  model: 'your-model-name',
  baseUrl: 'https://your-provider.com/v1',
}
```
This works with Azure OpenAI, Together AI, Anyscale, self-hosted vLLM, and more. Set your API key via OPENAI_API_KEY.
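Concretely, "OpenAI-compatible" means the service accepts a POST to `/chat/completions` with the standard request body. A minimal sketch of that request shape (the base URL and model name are placeholders):

```typescript
// Sketch of the standard OpenAI chat completions request that an
// OpenAI-compatible provider must accept. baseUrl and model are placeholders.
const baseUrl = "https://your-provider.com/v1";
const url = `${baseUrl}/chat/completions`;

const body = JSON.stringify({
  model: "your-model-name",
  messages: [
    { role: "system", content: "Translate the user's message into French." },
    { role: "user", content: "Hello, world" },
  ],
});

// A real call would send it with the API key as a bearer token:
// await fetch(url, {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//   },
//   body,
// });

console.log(url);
```

If a service can answer this request, pointing `baseUrl` at it is all the configuration koto needs.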
Switching providers
Changing providers is a one-line config change. Everything else stays the same:
```js
provider: {
  // Before
  // name: 'openai',
  // model: 'gpt-4o-mini',

  // After
  name: 'anthropic',
  model: 'claude-sonnet-4-20250514',
}
```