# Providers
KosmoKrator supports 20+ LLM providers out of the box, from major cloud APIs to local models and Chinese-market providers. You can also add custom OpenAI-compatible endpoints.
## Built-in Providers
Every built-in provider is ready to use after entering credentials. The table below lists all providers shipped with KosmoKrator, their authentication mode, and key notes.
| Provider ID | Label | Auth Mode | Notes |
|---|---|---|---|
| anthropic | Anthropic | API Key | Claude family — Opus 4.5, Sonnet 4.5, Haiku 4.5 |
| openai | OpenAI | API Key | GPT-4o, GPT-4.1 family, o-series reasoning models |
| codex | Codex (ChatGPT) | OAuth | Browser/device login flow, uses your ChatGPT subscription |
| gemini | Google Gemini | API Key | Gemini 2.5 Pro and Flash |
| deepseek | DeepSeek | API Key | DeepSeek V3 (chat), R1 (reasoning) |
| groq | Groq | API Key | Ultra-fast inference on dedicated hardware |
| mistral | Mistral | API Key | Mistral Large, Codestral |
| xai | xAI | API Key | Grok 3, with reasoning support |
| openrouter | OpenRouter | API Key | Meta-router for 100+ models from multiple providers |
| perplexity | Perplexity | API Key | Online search-augmented models |
| ollama | Ollama | None | Local models, no remote credentials required |
| kimi | Kimi (Moonshot) | API Key | Long-context Chinese/English models |
| kimi-coding | Kimi Coding | API Key | Code-optimized Moonshot endpoint |
| mimo | Xiaomi MiMo Token Plan | API Key | MiMo models via token-plan key (free tier available) |
| mimo-api | Xiaomi MiMo API | API Key | MiMo pay-as-you-go API |
| minimax | MiniMax | API Key | MiniMax models |
| minimax-cn | MiniMax CN | API Key | MiniMax China-region endpoint |
| z | Z.AI | API Key | Z.AI coding endpoint |
| z-api | Z.AI API | API Key | Z.AI standard API endpoint |
| stepfun | StepFun | API Key | Step models |
| stepfun-plan | StepFun Plan | API Key | Step Plan subscription endpoint with reasoning support |
## Authentication Setup

### First-run wizard

The easiest way to configure credentials is the interactive setup command, which walks you through provider selection and API key entry:

```shell
kosmokrator setup
```

### API key storage
API keys entered through the setup wizard or the /settings command are encrypted and stored in the local SQLite database at ~/.kosmokrator/data/kosmokrator.db. Keys are never written to plain-text config files.
### Environment variables

Alternatively, you can set provider API keys via environment variables. These are read from your Prism PHP configuration and take effect if no key is stored in the database. Common variables:

- `ANTHROPIC_API_KEY` — Anthropic
- `OPENAI_API_KEY` — OpenAI
- `DEEPSEEK_API_KEY` — DeepSeek
- `GROQ_API_KEY` — Groq
- `MISTRAL_API_KEY` — Mistral
- `XAI_API_KEY` — xAI
- `OPENROUTER_API_KEY` — OpenRouter
- `PERPLEXITY_API_KEY` — Perplexity
- `GEMINI_API_KEY` — Google Gemini
- `KIMI_API_KEY` — Kimi / Kimi Coding
- `MIMO_API_KEY` — MiMo (token plan)
- `MIMO_PAYG_API_KEY` — MiMo (pay-as-you-go API)
- `MINIMAX_API_KEY` — MiniMax
- `MINIMAX_CN_API_KEY` — MiniMax CN (China region)
- `STEPFUN_API_KEY` — StepFun / StepFun Plan
- `ZAI_API_KEY` — Z.AI / Z.AI API
If you store a key via `/settings` and also have an environment variable set for the same provider, the stored key is used.
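For example, to run against Anthropic using only environment configuration (the key value below is a placeholder):

```shell
# Placeholder value: replace with your real credential before launching.
export ANTHROPIC_API_KEY="sk-ant-..."
kosmokrator
```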
### OAuth flow (Codex / ChatGPT)

The codex provider uses a browser-based OAuth device login flow tied to your ChatGPT subscription. When you select Codex as your provider:

- KosmoKrator starts a local callback server on port 9876 (configurable in `config/kosmokrator.yaml`).
- Your browser opens to a ChatGPT authorization page.
- After granting access, the OAuth tokens are stored and refreshed automatically.
Token status is shown in the settings UI — including the associated email, expiration state, and whether a refresh is due.
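If port 9876 is already taken on your machine, change it in `config/kosmokrator.yaml` before starting the login flow. The key name below is hypothetical; check your config file for the actual schema:

```yaml
# Hypothetical key name: verify against your config/kosmokrator.yaml schema.
oauth:
  callback_port: 9876
```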
## Switching Providers

You can change the active provider and model at any time during a session:

- Open the settings panel with the `/settings` command.
- Navigate to the Agent category.
- Change `default_provider` to the desired provider ID.
- Change `default_model` to a model supported by that provider.
Both settings have an `applies_now` effect — the change takes effect on the very next LLM call, without restarting the session.
## Per-Depth Model Overrides
KosmoKrator supports running different models at different agent depths. This lets you use a powerful (and more expensive) model for the main agent while routing subagents to faster or cheaper models.
| Depth | Role | Settings | Fallback |
|---|---|---|---|
| 0 | Main agent | default_provider / default_model | — |
| 1 | Subagents | subagent_provider / subagent_model | Inherits from depth 0 |
| 2+ | Sub-subagents | subagent_depth2_provider / subagent_depth2_model | Inherits from depth 1, then depth 0 |
The resolution cascade works as follows: depth-2+ overrides fall back to depth-1 overrides, which fall back to the main agent defaults. Leave a setting empty to inherit from the parent depth.
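The cascade amounts to a small fallback lookup. The sketch below is illustrative Python, not KosmoKrator's actual implementation; the setting names mirror the table above:

```python
def resolve_model(depth: int, settings: dict) -> str:
    """Resolve which model an agent at `depth` uses, falling back
    toward the main-agent default when an override is empty."""
    cascade = {
        0: ["default_model"],
        1: ["subagent_model", "default_model"],
    }
    # Depth 2 and deeper: own override, then depth 1, then depth 0.
    keys = cascade.get(depth, ["subagent_depth2_model",
                               "subagent_model", "default_model"])
    for key in keys:
        if settings.get(key):  # empty or missing means "inherit"
            return settings[key]
    raise KeyError("default_model must always be set")

settings = {
    "default_model": "claude-opus-4-5-20250929",
    "subagent_model": "claude-haiku-4-5-20251001",
    "subagent_depth2_model": "",  # empty: inherit from depth 1
}
```

With these settings, depth 0 resolves to the Opus model, while depths 1 and deeper all resolve to the Haiku model because the depth-2 override is left empty.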
### Example: cost-optimized hierarchy

```yaml
# Main agent — most capable model
default_provider: anthropic
default_model: claude-opus-4-5-20250929

# Subagents — fast and affordable
subagent_provider: anthropic
subagent_model: claude-haiku-4-5-20251001

# Sub-subagents — inherit from subagent settings
# (leave subagent_depth2_provider and subagent_depth2_model empty)
```

All of these settings live under the Agent category in `/settings`. Each setting applies immediately when changed.
## Custom Providers
Any OpenAI-compatible API endpoint can be added as a custom provider. This is useful for self-hosted models, corporate proxies, or providers not yet included in the built-in catalog.
### Adding a custom provider

- Open `/settings` and navigate to Provider Setup.
- Add a new provider with a unique ID.
- Configure the required fields:
| Field | Description | Example |
|---|---|---|
| label | Human-readable name shown in the UI | My Corporate LLM |
| base_url | Full URL to the chat completions endpoint | https://llm.corp.example/v1 |
| api_key | API key for authentication | sk-corp-... |
| default_model | Model identifier to use by default | llama-3.1-70b |
Custom providers use the relay system for request/response normalization, so they work with tool calling, streaming, and all other agent features as long as the endpoint implements the OpenAI chat completions format.
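Before adding a custom provider, it can help to verify that the endpoint really implements the OpenAI chat completions format. The command below uses the example values from the table (URL, key, and model are placeholders):

```shell
# A compatible endpoint returns a JSON body containing a "choices" array.
curl -s https://llm.corp.example/v1/chat/completions \
  -H "Authorization: Bearer sk-corp-..." \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b", "messages": [{"role": "user", "content": "ping"}]}'
```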
## Reasoning Support
Some providers support extended thinking / reasoning modes, where the model performs chain-of-thought reasoning before producing its final answer. KosmoKrator controls this via the reasoning_effort setting (under the Agent category in /settings).
| Provider | Reasoning Behavior | Effort Levels |
|---|---|---|
| openai | Controllable via reasoning_effort for o-series models (o1, o3, o4-mini) | low / medium / high |
| xai | Controllable via reasoning_effort for Grok 3 Think models | low / medium / high |
| deepseek | Always-on reasoning for R1 models | Not configurable |
| stepfun, stepfun-plan | Always-on reasoning | Not configurable |
| kimi, kimi-coding | Always-on reasoning | Not configurable |
| groq | Always-on reasoning | Not configurable |
| mistral | Always-on reasoning | Not configurable |
| perplexity | Always-on reasoning | Not configurable |
| openrouter | Always-on reasoning | Not configurable |
| z, z-api | Always-on reasoning | Not configurable |
| minimax, minimax-cn | Always-on reasoning | Not configurable |
| mimo, mimo-api | Always-on reasoning | Not configurable |
| All others | No reasoning support | Setting is safely ignored |
You never pass a raw reasoning_effort parameter in requests yourself. It is handled internally by the driver when supported models are used.
The available effort levels are off, low, medium, and high. Setting the value to off disables reasoning parameters entirely, even for providers that support it.
As a rule of thumb, use low or medium for routine tasks and reserve high for complex multi-step problems.
## LLM Clients
Under the hood, KosmoKrator uses two client implementations to communicate with LLM providers. The correct client is selected automatically based on the provider.
### AsyncLlmClient
The primary client for most providers. Built on Amp HTTP, it sends raw HTTP requests to OpenAI-compatible chat completions endpoints with full async streaming support. Used for:
- OpenAI, DeepSeek, Groq, Mistral, xAI, OpenRouter, Perplexity
- Ollama, Kimi, Kimi Coding, MiMo, MiMo API, Z.AI, Z.AI API, StepFun, StepFun Plan
- All custom providers (OpenAI-compatible endpoints)
### PrismService
A synchronous client backed by the Prism PHP SDK. Used for providers that have native Prism drivers with specialized request/response handling:
- Anthropic (Claude) — uses Prism's native Anthropic driver with prompt caching
- Google Gemini — uses Prism's native Gemini driver
- MiniMax, MiniMax CN — uses Prism's Anthropic-compatible driver (Anthropic-format endpoints)
### RetryableLlmClient

A decorator that wraps either client, adding automatic retry logic with exponential backoff and jitter. Retries are triggered on:

- Rate limits (HTTP 429) — honors `Retry-After` headers from the provider
- Server errors (HTTP 5xx) — transient provider outages
- Network failures — connection timeouts, DNS resolution errors
The maximum number of retry attempts is configurable via the max_retries setting. A value of 0 means unlimited retries (the agent keeps trying until the provider responds successfully).
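The retry loop can be sketched as follows. This is illustrative Python rather than KosmoKrator's PHP implementation; the `send()` contract, the backoff constants, and the jitter range are assumptions:

```python
import random
import time

def with_retries(send, max_retries: int = 5, base_delay: float = 1.0):
    """Call send() until it succeeds, retrying with exponential backoff
    and jitter.  send() returns (status, headers, body); HTTP 429, a
    5xx status, or a raised ConnectionError triggers a retry.  A
    max_retries of 0 is treated as unlimited, mirroring the setting
    described above."""
    attempt = 0
    while True:
        attempt += 1
        try:
            status, headers, body = send()
            if status != 429 and status < 500:
                return body              # success (or non-retryable status)
        except ConnectionError:
            headers = {}                 # network failure: retry
        if max_retries and attempt >= max_retries:
            raise RuntimeError("retries exhausted")
        # Honor the provider's Retry-After header when present; otherwise
        # back off exponentially with jitter.
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * 2 ** (attempt - 1)
        time.sleep(delay * random.uniform(0.5, 1.0))
```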