API Documentation

Omnisphere is an OpenAI-compatible multi-LLM router. Use it as a drop-in replacement for the OpenAI API.

Base URL: http://YOUR_SERVER:3005
Auth: Authorization: Bearer omni_your_key

Authentication

All endpoints except /api/status, /v1/models, and /api/pricing require an API key.

Authorization: Bearer omni_your_api_key_here
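
As an illustration of the rule above, here is a minimal Python sketch that builds the request headers and checks whether a path needs a key. The helper names (`auth_headers`, `needs_auth`) are hypothetical, and the `omni_` prefix check is an assumption based only on the example keys shown on this page.

```python
# Endpoints this page lists as not requiring an API key.
PUBLIC_ENDPOINTS = {"/api/status", "/v1/models", "/api/pricing"}


def auth_headers(api_key: str) -> dict[str, str]:
    """Build headers for an authenticated Omnisphere request."""
    # Assumption: real keys carry the omni_ prefix used in the examples.
    if not api_key.startswith("omni_"):
        raise ValueError("expected an API key starting with 'omni_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def needs_auth(path: str) -> bool:
    """Return True when the endpoint requires an Authorization header."""
    return path not in PUBLIC_ENDPOINTS
```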

API Key Tiers

| Tier | Rate Limit | Monthly Budget | Best For |
|---|---|---|---|
| Free | 60 req/min | $1.00 | Testing, hobby projects |
| Pro | 120 req/min | $25.00 | Production apps |
| Enterprise | 300 req/min | $50.00+ | High-volume, custom |
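
One conservative way to stay under these limits is to space requests evenly on the client side. The sketch below is only an illustration: the `Throttle` class is hypothetical, and how the server actually counts requests within a minute is not documented here.

```python
import time

# Requests per minute, copied from the tier table above.
RATE_LIMITS_PER_MIN = {"Free": 60, "Pro": 120, "Enterprise": 300}


def min_interval_seconds(tier: str) -> float:
    """Smallest even spacing between requests that stays within the limit."""
    return 60.0 / RATE_LIMITS_PER_MIN[tier]


class Throttle:
    """Blocks so that consecutive calls are at least one interval apart."""

    def __init__(self, tier, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(tier)
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_allowed:
            # Too early: sleep until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling `Throttle("Free").wait()` before each request would then keep a single client at or below one request per second.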

Endpoints

POST /v1/chat/completions

OpenAI-compatible chat endpoint. Drop-in replacement.

{
  "model": "claude-sonnet-4",     // or any model from /v1/models
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ],
  "max_tokens": 1024              // optional, default 1024
}

Response: standard OpenAI format, plus a _meta object with cost/latency info.

{
  "id": "chatcmpl-1234",
  "model": "claude-sonnet-4",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 10, "completion_tokens": 50},
  "_meta": {"source": "live", "provider": "anthropic", "cost_usd": 0.001, "tier": "balanced"}
}
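
A minimal Python sketch of calling this endpoint with only the standard library follows. The function names are hypothetical, `BASE_URL` is the placeholder from the top of this page, and the 30-second timeout is an arbitrary choice. The response handling assumes only the fields shown in the example above.

```python
import json
import urllib.request

BASE_URL = "http://YOUR_SERVER:3005"  # placeholder, replace with your server


def build_chat_request(api_key, prompt, model="claude-sonnet-4", max_tokens=1024):
    """Prepare a POST /v1/chat/completions request without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize(response: dict) -> dict:
    """Pull the reply text and the _meta routing details out of a response."""
    meta = response.get("_meta", {})
    return {
        "text": response["choices"][0]["message"]["content"],
        "provider": meta.get("provider"),
        "cost_usd": meta.get("cost_usd"),
    }


def chat(api_key, prompt, **kwargs):
    """Send the request and return the summarized reply."""
    request = build_chat_request(api_key, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return summarize(json.load(resp))
```

Because the endpoint is described as OpenAI-compatible, an existing OpenAI client pointed at the base URL should also work; the sketch above just avoids any third-party dependency.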

POST /api/query

Auto-routed single query. Omnisphere picks the optimal model based on complexity.

{
  "prompt": "What is the meaning of life?",
  "tier": "balanced",           // ultraCheap | cheap | balanced | premium
  "max_tokens": 512             // optional
}
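
A short Python sketch of building this request body follows. It validates the tier against the four values listed in the comment above. The helper name is hypothetical, and treating `tier` as required is an assumption, since only `max_tokens` is marked optional.

```python
import json

# Tier names as listed in the request example above.
VALID_TIERS = ("ultraCheap", "cheap", "balanced", "premium")


def build_query_payload(prompt: str, tier: str, max_tokens=None) -> str:
    """Return the JSON body for POST /api/query."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    payload = {"prompt": prompt, "tier": tier}
    if max_tokens is not None:
        # max_tokens is optional, so it is only sent when given.
        payload["max_tokens"] = max_tokens
    return json.dumps(payload)
```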

POST /api/consensus

Query multiple models simultaneously and get a weighted synthesis.

{
  "prompt": "Compare React vs Vue for enterprise apps",
  "tier": "premium",            // auto-selects 3 models from this tier
  "models": ["claude-sonnet-4", "gpt-4o-mini", "gemini-2.0-flash"]  // or specify
}
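
The example above shows both a tier and an explicit model list. The Python sketch below builds the body from either one and checks model IDs against those listed under Available Models. The helper name is hypothetical, and whether the server accepts both fields together is an assumption, not something this page states.

```python
import json

# Model IDs from the Available Models table on this page.
KNOWN_MODELS = {
    "claude-sonnet-4", "claude-haiku-3.5", "gpt-4o-mini", "gpt-4o",
    "gemini-2.0-flash", "gemini-2.5-pro", "minimax-m2.5", "deepseek-v3.2",
}


def build_consensus_payload(prompt: str, tier=None, models=None) -> str:
    """Return the JSON body for POST /api/consensus."""
    if tier is None and not models:
        raise ValueError("pass a tier, an explicit model list, or both")
    payload = {"prompt": prompt}
    if tier is not None:
        payload["tier"] = tier
    if models:
        unknown = sorted(set(models) - KNOWN_MODELS)
        if unknown:
            raise ValueError(f"unknown model IDs: {unknown}")
        payload["models"] = list(models)
    return json.dumps(payload)
```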

GET /v1/models

List all available models with pricing. No auth required.

GET /api/status

Server health check. No auth required.

GET /api/pricing

Detailed pricing for all models and tiers. No auth required.

GET /api/health

Provider health status from SmartRouter. Shows which models are healthy/failing.

GET /api/metrics

Aggregate metrics per model: calls, latency, cost, success rate.

GET /api/test-all

Tests every model and reports its live/demo status. Use it to verify which providers are working.

Available Models

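| Model ID | Provider | Input $/M tokens | Output $/M tokens | Quality | Speed |
|---|---|---|---|---|---|
| claude-sonnet-4 | Anthropic | $3.00 | $15.00 | ⭐9 | Fast |
| claude-haiku-3.5 | Anthropic | $0.80 | $4.00 | ⭐7 | Fast |
| gpt-4o-mini | OpenAI | $0.15 | $0.60 | ⭐7 | Fast |
| gpt-4o | OpenAI | $2.50 | $10.00 | ⭐9 | Medium |
| gemini-2.0-flash | Google | $0.075 | $0.30 | ⭐7 | Fast |
| gemini-2.5-pro | Google | $1.25 | $5.00 | ⭐9 | Slow |
| minimax-m2.5 | MiniMax | $0.11 | $0.11 | ⭐8 | Fast |
| deepseek-v3.2 | DeepSeek | $0.14 | $0.28 | ⭐8 | Fast |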
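
To show how these prices translate into a per-request cost, here is a small Python sketch. It assumes "$/M" means US dollars per million tokens and that cost is simply tokens times price; the server's own `cost_usd` figure may be computed or rounded differently. The function name is hypothetical.

```python
# (input, output) price in USD per million tokens, from the table above.
PRICES_PER_MILLION = {
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-2.5-pro": (1.25, 5.00),
    "minimax-m2.5": (0.11, 0.11),
    "deepseek-v3.2": (0.14, 0.28),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000
```

For example, 10 prompt tokens and 50 completion tokens on claude-sonnet-4 come to (10 × 3.00 + 50 × 15.00) / 1,000,000 = $0.00078 under these assumptions.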

Cost Tiers

| Tier | Avg Cost | Models Used | Best For |
|---|---|---|---|
| ultraCheap | $0.0001 | MiniMax, DeepSeek, Gemini Flash | Bulk, simple tasks |
| cheap | $0.0005 | Gemini Flash, GPT-4o-mini, MiniMax | Standard Q&A |
| balanced | $0.003 | Claude Sonnet, GPT-4o-mini, Gemini Flash | Coding, analysis |
| premium | $0.01 | Claude Sonnet, GPT-4o, Gemini Pro | Research, strategy |
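
As a rough planning aid, the Python sketch below estimates how many requests a monthly budget covers at each cost tier. It assumes "Avg Cost" is an average per request, which this page does not state explicitly, and it leaves out the Enterprise tier because its budget is open-ended. The function name is hypothetical.

```python
from decimal import Decimal

# Monthly budgets from the API Key Tiers table (fixed-budget tiers only).
MONTHLY_BUDGET_USD = {"Free": Decimal("1.00"), "Pro": Decimal("25.00")}

# Average cost from the Cost Tiers table; assumed here to be per request.
AVG_COST_USD = {
    "ultraCheap": Decimal("0.0001"),
    "cheap": Decimal("0.0005"),
    "balanced": Decimal("0.003"),
    "premium": Decimal("0.01"),
}


def requests_per_month(key_tier: str, cost_tier: str) -> int:
    """Whole number of average-cost requests that fit in the monthly budget."""
    return int(MONTHLY_BUDGET_USD[key_tier] // AVG_COST_USD[cost_tier])
```

Under these assumptions a Free key covers about 333 balanced requests per month (1.00 / 0.003), and a Pro key covers about 2,500 premium requests (25.00 / 0.01).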

Smart Routing

If you don't specify a model, Omnisphere analyzes the complexity of your prompt and routes it automatically.

The SmartRouter tracks provider health and automatically routes around failures. If Claude is down, queries go to GPT. If both are down, they go to Gemini. If all direct APIs fail, OpenRouter is used as a universal fallback.
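
The fallback order described above can be sketched in Python as a simple priority list. This is an illustration only, not the SmartRouter's actual implementation: the provider labels other than "anthropic" are hypothetical, and the sketch models only the chain named in the text, ignoring other direct providers and any complexity-based selection.

```python
# Priority order from the text: Claude, then GPT, then Gemini, then OpenRouter.
# Labels other than "anthropic" are hypothetical names for this sketch.
FALLBACK_ORDER = ["anthropic", "openai", "google", "openrouter"]


def pick_provider(healthy: set[str]) -> str:
    """Return the first provider in the fallback order that is healthy."""
    for provider in FALLBACK_ORDER:
        if provider in healthy:
            return provider
    raise RuntimeError("no healthy provider available")
```

With Claude and GPT both marked unhealthy, `pick_provider({"google", "openrouter"})` returns `"google"`.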

Error Handling

// 401 — Bad or missing API key
{"error": {"message": "API key required", "type": "auth_error"}}

// 429 — Rate limit exceeded
{"error": {"message": "Rate limit exceeded (60/min)", "type": "rate_limit"}}

// 402 — Monthly budget exhausted
{"error": {"message": "Monthly budget exhausted ($1.00/$1.00)", "type": "budget_exceeded"}}
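
A client might handle these errors as in the Python sketch below: retry 429 with exponential backoff and fail fast on 401 and 402, since retrying will not fix a bad key or an exhausted budget. The names `OmnisphereError` and `call_with_retry` are hypothetical, as are the retry count and delays. The caller supplies a `send` function that returns a status code and the parsed JSON body.

```python
import time


class OmnisphereError(Exception):
    """Carries the HTTP status and the error type from the response body."""

    def __init__(self, status: int, payload: dict):
        error = payload.get("error", {})
        super().__init__(error.get("message", "unknown error"))
        self.status = status
        self.type = error.get("type")


# Only rate limiting is worth retrying; 401 and 402 need user action.
RETRYABLE_STATUSES = {429}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying 429 with exponential backoff.

    send() must return a (status_code, parsed_json_body) tuple.
    """
    for attempt in range(max_attempts):
        status, payload = send()
        if status < 400:
            return payload
        error = OmnisphereError(status, payload)
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise error
        # Wait 1x, 2x, 4x ... the base delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```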