API Documentation

Omnisphere is an OpenAI-compatible multi-LLM router. Use it as a drop-in replacement for the OpenAI API.

Base URL: http://YOUR_SERVER:3005
Auth header: Authorization: Bearer omni_your_api_key_here

Authentication

All endpoints except /api/status, /v1/models, and /api/pricing require an API key.

Authorization: Bearer omni_your_api_key_here
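
As a minimal sketch of how a client might attach this header, the helper below builds a GET request with Python's standard library and adds the Bearer header only when a key is supplied, so it also works for the three unauthenticated endpoints. The function name build_request and the Accept header are illustrative assumptions, not part of Omnisphere.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://YOUR_SERVER:3005"  # placeholder taken from this document


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request; the Bearer header is added only when a key is given."""
    headers = {"Accept": "application/json"}  # assumption: the server returns JSON
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(BASE_URL + path, headers=headers, method="GET")
```

Calling build_request("/api/status") yields a request without credentials, while build_request("/api/metrics", "omni_your_api_key_here") carries the header shown above. Pass the result to urllib.request.urlopen to send it.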

API Key Tiers

Tier	Rate Limit	Monthly Budget	Best For
Free	60 req/min	$1.00	Testing, hobby projects
Pro	120 req/min	$25.00	Production apps
Enterprise	300 req/min	$50.00+	High-volume, custom
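
To stay under a tier's rate limit, a client can space its requests evenly. The sketch below derives that spacing from the table above. How the server actually counts requests inside each minute is not documented here, so even spacing is a conservative assumption, and min_interval_seconds is a hypothetical helper name.

```python
# Requests per minute for each API key tier, copied from the table above.
RATE_LIMITS_PER_MIN = {"Free": 60, "Pro": 120, "Enterprise": 300}


def min_interval_seconds(tier: str) -> float:
    """Smallest even gap between requests that keeps a client within its limit."""
    return 60.0 / RATE_LIMITS_PER_MIN[tier]


# Free allows one request per second; Pro allows one every half second.
free_gap = min_interval_seconds("Free")
pro_gap = min_interval_seconds("Pro")
```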

Endpoints

POST /v1/chat/completions

OpenAI-compatible chat endpoint, usable as a drop-in replacement for the OpenAI chat completions endpoint.

{
  "model": "claude-sonnet-4",     // or any model from /v1/models
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ],
  "max_tokens": 1024              // optional, default 1024
}

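Response: standard OpenAI chat completion format, plus a _meta object with cost/latency info. The example below shows its source, provider, cost_usd, and tier fields.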

{
  "id": "chatcmpl-1234",
  "model": "claude-sonnet-4",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 10, "completion_tokens": 50},
  "_meta": {"source": "live", "provider": "anthropic", "cost_usd": 0.001, "tier": "balanced"}
}
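
The request and response above can be exercised from Python with only the standard library. This is a sketch under assumptions: the function names are illustrative, the 30-second timeout is arbitrary, and the response is assumed to have exactly the shape of the example shown.

```python
import json
import urllib.request

BASE_URL = "http://YOUR_SERVER:3005"  # placeholder taken from this document


def build_chat_request(api_key, prompt, model="claude-sonnet-4", max_tokens=1024):
    """Build the POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize(response: dict) -> dict:
    """Pull the reply text and the Omnisphere-specific _meta fields out of a response."""
    meta = response.get("_meta", {})  # _meta may be absent on other OpenAI-style servers
    return {
        "text": response["choices"][0]["message"]["content"],
        "provider": meta.get("provider"),
        "cost_usd": meta.get("cost_usd"),
    }


def chat(api_key: str, prompt: str) -> dict:
    """Send one prompt and return the summarized reply (performs a network call)."""
    request = build_chat_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return summarize(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both halves easy to check without a running server.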

POST /api/query

Auto-routed single query. Omnisphere picks the optimal model based on complexity.

{
  "prompt": "What is the meaning of life?",
  "tier": "balanced",           // ultraCheap | cheap | balanced | premium
  "max_tokens": 512             // optional
}
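
Because the tier names are case-sensitive strings, it can help to validate them on the client before sending. The sketch below builds the /api/query body and rejects tiers not listed above. build_query_payload is a hypothetical helper, and leaving max_tokens out of the body when it is unset is an assumption based on the field being marked optional.

```python
from typing import Optional

# Tier names exactly as listed in the request example above.
VALID_TIERS = ("ultraCheap", "cheap", "balanced", "premium")


def build_query_payload(prompt: str, tier: str, max_tokens: Optional[int] = None) -> dict:
    """Return the JSON body for POST /api/query, validating the tier name first."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {VALID_TIERS}")
    payload = {"prompt": prompt, "tier": tier}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # optional field, sent only when set
    return payload
```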

POST /api/consensus

Query multiple models simultaneously and get a weighted synthesis.

{
  "prompt": "Compare React vs Vue for enterprise apps",
  "tier": "premium",            // auto-selects 3 models from this tier
  "models": ["claude-sonnet-4", "gpt-4o-mini", "gemini-2.0-flash"]  // optional: specify the models explicitly instead
}
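
A consensus request names either a tier to draw models from or an explicit model list. The sketch below builds the body from whichever the caller supplies. Requiring at least one of the two is an assumption, since the server's behavior when both are missing is not documented here, and build_consensus_payload is a hypothetical helper.

```python
from typing import Optional, Sequence


def build_consensus_payload(
    prompt: str,
    tier: Optional[str] = None,
    models: Optional[Sequence[str]] = None,
) -> dict:
    """Return the JSON body for POST /api/consensus."""
    if tier is None and not models:
        raise ValueError("provide a tier, an explicit model list, or both")
    payload = {"prompt": prompt}
    if tier is not None:
        payload["tier"] = tier
    if models:
        payload["models"] = list(models)  # copy so later caller edits do not leak in
    return payload
```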

GET /v1/models

List all available models with pricing. No auth required.

GET /api/status

Server health check. No auth required.

GET /api/pricing

Detailed pricing for all models and tiers. No auth required.

GET /api/health

Provider health status from SmartRouter. Shows which models are healthy/failing.

GET /api/metrics

Aggregate metrics per model: calls, latency, cost, success rate.

GET /api/test-all

Tests every model and reports whether each is in live or demo status. Use it to verify which providers are working.

Available Models

Model ID	Provider	Input $/M tokens	Output $/M tokens	Quality	Speed
`claude-sonnet-4`	Anthropic	$3.00	$15.00	⭐9	Fast
`claude-haiku-3.5`	Anthropic	$0.80	$4.00	⭐7	Fast
`gpt-4o-mini`	OpenAI	$0.15	$0.60	⭐7	Fast
`gpt-4o`	OpenAI	$2.50	$10.00	⭐9	Medium
`gemini-2.0-flash`	Google	$0.075	$0.30	⭐7	Fast
`gemini-2.5-pro`	Google	$1.25	$5.00	⭐9	Slow
`minimax-m2.5`	MiniMax	$0.11	$0.11	⭐8	Fast
`deepseek-v3.2`	DeepSeek	$0.14	$0.28	⭐8	Fast
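
Since the prices above are per million tokens, the list-price cost of a call can be estimated from the usage block of a response. The sketch below does that arithmetic. Whether the cost_usd value reported by Omnisphere follows exactly this formula is an assumption, and estimate_cost_usd is a hypothetical helper.

```python
# (input, output) prices in USD per million tokens, copied from the table above.
PRICES_PER_MILLION = {
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-2.5-pro": (1.25, 5.00),
    "minimax-m2.5": (0.11, 0.11),
    "deepseek-v3.2": (0.14, 0.28),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the list-price cost of one call from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# 10 prompt tokens and 50 completion tokens on claude-sonnet-4:
# (10 * 3.00 + 50 * 15.00) / 1,000,000 = 0.00078 USD
example_cost = estimate_cost_usd("claude-sonnet-4", 10, 50)
```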

Cost Tiers

Tier	Avg Cost	Models Used	Best For
`ultraCheap`	$0.0001	MiniMax, DeepSeek, Gemini Flash	Bulk, simple tasks
`cheap`	$0.0005	Gemini Flash, GPT-4o-mini, MiniMax	Standard Q&A
`balanced`	$0.003	Claude Sonnet, GPT-4o-mini, Gemini Flash	Coding, analysis
`premium`	$0.01	Claude Sonnet, GPT-4o, Gemini Pro	Research, strategy

Smart Routing

If you don't specify a model, Omnisphere analyzes your prompt complexity and auto-routes:

Short simple queries → ultraCheap tier
Analytical/comparison prompts → balanced tier
Code or long complex prompts → premium tier

The SmartRouter tracks provider health and automatically routes around failures. If Claude is down, queries go to GPT. If both are down, they go to Gemini. If all direct APIs fail, OpenRouter is used as a universal fallback.
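
The fallback order described above can be pictured as a walk down a priority list. The sketch below is only an illustration of that idea, not SmartRouter's actual implementation. Apart from "anthropic", which appears in the sample _meta object, the provider identifiers are assumptions, and pick_provider is a hypothetical name.

```python
from typing import Iterable, Optional

# Priority order from the paragraph above: Claude, then GPT, then Gemini,
# then OpenRouter as the last resort. Identifiers other than "anthropic"
# are assumed for illustration.
FALLBACK_ORDER = ("anthropic", "openai", "google", "openrouter")


def pick_provider(healthy: Iterable[str]) -> Optional[str]:
    """Return the highest-priority provider that is currently healthy, if any."""
    healthy_set = set(healthy)
    for provider in FALLBACK_ORDER:
        if provider in healthy_set:
            return provider
    return None  # nothing is reachable, not even the universal fallback
```

With Claude down, pick_provider({"openai", "google", "openrouter"}) returns "openai". With all three direct providers down, it returns "openrouter".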

Error Handling

// 401 — Bad or missing API key
{"error": {"message": "API key required", "type": "auth_error"}}

// 429 — Rate limit exceeded
{"error": {"message": "Rate limit exceeded (60/min)", "type": "rate_limit"}}

// 402 — Monthly budget exhausted
{"error": {"message": "Monthly budget exhausted ($1.00/$1.00)", "type": "budget_exceeded"}}
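
A client can turn these error bodies into typed exceptions and decide which ones are worth retrying. The sketch below assumes the error shape shown above. Treating only 429 as retryable is a design assumption: a missing key (401) or an exhausted budget (402) will not fix itself on retry. The names OmnisphereError and parse_error are hypothetical.

```python
import json

RETRYABLE_STATUS = {429}  # rate limit: retrying after a pause may succeed


class OmnisphereError(Exception):
    """An error response, carrying the HTTP status and the parsed error fields."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUS


def parse_error(status: int, body: str) -> OmnisphereError:
    """Build an OmnisphereError from a status code and raw response body."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # body was not JSON, e.g. a proxy error page
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return OmnisphereError(status, err.get("type", "unknown"), err.get("message", body))
```

When using urllib, the status and body are available on urllib.error.HTTPError as its code attribute and read() method, so parse_error(exc.code, exc.read().decode()) converts a failed call into one of these exceptions.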