The single biggest lever on AI infrastructure cost isn't negotiating enterprise contracts or compressing prompts; it's routing. Most teams send 100% of requests to their best model, even when 70% of those requests could be handled just as well by a model that costs 20× less.
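The arithmetic behind that claim is easy to check. Assuming a hypothetical per-request price for the frontier model, routing 70% of traffic to a model 20× cheaper cuts the blended cost by about two-thirds:

```python
# Hypothetical per-request costs; the 20x ratio is the only number that matters.
frontier_cost = 0.020            # $ per request (assumed)
cheap_cost = frontier_cost / 20  # $0.001 per request

# Baseline: every request goes to the frontier model.
all_frontier = 1.00 * frontier_cost

# Routed: 70% of requests go to the cheap model.
routed = 0.70 * cheap_cost + 0.30 * frontier_cost  # $0.0067 per request

savings = 1 - routed / all_frontier  # 0.665, i.e. a 66.5% cost reduction
```

The savings depend only on the traffic split and the cost ratio, not on the absolute prices.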
## Task complexity tiers
| Tier | Task types | Recommended models | Relative cost |
|---|---|---|---|
| Tier 1 – Simple | Classification, intent detection, extraction, yes/no | Gemini Flash, Claude Haiku, GPT-4o Mini | 1× |
| Tier 2 – Moderate | Summarization, drafting, data analysis, SQL generation | Claude Sonnet 4, GPT-4o, Gemini Pro | 10–15× |
| Tier 3 – Complex | Multi-step reasoning, code architecture, long-form research, agent orchestration | Claude Opus 4, GPT-5, Gemini Ultra 2 | 50–100× |
## Routing strategies
- Rule-based routing: Explicit rules, e.g. "if task_type == classification, use Haiku." Simple, predictable, zero overhead. Best when your task types are well-defined and stable.
- Classifier-based routing: A lightweight model (or a simple ML classifier) predicts the complexity tier of each incoming request. Adds ~10ms overhead, handles edge cases better than rules.
- Cost-aware cascade: Try a cheap model first; if the output fails a quality check, retry with a more capable model. Optimizes cost in the common case at the expense of latency on failures.
- Semantic routing: Embed the request and compare to known task archetypes. Useful when tasks are diverse and hard to classify with simple rules.
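The first and third strategies compose naturally: rules pick a starting tier, and a cascade escalates when the cheap answer fails a quality check. A minimal sketch, with all model names, costs, and the quality check hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost: float  # relative cost units, per the tiers table

# Illustrative models only; substitute your own tiers.
CHEAP = Model("tier1-small", 1.0)
CAPABLE = Model("tier3-frontier", 50.0)

def call_model(model: Model, request: str) -> str:
    # Stand-in for a real API call.
    return f"{model.name} answer to: {request}"

def cascade(request: str, quality_check: Callable[[str], bool]) -> tuple[str, float]:
    """Try the cheap model first; escalate only if the output fails the check."""
    answer = call_model(CHEAP, request)
    spent = CHEAP.cost
    if not quality_check(answer):
        answer = call_model(CAPABLE, request)
        spent += CAPABLE.cost  # failed attempts still cost money
    return answer, spent

# Common case: the check passes and total cost stays at the cheap tier.
answer, spent = cascade("classify this support ticket", lambda a: len(a) > 0)
```

Note that the cascade's worst case costs more than going straight to the capable model, which is why it only pays off when the quality check passes most of the time.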
## MoltBot routing configuration
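MoltBot's actual configuration schema isn't reproduced here; the sketch below only illustrates the general shape of a rule-based routing config with a fallback tier. Every key and model name is hypothetical:

```yaml
# Hypothetical routing config -- illustrative only, not MoltBot's real schema.
routes:
  - match: { task_type: classification }   # Tier 1: simple tasks
    model: claude-haiku
  - match: { task_type: summarization }    # Tier 2: moderate tasks
    model: claude-sonnet-4
  - default: true                          # everything else escalates
    model: claude-opus-4
fallback:
  on_error: next_tier                      # retry a more capable model on failure
```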
## Measuring routing effectiveness
Track cost-per-task-type and quality scores per tier. A well-tuned routing strategy typically routes 50–70% of traffic to Tier 1 models, 20–35% to Tier 2, and only 5–15% to Tier 3 frontier models. If your Tier 3 usage is above 30%, your routing is under-optimized.
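You can translate a measured traffic mix into an expected relative cost using the multipliers from the tiers table (midpoints assumed for the ranged tiers). This makes the gap between a tuned and an under-optimized mix concrete:

```python
# Relative per-request cost by tier, from the tiers table (range midpoints assumed).
tier_cost = {1: 1.0, 2: 12.0, 3: 75.0}

def blended_cost(mix: dict[int, float]) -> float:
    """Expected relative cost per request for a traffic mix (fractions sum to 1)."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(frac * tier_cost[tier] for tier, frac in mix.items())

# A mix inside the well-tuned band vs. one with Tier 3 above 30%.
well_tuned = blended_cost({1: 0.60, 2: 0.30, 3: 0.10})    # 11.7
unoptimized = blended_cost({1: 0.30, 2: 0.35, 3: 0.35})   # 30.75
```

Under these assumptions the under-optimized mix costs more than 2.5× the well-tuned one per request, which is why the Tier 3 share is the first metric worth watching.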
## Built-in multi-model routing on MoltBot
Rule-based and classifier routing, cost tracking per model, automatic fallback. 14-day free trial.
Start Free Trial →