The single biggest lever on AI infrastructure cost isn't negotiating enterprise contracts or compressing prompts; it's routing. Most teams send 100% of requests to their best model, even when 70% of those requests could be handled just as well by a model that costs 20× less.
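The arithmetic behind that claim is easy to check. Assuming a hypothetical per-request price for the frontier model, routing 70% of traffic to a model 20× cheaper cuts the blended cost by about two-thirds:

```python
# Hypothetical per-request costs; the 20x ratio is the only number that matters.
frontier_cost = 0.020            # $ per request (assumed)
cheap_cost = frontier_cost / 20  # $0.001 per request

# Baseline: every request goes to the frontier model.
all_frontier = 1.00 * frontier_cost

# Routed: 70% of requests go to the cheap model.
routed = 0.70 * cheap_cost + 0.30 * frontier_cost  # $0.0067 per request

savings = 1 - routed / all_frontier  # 0.665, i.e. a 66.5% cost reduction
```

The savings depend only on the traffic split and the cost ratio, not on the absolute prices.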
## Task complexity tiers
| Tier | Task types | Recommended models | Relative cost |
|---|---|---|---|
| Tier 1 – Simple | Classification, intent detection, extraction, yes/no | Gemini Flash, Claude Haiku, GPT-4o Mini | 1× |
| Tier 2 – Moderate | Summarization, drafting, data analysis, SQL generation | Claude Sonnet 4, GPT-4o, Gemini Pro | 10–15× |
| Tier 3 – Complex | Multi-step reasoning, code architecture, long-form research, agent orchestration | Claude Opus 4, GPT-5, Gemini Ultra 2 | 50–100× |
## Routing strategies
- Rule-based routing: Explicit rules, e.g. "if task_type == classification, use Haiku." Simple, predictable, zero overhead. Best when your task types are well-defined and stable.
- Classifier-based routing: A lightweight model (or a simple ML classifier) predicts the complexity tier of each incoming request. Adds ~10ms overhead, handles edge cases better than rules.
- Cost-aware cascade: Try a cheap model first; if the output fails a quality check, retry with a more capable model. Optimizes cost in the common case at the expense of latency on failures.
- Semantic routing: Embed the request and compare to known task archetypes. Useful when tasks are diverse and hard to classify with simple rules.
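The first and third strategies compose naturally: rules pick a starting tier, and a cascade escalates when the cheap answer fails a quality check. A minimal sketch, with all model names, costs, and the quality check hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost: float  # relative cost units, per the tiers table

# Illustrative models only; substitute your own tiers.
CHEAP = Model("tier1-small", 1.0)
CAPABLE = Model("tier3-frontier", 50.0)

def call_model(model: Model, request: str) -> str:
    # Stand-in for a real API call.
    return f"{model.name} answer to: {request}"

def cascade(request: str, quality_check: Callable[[str], bool]) -> tuple[str, float]:
    """Try the cheap model first; escalate only if the output fails the check."""
    answer = call_model(CHEAP, request)
    spent = CHEAP.cost
    if not quality_check(answer):
        answer = call_model(CAPABLE, request)
        spent += CAPABLE.cost  # failed attempts still cost money
    return answer, spent

# Common case: the check passes and total cost stays at the cheap tier.
answer, spent = cascade("classify this support ticket", lambda a: len(a) > 0)
```

Note that the cascade's worst case costs more than going straight to the capable model, which is why it only pays off when the quality check passes most of the time.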
## MoltBot routing configuration
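MoltBot's actual configuration schema isn't reproduced here; the sketch below only illustrates the general shape of a rule-based routing config with a fallback tier. Every key and model name is hypothetical:

```yaml
# Hypothetical routing config -- illustrative only, not MoltBot's real schema.
routes:
  - match: { task_type: classification }   # Tier 1: simple tasks
    model: claude-haiku
  - match: { task_type: summarization }    # Tier 2: moderate tasks
    model: claude-sonnet-4
  - default: true                          # everything else escalates
    model: claude-opus-4
fallback:
  on_error: next_tier                      # retry a more capable model on failure
```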
## Measuring routing effectiveness
Track cost-per-task-type and quality scores per tier. A well-tuned routing strategy typically routes 50–70% of traffic to Tier 1 models, 20–35% to Tier 2, and only 5–15% to Tier 3 frontier models. If your Tier 3 usage is above 30%, your routing is under-optimized.
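You can translate a measured traffic mix into an expected relative cost using the multipliers from the tiers table (midpoints assumed for the ranged tiers). This makes the gap between a tuned and an under-optimized mix concrete:

```python
# Relative per-request cost by tier, from the tiers table (range midpoints assumed).
tier_cost = {1: 1.0, 2: 12.0, 3: 75.0}

def blended_cost(mix: dict[int, float]) -> float:
    """Expected relative cost per request for a traffic mix (fractions sum to 1)."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(frac * tier_cost[tier] for tier, frac in mix.items())

# A mix inside the well-tuned band vs. one with Tier 3 above 30%.
well_tuned = blended_cost({1: 0.60, 2: 0.30, 3: 0.10})    # 11.7
unoptimized = blended_cost({1: 0.30, 2: 0.35, 3: 0.35})   # 30.75
```

Under these assumptions the under-optimized mix costs more than 2.5× the well-tuned one per request, which is why the Tier 3 share is the first metric worth watching.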
## Built-in multi-model routing on MoltBot
Rule-based and classifier routing, cost tracking per model, automatic fallback. 14-day free trial.
Start Free Trial →