Multi-agent systems decompose complex tasks across multiple specialized agents, each optimized for a sub-problem. This mirrors how human organizations work: different specialists, coordinated by a manager, each doing what they do best. Done well, the result can be higher-quality output, faster completion through parallelism, and lower cost by matching each sub-task to an appropriately sized model.
Core multi-agent patterns
1. Orchestrator–Worker
A central orchestrator agent breaks down the task, delegates sub-tasks to specialized worker agents, and synthesizes their outputs into a final result. Workers are unaware of each other; the orchestrator alone handles coordination.
2. Sequential Pipeline
Agents are arranged in a chain: each agent consumes the output of the previous one and passes its result forward. Best suited to workflows with clear stage dependencies: extract → transform → validate → format.
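The chain described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed implementation: the four stage functions and the `run_pipeline` helper are hypothetical stand-ins for agents that would normally call a model.

```python
from typing import Callable

# Each stage takes the previous stage's output and returns its own.
Stage = Callable[[dict], dict]

def extract(doc: dict) -> dict:
    # Hypothetical extraction agent: split a raw "name, amount" string.
    name, amount = doc["raw"].split(",")
    return {"name": name.strip(), "amount": amount.strip()}

def transform(doc: dict) -> dict:
    # Convert the amount from text to a number.
    return {**doc, "amount": float(doc["amount"])}

def validate(doc: dict) -> dict:
    # Fail fast so later stages never see bad data.
    if doc["amount"] < 0:
        raise ValueError("amount must be non-negative")
    return doc

def format_output(doc: dict) -> dict:
    return {"summary": f"{doc['name']}: {doc['amount']:.2f}"}

def run_pipeline(stages: list[Stage], payload: dict) -> dict:
    # Feed each stage's output into the next one, in order.
    for stage in stages:
        payload = stage(payload)
    return payload

PIPELINE = [extract, transform, validate, format_output]
```

Calling `run_pipeline(PIPELINE, {"raw": "Acme, 12.5"})` runs the four stages in order. Because each stage depends only on the previous stage's output, stages can be tested or swapped independently.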
3. Parallel Fan-Out/Fan-In
An orchestrator fans tasks out to multiple agents running in parallel, then a fan-in step collects and merges their results. For tasks whose sub-tasks are independent, this cuts wall-clock time: total latency approaches that of the slowest agent rather than the sum of all agents.
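One way to sketch fan-out/fan-in is with a thread pool from Python's standard library. The `research_agent` function is a hypothetical stand-in for a model call (the `time.sleep` simulates latency), and `fan_out` and `fan_in` are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def research_agent(topic: str) -> dict:
    # Hypothetical worker: the sleep simulates model latency.
    time.sleep(0.05)
    return {"topic": topic, "finding": f"notes on {topic}"}

def fan_out(topics: list[str], max_workers: int = 4) -> list[dict]:
    # Fan-out: run one worker call per topic concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order.
        return list(pool.map(research_agent, topics))

def fan_in(results: list[dict]) -> str:
    # Fan-in: merge the individual results into one report.
    return "\n".join(f"{r['topic']}: {r['finding']}" for r in results)
```

Because `pool.map` returns results in input order, the merge step is deterministic even when workers finish at different times. Threads suit this sketch because model calls are typically I/O-bound; an `asyncio`-based version would follow the same shape.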
4. Debate / Critic-Actor
One agent (the actor) generates a response; another (the critic) critiques it. The actor refines its output based on the critique, iterating until the critic approves or a maximum number of rounds is reached. This can improve output quality on high-stakes tasks, at the cost of extra model calls per round.
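The loop can be sketched as follows. Both `actor` and `critic` are hypothetical stand-ins for model calls: here the critic simply checks whether the draft cites a source, and the actor adds one when given feedback.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Critique:
    approved: bool
    feedback: str

def actor(task: str, feedback: Optional[str]) -> str:
    # Hypothetical actor: the revision adds a source once asked to.
    draft = f"Answer to: {task}"
    if feedback:
        draft += " (source: internal docs)"
    return draft

def critic(draft: str) -> Critique:
    # Hypothetical critic: approve only drafts that cite a source.
    if "source:" in draft:
        return Critique(approved=True, feedback="")
    return Critique(approved=False, feedback="Cite a source.")

def refine(task: str, max_rounds: int = 3) -> tuple[str, int]:
    # Returns the final draft and the number of rounds used.
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = actor(task, feedback)
        review = critic(draft)
        if review.approved:
            return draft, round_no
        feedback = review.feedback
    # Round limit reached: return the last draft, approved or not.
    return draft, max_rounds
```

Note that when the round limit is reached the caller receives an unapproved draft, so a real system would likely also return the approval status.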
5. Hierarchical Agent Tree
Nested orchestration with multiple levels. A top-level orchestrator manages sub-orchestrators, each managing its own team of workers. Scales to very large, complex tasks with many concurrent sub-problems.
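A tree of agents maps naturally onto a recursive data structure. In the sketch below, the `Agent` class and its `run` method are hypothetical: leaf nodes act as workers, while any node with children acts as an orchestrator that delegates and merges.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    children: list["Agent"] = field(default_factory=list)

    def run(self, task: str) -> dict:
        # Leaf node: a worker that completes the task directly.
        if not self.children:
            return {self.name: f"completed {task}"}
        # Orchestrator node: delegate to each child, then merge.
        merged: dict = {}
        for child in self.children:
            merged.update(child.run(f"{task} > {child.name}"))
        return {self.name: merged}

tree = Agent("root", [
    Agent("research", [Agent("web"), Agent("papers")]),
    Agent("writing", [Agent("draft")]),
])
```

Calling `tree.run("report")` returns a nested dictionary whose shape mirrors the tree. This sketch delegates sequentially for clarity; in practice each orchestrator level could fan out to its children in parallel.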
Implementation example: Orchestrator–Worker in MoltBot
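A minimal, framework-agnostic sketch of the orchestrator–worker flow in plain Python. It does not use MoltBot's actual API: `plan`, `WORKERS`, and the worker functions are hypothetical stand-ins for model calls.

```python
from typing import Callable

Worker = Callable[[str], str]

def research_worker(subtask: str) -> str:
    # Hypothetical worker: a real one would call a model or a tool.
    return f"[research] {subtask}"

def writing_worker(subtask: str) -> str:
    return f"[writing] {subtask}"

# Registry of workers, keyed by role. Workers never call each other.
WORKERS: dict[str, Worker] = {
    "research": research_worker,
    "writing": writing_worker,
}

def plan(task: str) -> list[tuple[str, str]]:
    # Hypothetical fixed plan; a real orchestrator would ask a
    # model to decompose the task into (role, subtask) pairs.
    return [
        ("research", f"gather facts for {task}"),
        ("writing", f"draft {task}"),
    ]

def orchestrate(task: str) -> str:
    # Delegate each subtask to the matching worker, then synthesize.
    outputs = [WORKERS[role](subtask) for role, subtask in plan(task)]
    return "\n".join(outputs)
```

All coordination lives in `orchestrate`: workers receive only their own sub-task and return only their own output, which keeps them easy to replace or test in isolation.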
Key design decisions
Model selection per agent role
Not every agent needs the most expensive model. Orchestrators benefit from the reasoning quality of a model such as Claude Opus 4 or GPT-5. Workers doing retrieval, formatting, or simple extraction can use Gemini Flash or Claude Haiku at a fraction of the cost. This model-tier matching is often one of the most impactful cost optimizations in multi-agent systems.
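Model-tier matching can be expressed as a small lookup table. The sketch below is illustrative only: the role names, the tier names, and the `model_for` helper are assumptions, and the model names are display labels taken from the text above rather than real API identifiers.

```python
# Map each agent role to a cost tier (hypothetical role names).
ROLE_TIER = {
    "orchestrator": "premium",
    "retrieval": "economy",
    "formatting": "economy",
    "extraction": "economy",
}

# Map each tier to a model label (labels only, not API model IDs).
TIER_MODEL = {
    "premium": "Claude Opus 4",
    "economy": "Claude Haiku",
}

def model_for(role: str) -> str:
    # Unknown roles fall back to the premium tier: this sketch
    # prefers overspending to silently degrading quality.
    tier = ROLE_TIER.get(role, "premium")
    return TIER_MODEL[tier]
```

Keeping the role-to-tier and tier-to-model mappings separate means a model can be swapped for a whole tier in one place. Whether unknown roles should default to the expensive or the cheap tier is a judgment call that depends on the workload.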
Context passing between agents
How you pass context between agents significantly affects quality. Three approaches:
- Full transcript pass: Each agent sees the complete output of previous agents. High quality, high token cost.
- Summary pass: A compression agent summarizes previous outputs before passing forward. Good balance of cost and quality.
- Structured output pass: Agents produce JSON/typed outputs that downstream agents consume structurally. Best for pipelines with clear schemas.
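The three approaches can be compared side by side in a short sketch. All names here are hypothetical, and `summary_pass` uses plain truncation as a stand-in for a real compression agent.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ExtractionResult:
    company: str
    revenue_usd: float

def full_transcript(outputs: list[str]) -> str:
    # Full transcript pass: forward everything, unmodified.
    return "\n\n".join(outputs)

def summary_pass(outputs: list[str], max_chars: int = 40) -> str:
    # Summary pass: truncation stands in for a compression agent.
    return " | ".join(o[:max_chars] for o in outputs)

def structured_pass(result: ExtractionResult) -> str:
    # Structured output pass: serialize a typed result as JSON.
    return json.dumps(asdict(result))

def parse_structured(payload: str) -> ExtractionResult:
    # Downstream agent: parse and coerce fields; a missing key
    # raises KeyError instead of passing bad data along.
    data = json.loads(payload)
    return ExtractionResult(
        company=str(data["company"]),
        revenue_usd=float(data["revenue_usd"]),
    )
```

The structured variant is the only one where the downstream agent can detect a malformed hand-off mechanically, which is why it suits pipelines with clear schemas.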
When to stop iterating
Critic-actor loops need stopping conditions to avoid infinite refinement. Common strategies: a maximum number of rounds (typically 3–5), a quality-score threshold from a judge agent, or consensus among multiple critic agents. Always build in a hard limit, because unbounded loops are expensive.
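The three strategies can be combined into a single check that runs after every round. The function name, its parameters, and the default threshold of 0.8 are assumptions made for illustration; the default of 4 rounds sits inside the 3–5 range mentioned above.

```python
from typing import Optional

def should_stop(
    round_no: int,
    judge_score: float,
    critic_votes: list[bool],
    max_rounds: int = 4,
    score_threshold: float = 0.8,
) -> Optional[str]:
    """Return the reason to stop, or None to keep iterating."""
    # Hard limit first, so the loop is always bounded.
    if round_no >= max_rounds:
        return "max_rounds"
    # Judge agent considers the output good enough.
    if judge_score >= score_threshold:
        return "score_threshold"
    # Every critic approves (an empty vote list is not consensus).
    if critic_votes and all(critic_votes):
        return "consensus"
    return None
```

Returning the reason rather than a bare boolean makes it easy to log why each loop ended, which helps when tuning the round limit and threshold.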