
Multi-Agent Systems
Build systems where multiple specialized agents collaborate, communicate, and coordinate to solve complex problems.
What Are Multi-Agent Systems?
A multi-agent system splits work across multiple specialized agents instead of one monolithic agent doing everything.
Before diving in, make sure you understand single-agent fundamentals:
Single-Agent Systems
Build production-ready AI agents with simple loops, good tools, and minimal complexity.
Why Build Multi-Agent Systems?
• Specialization — A "research agent" is optimized differently than a "writer agent" or "compliance agent"
• Replaceability — Swap one agent without rewriting the whole app (upgrade translation, change model, move infra)
• Parallelism — Run independent subtasks concurrently for speed
• Separation of concerns — Keep prompts, tools, permissions, and guardrails isolated per agent
• Safer autonomy — Put risky actions behind a dedicated "approval" or "executor" agent
When Multi-Agent Systems Excel
According to Anthropic's report on building Claude's Research feature:
• Breadth-first queries — Pursuing multiple independent directions simultaneously
• Tasks exceeding single context windows — Information that's too large for one agent
• Heavy parallelization — Many truly independent subtasks
• Complex tool interfaces — Different agents specialized for different tools
When Multi-Agent Systems Struggle
• Tasks requiring shared context — All agents need the same information
• Many dependencies between agents — Real-time coordination is hard
• Low-value tasks — Multi-agent systems use ~15× more tokens than chat
• Most coding tasks — Fewer truly parallelizable subtasks than research
Common Multi-Agent Patterns
| Pattern | Description | Best For |
|---|---|---|
| Prompt Chaining | Agent A → Agent B → Agent C | Fixed steps, higher quality |
| Routing | Router classifies and forwards to specialists | Support, mixed intents |
| Parallelization | Fan-out to multiple agents, merge results | Speed + higher confidence |
| Hierarchical | Coordinator delegates to workers | Flexible task decomposition |
| Evaluator-Optimizer | One produces, another critiques/iterates | Quality-sensitive outputs |
A single agent can internally perform these patterns—the orchestrator decides when to use each approach.
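The chaining, routing, and parallelization rows of the table can be sketched with plain async functions. This is a minimal illustration that is not tied to any framework: `billing_agent`, `tech_agent`, and the keyword-based `classify` are hypothetical stand-ins for LLM-backed agents and an LLM router.

```python
import asyncio

# Hypothetical stand-ins for LLM-backed specialist agents.
async def billing_agent(text: str) -> str:
    return f"[billing] {text}"

async def tech_agent(text: str) -> str:
    return f"[tech] {text}"

SPECIALISTS = {"billing": billing_agent, "tech": tech_agent}

def classify(text: str) -> str:
    """Toy router; a real one would typically be an LLM classifier."""
    return "billing" if "invoice" in text.lower() else "tech"

async def route(text: str) -> str:
    """Routing: classify the request, then forward it to one specialist."""
    return await SPECIALISTS[classify(text)](text)

async def fan_out(text: str) -> list[str]:
    """Parallelization: run every specialist concurrently, then merge."""
    results = await asyncio.gather(*(agent(text) for agent in SPECIALISTS.values()))
    return list(results)

async def chain(text: str) -> str:
    """Prompt chaining: each agent's output feeds the next agent."""
    out = text
    for agent in (tech_agent, billing_agent):
        out = await agent(out)
    return out
```

Each pattern is just a different way of composing the same agent callables, which is why one orchestrator can switch between them per request.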
What Breaks Multi-Agent Systems
• No shared contract — Every agent uses different message formats and assumptions
• Tight coupling — Changing one agent breaks the whole pipeline
• Opaque debugging — Frameworks hide prompts/responses, making failures hard to inspect
• Tool confusion — Agents make mistakes when tools are underspecified
• Runaway spawning — Agents creating 50 subagents for simple queries
• Endless searching — Scouring the web for nonexistent sources
This is exactly why communication protocols matter—they provide a shared, minimal interface so agents interoperate like normal HTTP services.
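Runaway spawning and endless searching can also be bounded in code, independently of any protocol. The sketch below is one possible guard, assuming the orchestrator calls it before every spawn and tool call; `RunBudget`, `BudgetExceeded`, and the default limits are hypothetical and should be tuned per workload.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its allotted budget."""

@dataclass
class RunBudget:
    max_subagents: int = 5      # illustrative default, not a recommendation
    max_tool_calls: int = 20    # illustrative default, not a recommendation
    subagents_spawned: int = 0
    tool_calls_made: int = 0

    def spawn_subagent(self) -> None:
        """Count one subagent spawn, refusing once the limit is reached."""
        if self.subagents_spawned >= self.max_subagents:
            raise BudgetExceeded("subagent limit reached")
        self.subagents_spawned += 1

    def record_tool_call(self) -> None:
        """Count one tool call, refusing once the limit is reached."""
        if self.tool_calls_made >= self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        self.tool_calls_made += 1
```

Catching `BudgetExceeded` in the orchestrator gives it a natural point to stop searching and answer with what it already has.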
The Agent Communication Protocol (ACP)
ACP is an open protocol that standardizes how agents discover and communicate with each other.
What ACP Standardizes
• Discovery — GET /agents to find available agents
• Invocation — POST /runs in sync, async, or streaming modes
• Message format — Parts with MIME types (multimodal-native)
• Long-running tasks — Async-first, sync supported
With ACP, "agent calls agent" becomes: one agent uses an HTTP client to invoke another agent's endpoint.
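To make the two endpoints concrete, here is a rough sketch that builds (but does not send) the discovery and invocation requests using only the standard library. The base URL and the JSON body field names (`agent_name`, `mode`, `input`) are assumptions made for illustration; check them against the ACP specification before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local ACP server

def discovery_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the discovery call: GET /agents."""
    return urllib.request.Request(f"{base_url}/agents", method="GET")

def run_request(agent: str, text: str, mode: str = "sync",
                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the invocation call: POST /runs.

    The body layout below is an assumption, not a quote from the spec.
    """
    body = {
        "agent_name": agent,
        "mode": mode,  # assumed values: "sync", "async", "stream"
        "input": [{"parts": [{"content": text, "content_type": "text/plain"}]}],
    }
    return urllib.request.Request(
        f"{base_url}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is left to the caller, e.g. urllib.request.urlopen(run_request(...)),
# which needs a running server.
```

Because the request is ordinary HTTP with a JSON body, it can be inspected or replayed with any HTTP tool.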
Why Protocols Matter
• Composability — Agents become network components you can wire together
• Location independence — Move an agent to another machine/org/provider, only change the URL
• Language agnostic — Any language can implement the protocol
• Debuggability — Standard HTTP, standard message formats, easy to inspect
Example: Agent Calls Agent
This minimal example shows two specialist agents and one orchestrator that coordinates them via ACP.
Server: Two Specialists + Orchestrator
```python
import asyncio
from collections.abc import AsyncGenerator

from acp_sdk.server import Server, Context
from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

server = Server()


# --- Specialist #1: Research ---
@server.agent(name="research")
async def research_agent(
    input: list[Message],
    context: Context
) -> AsyncGenerator:
    """Returns research notes from the user's question."""
    question = input[0].parts[0].content
    # In production: call an LLM + tools here
    notes = [
        f"Key constraints for: {question}",
        "Common failure modes: vague scope, tool ambiguity",
        "Recommended: start simple, add composition when needed",
    ]
    yield Message(parts=[
        MessagePart(
            content="\n".join(f"- {n}" for n in notes),
            content_type="text/plain"
        )
    ])


# --- Specialist #2: Writer ---
@server.agent(name="writer")
async def writer_agent(
    input: list[Message],
    context: Context
) -> AsyncGenerator:
    """Turns research notes into a short explanation."""
    notes = input[0].parts[0].content
    text = f"Summary:\n\n{notes}\n\nAdd guardrails for production."
    yield Message(parts=[
        MessagePart(content=text, content_type="text/plain")
    ])


# --- Helper: Call another agent via ACP ---
async def call_agent(base_url: str, agent: str, text: str):
    async with Client(base_url=base_url) as client:
        run = await client.run_sync(
            agent=agent,
            input=[Message(parts=[
                MessagePart(content=text, content_type="text/plain")
            ])],
        )
        return run.output


# --- Orchestrator: Coordinates specialists ---
@server.agent(name="assistant")
async def assistant(
    input: list[Message],
    context: Context
) -> AsyncGenerator:
    """Calls research -> writer, returns final output."""
    base_url = "http://localhost:8000"
    user_text = input[0].parts[0].content

    # Step 1: Research
    research_out = await call_agent(base_url, "research", user_text)
    notes = research_out[0].parts[0].content

    # Step 2: Write
    writer_out = await call_agent(base_url, "writer", notes)
    final = writer_out[0].parts[0].content

    yield Message(parts=[
        MessagePart(content=final, content_type="text/plain")
    ])


server.run()
```
Client: Call the Orchestrator
```python
import asyncio

from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart


async def main():
    async with Client(base_url="http://localhost:8000") as client:
        run = await client.run_sync(
            agent="assistant",
            input=[Message(parts=[
                MessagePart(
                    content="How do I design a multi-agent system?",
                    content_type="text/plain"
                )
            ])],
        )
        print(run.output[0].parts[0].content)


asyncio.run(main())
```
What This Demonstrates
• Agents are composable network components
• The orchestrator doesn't know how specialists work internally
• Move research to another machine—only change base_url
• Standard HTTP, fully debuggable
Production Lessons from Anthropic
Anthropic's Research feature uses multiple Claude agents to explore complex topics. On Anthropic's internal research evaluation, a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2%.
Key Insights
• Token usage alone explained 80% of performance variance on the BrowseComp evaluation — Multi-agent architectures effectively scale token usage
• Upgrading models > doubling tokens — Claude Sonnet 4 gives larger gains than 2× budget on Sonnet 3.7
• Agents use ~4× more tokens than chat — Multi-agent uses ~15× more
• Parallel tool calling cut research time by up to 90% — For complex queries
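The token multipliers translate directly into a back-of-the-envelope cost estimate. In the sketch below, the multipliers come from the figures quoted above, while the 2,000-token chat baseline and the price per million tokens are made-up inputs chosen only to show the arithmetic.

```python
# Approximate multipliers relative to a plain chat interaction (from the text).
TOKEN_MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}

def estimated_tokens(chat_tokens: int, system: str) -> int:
    """Scale a chat-sized token count by the system type's multiplier."""
    return chat_tokens * TOKEN_MULTIPLIER[system]

def estimated_cost_usd(chat_tokens: int, system: str,
                       usd_per_million_tokens: float) -> float:
    """Convert the scaled token count into dollars at a flat per-token price."""
    return estimated_tokens(chat_tokens, system) * usd_per_million_tokens / 1_000_000

# Hypothetical inputs: a 2,000-token chat and $10 per million tokens.
for system in TOKEN_MULTIPLIER:
    print(system, estimated_tokens(2_000, system),
          round(estimated_cost_usd(2_000, system, 10.0), 2))
```

With these assumed inputs, the same query grows from 2,000 tokens as a chat to 8,000 for a single agent and 30,000 for a multi-agent run, which is why the task has to be valuable enough to justify the spend.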
Prompting Multi-Agent Systems
• Think like your agents — Build simulations to watch agents work step-by-step
• Teach delegation — Give subagents objectives, output formats, tool guidance, and task boundaries
• Scale effort to complexity — Simple tasks: 1 agent, 3-10 tool calls. Complex: 10+ subagents
• Start wide, then narrow — Broad queries first, then drill into specifics
• Guide the thinking process — Extended thinking improves instruction-following
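The "teach delegation" point can be made mechanical by refusing to spawn a subagent without a complete brief. The following is a minimal sketch; `DelegationBrief` and its field names are hypothetical, and the rendered prompt layout is only one reasonable choice.

```python
from dataclasses import dataclass, field

@dataclass
class DelegationBrief:
    """Everything a subagent should be told before it starts."""
    objective: str
    output_format: str
    tools: list[str] = field(default_factory=list)
    boundaries: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the brief as the opening instructions for a subagent."""
        lines = [
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            "Tools: " + (", ".join(self.tools) or "none"),
            "Boundaries:",
        ]
        lines += [f"- {b}" for b in self.boundaries] or ["- none stated"]
        return "\n".join(lines)

brief = DelegationBrief(
    objective="Find the three most cited ACP client libraries",
    output_format="Bulleted list, one source URL per bullet",
    tools=["web_search"],
    boundaries=["Stop after 10 tool calls", "Do not summarize the protocol itself"],
)
print(brief.to_prompt())
```

Making the four fields required in code keeps vague one-line delegations ("research X") from reaching a subagent.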
What Breaks in Production
• Errors compound — Minor failures cascade in stateful multi-turn agents
• Non-deterministic debugging — Same prompt, different paths each run
• Deployment coordination — Agents run continuously; updates can break in-progress work
• Synchronous bottlenecks — Waiting for one slow subagent blocks everything
Reliability Patterns
• Resume from checkpoints — Don't restart from the beginning on errors
• Let agents handle failures — Tell the agent when tools fail; let it adapt
• Full production tracing — Log run_id, which agent called which, inputs/outputs
• Rainbow deployments — Gradually shift traffic while keeping old versions running
• End-state evaluation — Judge final outcomes, not intermediate steps
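Resuming from checkpoints can be as simple as persisting each step's result and skipping completed steps on the next attempt. The sketch below assumes step results are JSON-serializable and uses a local file as the checkpoint; `run_with_checkpoints` is a hypothetical helper, not part of any SDK.

```python
import json
from pathlib import Path
from typing import Callable

def run_with_checkpoints(steps: list[tuple[str, Callable[[dict], object]]],
                         path: Path) -> dict:
    """Run named steps in order, saving the state after each one.

    On a rerun, steps already recorded in the checkpoint file are skipped,
    so a failure in step N does not repeat steps 1..N-1.
    """
    state = json.loads(path.read_text()) if path.exists() else {}
    for name, step in steps:
        if name in state:
            continue  # finished in an earlier attempt
        state[name] = step(state)            # each step sees earlier results
        path.write_text(json.dumps(state))   # persist before moving on
    return state
```

If the second step raises, calling the function again with the same path reruns only that step, because the first step's result is already in the file.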
Designing for Multi-Agent Systems
Contracts Over Prompts
Treat agent outputs like APIs:
• Output shape — Plain text vs structured JSON
• Done criteria — How the orchestrator knows a subagent finished
• Failure behavior — Retry? Fallback? Escalate to human?
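One way to express such a contract is a small object the orchestrator checks every subagent output against. This is a sketch under assumptions: the output is expected to be a JSON object, completion is signalled by a `status` field, and the names `AgentContract` and `OnFailure` are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum

class OnFailure(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"
    ESCALATE = "escalate"   # hand off to a human

@dataclass(frozen=True)
class AgentContract:
    required_keys: frozenset[str]   # output shape: JSON object with these keys
    done_key: str = "status"        # done criteria: this key equals done_value
    done_value: str = "complete"
    on_failure: OnFailure = OnFailure.RETRY

    def check(self, raw_output: str) -> tuple[bool, str]:
        """Return (ok, reason) for one raw subagent output."""
        try:
            data = json.loads(raw_output)
        except json.JSONDecodeError:
            return False, "output is not valid JSON"
        if not isinstance(data, dict):
            return False, "output is not a JSON object"
        missing = self.required_keys - set(data)
        if missing:
            return False, f"missing keys: {sorted(missing)}"
        if data.get(self.done_key) != self.done_value:
            return False, "subagent did not report completion"
        return True, "ok"
```

When `check` fails, the orchestrator consults `on_failure` instead of guessing, so the failure behavior is decided once, at design time.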
Governance and Safety
Multi-agent systems get safer when you isolate permissions:
• Reader agents — Can fetch data, no mutations
• Executor agents — Can take actions, but limited scope
• Approval agents — Human gate for risky actions
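Permission isolation can be enforced at the tool-dispatch layer rather than in prompts. The sketch below assumes a per-role allow-list and an approval callback that stands in for the human gate; the tool names, roles, and `call_tool` helper are all hypothetical.

```python
from typing import Callable

# Hypothetical tool registry: name -> (callable, mutates_state)
TOOLS: dict[str, tuple[Callable[[str], str], bool]] = {
    "fetch_record": (lambda arg: f"record:{arg}", False),
    "delete_record": (lambda arg: f"deleted:{arg}", True),
}

# Each role may call only the tools listed for it.
ROLE_TOOLS = {
    "reader": {"fetch_record"},
    "executor": {"fetch_record", "delete_record"},
}

def call_tool(role: str, tool: str, arg: str,
              approve: Callable[[str, str], bool] = lambda tool, arg: False) -> str:
    """Run a tool on behalf of a role, enforcing the allow-list.

    Mutating tools additionally need approval; `approve` stands in for a
    human gate and denies by default.
    """
    if tool not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"{role} may not call {tool}")
    func, mutates = TOOLS[tool]
    if mutates and not approve(tool, arg):
        raise PermissionError(f"{tool} requires approval")
    return func(arg)
```

A reader agent that is talked into calling `delete_record` fails at dispatch, regardless of what its prompt says.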
Observability Over Cleverness
Log everything:
```json
{
  "run_id": "abc123",
  "parent_agent": "assistant",
  "child_agent": "research",
  "input_summary": "How to design multi-agent...",
  "output_summary": "3 bullet points on constraints...",
  "tool_calls": ["web_search x2"],
  "tokens_used": 1847,
  "duration_ms": 3200,
  "stop_reason": "complete"
}
```
Minimize "Game of Telephone"
• Direct filesystem writes — Subagents write to storage, pass references back
• Artifact systems — Structured outputs persist independently
• Lightweight handoffs — Don't copy large outputs through conversation history
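The three points above amount to passing references instead of payloads. A minimal file-backed sketch, assuming a shared directory that both the subagent and the orchestrator can read; `ArtifactStore` and the shape of the reference dict are hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path

class ArtifactStore:
    """Minimal file-backed store; subagents write here and return a reference."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, content: str, summary: str) -> dict:
        """Persist the full content and return a small reference to it."""
        artifact_id = uuid.uuid4().hex
        (self.root / f"{artifact_id}.txt").write_text(content, encoding="utf-8")
        # Only this small reference travels through the conversation.
        return {"artifact_id": artifact_id, "summary": summary, "chars": len(content)}

    def get(self, ref: dict) -> str:
        """Load the full content on demand."""
        return (self.root / f"{ref['artifact_id']}.txt").read_text(encoding="utf-8")

store = ArtifactStore(Path(tempfile.mkdtemp()))
report = "long research report " * 1000
ref = store.put(report, summary="Findings on agent protocols")
handoff = json.dumps(ref)  # what the orchestrator actually receives
```

The orchestrator's context holds only the short reference, and the full report is loaded only by whichever agent needs it.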
Quick Reference
| Agents | Pattern | Example |
|---|---|---|
| 1 | Single agent loop | Research assistant, coding agent |
| 2-3 | Chain or router | Classify → specialize → respond |
| 3-5 | Parallel specialists | Multi-source research, consensus |
| 5+ | Hierarchical orchestration | Complex research, document analysis |

| System Type | Relative Token Usage |
|---|---|
| Chat (baseline) | 1× |
| Single agent | ~4× |
| Multi-agent | ~15× |
Summary
• Start with single agents — Only add agents when you prove you need them
• Use standard protocols — ACP makes agents composable network services
• Design contracts first — Define inputs, outputs, and failure modes
• Isolate permissions — Separate readers, executors, and approvers
• Log everything — Observability matters more than cleverness
• Scale effort to complexity — Simple queries don't need 10 subagents
This guide draws on the Agent Communication Protocol (ACP) and Anthropic's engineering blog post "How we built our multi-agent research system" (June 2025).