Multi-Agent Systems

>>> Build systems where multiple specialized agents collaborate, communicate, and coordinate to solve complex problems.

What Are Multi-Agent Systems?

A multi-agent system splits work across multiple specialized agents instead of one monolithic agent doing everything.

Before diving in, make sure you understand single-agent fundamentals:

Related Guide →

Single-Agent Systems

Build production-ready AI agents with simple loops, good tools, and minimal complexity.

Why Build Multi-Agent Systems?

Benefits

• Specialization — A "research agent" is optimized differently than a "writer agent" or "compliance agent"

• Replaceability — Swap one agent without rewriting the whole app (upgrade translation, change model, move infra)

• Parallelism — Run independent subtasks concurrently for speed

• Separation of concerns — Keep prompts, tools, permissions, and guardrails isolated per agent

• Safer autonomy — Put risky actions behind a dedicated "approval" or "executor" agent

When Multi-Agent Systems Excel

According to Anthropic's write-up on building Claude's Research feature:

Good Fit

• Breadth-first queries — Pursuing multiple independent directions simultaneously

• Tasks exceeding single context windows — Information that's too large for one agent

• Heavy parallelization — Many truly independent subtasks

• Complex tool interfaces — Different agents specialized for different tools

Poor Fit

• Tasks requiring shared context — All agents need the same information

• Many dependencies between agents — Real-time coordination is hard

• Low-value tasks — Multi-agent systems use ~15× more tokens than chat

• Most coding tasks — Fewer truly parallelizable subtasks than research

Common Multi-Agent Patterns

Architecture Patterns

Pattern	Description	Best For
Prompt Chaining	Agent A → Agent B → Agent C	Fixed steps, higher quality
Routing	Router classifies and forwards to specialists	Support, mixed intents
Parallelization	Fan-out to multiple agents, merge results	Speed + higher confidence
Hierarchical	Coordinator delegates to workers	Flexible task decomposition
Evaluator-Optimizer	One produces, another critiques/iterates	Quality-sensitive outputs

A single agent can internally perform these patterns—the orchestrator decides when to use each approach.
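As an illustration of the Parallelization row, here is a minimal fan-out and merge sketch built on `asyncio.gather`. The `run_specialist` coroutine is a hypothetical stand-in for a real agent call (LLM plus tools), and the specialist names are made up for the example.

```python
import asyncio


async def run_specialist(name: str, query: str) -> str:
    """Hypothetical stand-in for a real agent call (LLM + tools)."""
    await asyncio.sleep(0.01)  # simulate network / model latency
    return f"[{name}] findings for: {query}"


async def fan_out(query: str, specialists: list[str]) -> dict[str, str]:
    """Run every specialist concurrently and merge results by name."""
    results = await asyncio.gather(
        *(run_specialist(name, query) for name in specialists),
        return_exceptions=True,  # one failing worker should not sink the rest
    )
    merged: dict[str, str] = {}
    for name, result in zip(specialists, results):
        if isinstance(result, Exception):
            merged[name] = f"FAILED: {result}"
        else:
            merged[name] = result
    return merged


def demo_fan_out() -> dict[str, str]:
    """Synchronous entry point for trying the sketch."""
    return asyncio.run(fan_out("pricing trends", ["web", "docs", "news"]))
```

Because the workers run concurrently, total latency is roughly that of the slowest specialist rather than the sum of all of them. The merge step here is a plain dictionary; a real system would likely hand the merged results to a synthesizing agent.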

What Breaks Multi-Agent Systems

Common Failures

• No shared contract — Every agent uses different message formats and assumptions

• Tight coupling — Changing one agent breaks the whole pipeline

• Opaque debugging — Frameworks hide prompts/responses, making failures hard to inspect

• Tool confusion — Agents make mistakes when tools are underspecified

• Runaway spawning — Agents creating 50 subagents for simple queries

• Endless searching — Scouring the web for nonexistent sources

This is exactly why communication protocols matter—they provide a shared, minimal interface so agents interoperate like normal HTTP services.

The Agent Communication Protocol (ACP)

ACP is an open protocol that standardizes how agents discover and communicate with each other.

What ACP Standardizes

ACP Core Features

• Discovery — GET /agents to find available agents

• Invocation — POST /runs in sync, async, or streaming modes

• Message format — Parts with MIME types (multimodal-native)

• Long-running tasks — Async-first, sync supported

With ACP, "agent calls agent" becomes: one agent uses an HTTP client to invoke another agent's endpoint.
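To make the "plain HTTP" point concrete, the sketch below builds (but does not send) the two requests named above. The paths come from the feature list; the JSON body field names (`agent_name`, `input`, `mode`) and the mode strings are assumptions based on one reading of the ACP REST spec, so check them against the version you deploy.

```python
import json


def build_discovery_request(base_url: str) -> dict:
    """Describe the discovery call: GET /agents."""
    return {"method": "GET", "url": f"{base_url.rstrip('/')}/agents"}


def build_run_request(base_url: str, agent: str, text: str, mode: str = "sync") -> dict:
    """Describe the invocation call: POST /runs, without sending it.

    Assumption: the body field names and mode values below mirror the
    ACP REST spec; verify them against your ACP version before relying
    on this shape.
    """
    if mode not in {"sync", "async", "stream"}:
        raise ValueError(f"unsupported mode: {mode}")
    body = {
        "agent_name": agent,
        "input": [
            {"parts": [{"content_type": "text/plain", "content": text}]}
        ],
        "mode": mode,
    }
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/runs",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Any HTTP client can send the resulting request, which is what makes the traffic easy to inspect with ordinary tooling.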

Why Protocols Matter

Benefits of ACP

• Composability — Agents become network components you can wire together

• Location independence — Move an agent to another machine/org/provider, only change the URL

• Language agnostic — Any language can implement the protocol

• Debuggability — Standard HTTP, standard message formats, easy to inspect

Example: Agent Calls Agent

This minimal example shows two specialist agents and one orchestrator that coordinates them via ACP.

Server: Two Specialists + Orchestrator

agent.py

from collections.abc import AsyncGenerator

from acp_sdk.server import Server, Context
from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

server = Server()

# --- Specialist #1: Research ---
@server.agent(name="research")
async def research_agent(
  input: list[Message], 
  context: Context
) -> AsyncGenerator:
  """Returns research notes from the user's question."""
  question = input[0].parts[0].content
  # In production: call an LLM + tools here
  notes = [
      f"Key constraints for: {question}",
      "Common failure modes: vague scope, tool ambiguity",
      "Recommended: start simple, add composition when needed",
  ]
  yield Message(parts=[
      MessagePart(
          content="\n".join(f"- {n}" for n in notes),
          content_type="text/plain"
      )
  ])


# --- Specialist #2: Writer ---
@server.agent(name="writer")
async def writer_agent(
  input: list[Message], 
  context: Context
) -> AsyncGenerator:
  """Turns research notes into a short explanation."""
  notes = input[0].parts[0].content
  text = f"Summary:\n\n{notes}\n\nAdd guardrails for production."
  yield Message(parts=[
      MessagePart(content=text, content_type="text/plain")
  ])


# --- Helper: Call another agent via ACP ---
async def call_agent(base_url: str, agent: str, text: str):
  async with Client(base_url=base_url) as client:
      run = await client.run_sync(
          agent=agent,
          input=[Message(parts=[
              MessagePart(content=text, content_type="text/plain")
          ])],
      )
      return run.output


# --- Orchestrator: Coordinates specialists ---
@server.agent(name="assistant")
async def assistant(
  input: list[Message], 
  context: Context
) -> AsyncGenerator:
  """Calls research -> writer, returns final output."""
  base_url = "http://localhost:8000"
  user_text = input[0].parts[0].content

  # Step 1: Research
  research_out = await call_agent(base_url, "research", user_text)
  notes = research_out[0].parts[0].content

  # Step 2: Write
  writer_out = await call_agent(base_url, "writer", notes)
  final = writer_out[0].parts[0].content

  yield Message(parts=[
      MessagePart(content=final, content_type="text/plain")
  ])

server.run()

Client: Call the Orchestrator

client.py

import asyncio
from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

async def main():
  async with Client(base_url="http://localhost:8000") as client:
      run = await client.run_sync(
          agent="assistant",
          input=[Message(parts=[
              MessagePart(
                  content="How do I design a multi-agent system?",
                  content_type="text/plain"
              )
          ])],
      )
      print(run.output[0].parts[0].content)

asyncio.run(main())

What This Demonstrates

Key Points

• Agents are composable network components

• The orchestrator doesn't know how specialists work internally

• Move research to another machine—only change base_url

• Standard HTTP, fully debuggable

Production Lessons from Anthropic

Anthropic's Research feature uses multiple Claude agents to explore complex topics. Its multi-agent system, with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents, outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research evaluation.

Key Insights

What They Learned

• Token usage explains 80% of performance variance — Multi-agent architectures effectively scale token usage

• Upgrading models > doubling tokens — Claude Sonnet 4 gives larger gains than 2× budget on Sonnet 3.7

• Agents use ~4× more tokens than chat — Multi-agent uses ~15× more

• Parallel tool calling cut research time by up to 90% — For complex queries

Prompting Multi-Agent Systems

Prompting Principles

• Think like your agents — Build simulations to watch agents work step-by-step

• Teach delegation — Give subagents objectives, output formats, tool guidance, and task boundaries

• Scale effort to complexity — Simple tasks: 1 agent, 3-10 tool calls. Complex: 10+ subagents

• Start wide, then narrow — Broad queries first, then drill into specifics

• Guide the thinking process — Extended thinking improves instruction-following
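The "teach delegation" principle can be sketched as a small structure that forces the lead agent to fill in every part of a subagent brief. `DelegationBrief` and its field names are hypothetical, chosen to mirror the four items listed above plus a tool-call budget.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationBrief:
    """Hypothetical brief a lead agent hands to a subagent."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: list[str] = field(default_factory=list)
    max_tool_calls: int = 10  # illustrative default, not a recommendation

    def to_prompt(self) -> str:
        """Render the brief as the text block sent to the subagent."""
        lines = [
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            f"Tools: {self.tool_guidance}",
            f"Tool-call budget: {self.max_tool_calls}",
        ]
        if self.boundaries:
            lines.append("Out of scope:")
            lines.extend(f"- {b}" for b in self.boundaries)
        return "\n".join(lines)
```

Making the brief a typed object rather than free text means a missing objective or output format fails at construction time instead of surfacing later as a vague subagent answer.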

What Breaks in Production

Production Challenges

• Errors compound — Minor failures cascade in stateful multi-turn agents

• Non-deterministic debugging — Same prompt, different paths each run

• Deployment coordination — Agents run continuously; updates can break in-progress work

• Synchronous bottlenecks — Waiting for one slow subagent blocks everything

Reliability Patterns

Production Hardening

• Resume from checkpoints — Don't restart from the beginning on errors

• Let agents handle failures — Tell the agent when tools fail; let it adapt

• Full production tracing — Log run_id, which agent called which, inputs/outputs

• Rainbow deployments — Gradually shift traffic while keeping old versions running

• End-state evaluation — Judge final outcomes, not intermediate steps
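A minimal sketch of "resume from checkpoints", assuming the pipeline is a fixed list of named steps and that state fits in a JSON file. The function name and the checkpoint layout are illustrative, not taken from any framework.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[dict], dict]]],
    checkpoint_path: Path,
) -> dict:
    """Run named steps in order, skipping any already recorded as done."""
    if checkpoint_path.exists():
        saved = json.loads(checkpoint_path.read_text())
    else:
        saved = {"done": [], "state": {}}

    for name, step in steps:
        if name in saved["done"]:
            continue  # completed in an earlier attempt
        # Each step receives the accumulated state and returns keys to merge.
        saved["state"].update(step(saved["state"]))
        saved["done"].append(name)
        # Persist after every step so a crash loses at most one step of work.
        checkpoint_path.write_text(json.dumps(saved))
    return saved["state"]
```

If a step raises, the exception propagates and the checkpoint file still holds everything completed so far; calling the function again with the same path picks up at the failed step.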

Designing for Multi-Agent Systems

Contracts Over Prompts

Treat agent outputs like APIs:

Define Contracts

• Output shape — Plain text vs structured JSON

• Done criteria — How the orchestrator knows a subagent finished

• Failure behavior — Retry? Fallback? Escalate to human?
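The three contract items above can be sketched as a typed result plus a small decision function. The status values, field names, and retry limit are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    DONE = "done"
    FAILED = "failed"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class AgentResult:
    """Output shape every subagent must return (hypothetical contract)."""
    status: Status
    output: str = ""
    error: str = ""


def parse_result(raw: dict) -> AgentResult:
    """Validate a subagent's raw reply against the contract."""
    try:
        status = Status(raw["status"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"reply violates contract: {raw!r}") from exc
    # Done criteria: a "done" reply must actually carry an output.
    if status is Status.DONE and not raw.get("output"):
        raise ValueError("status 'done' requires a non-empty output")
    return AgentResult(status, raw.get("output", ""), raw.get("error", ""))


def next_action(result: AgentResult, attempt: int, max_retries: int = 2) -> str:
    """Failure behavior: accept, retry, or escalate to a human."""
    if result.status is Status.DONE:
        return "accept"
    if result.status is Status.FAILED and attempt < max_retries:
        return "retry"
    return "escalate"
```

With this in place the orchestrator never has to guess from free text whether a subagent finished; it reads `status` and applies one policy for every specialist.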

Governance and Safety

Multi-agent systems get safer when you isolate permissions:

Permission Isolation

• Reader agents — Can fetch data, no mutations

• Executor agents — Can take actions, but limited scope

• Approval agents — Human gate for risky actions
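One way to enforce this split is a per-role tool allowlist checked at the single point where tools are invoked. The tools, role names, and `approved` flag below are hypothetical; in a real system the approval would come from a human-in-the-loop step rather than a boolean argument.

```python
from typing import Callable


# Hypothetical tools; real ones would hit a database or external API.
def fetch_record(record_id: str) -> str:
    return f"record:{record_id}"


def delete_record(record_id: str) -> str:
    return f"deleted:{record_id}"


TOOLS: dict[str, Callable[[str], str]] = {
    "fetch_record": fetch_record,
    "delete_record": delete_record,
}

# Each role gets an explicit allowlist; anything not listed is denied.
ALLOWED: dict[str, set[str]] = {
    "reader": {"fetch_record"},
    "executor": {"fetch_record", "delete_record"},
}

# Tools that mutate state require an approval decision first.
RISKY: set[str] = {"delete_record"}


def call_tool(role: str, tool: str, arg: str, approved: bool = False) -> str:
    """Dispatch a tool call only if the role and approval state permit it."""
    if tool not in ALLOWED.get(role, set()):
        raise PermissionError(f"{role} may not call {tool}")
    if tool in RISKY and not approved:
        raise PermissionError(f"{tool} needs human approval")
    return TOOLS[tool](arg)
```

Checking permissions in the dispatcher, not in the prompt, means a confused or manipulated agent still cannot reach a tool outside its role.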

Observability Over Cleverness

Log everything:

What to Log

{
  "run_id": "abc123",
  "parent_agent": "assistant",
  "child_agent": "research",
  "input_summary": "How to design multi-agent...",
  "output_summary": "3 bullet points on constraints...",
  "tool_calls": ["web_search x2"],
  "tokens_used": 1847,
  "duration_ms": 3200,
  "stop_reason": "complete"
}
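A record with these fields could be produced by a small helper like the one below. The function names, the 40-character summary limit, and the use of a random UUID for `run_id` are assumptions for illustration.

```python
import json
import time
import uuid
from typing import Optional


def summarize(text: str, limit: int = 40) -> str:
    """Truncate long text so log lines stay small."""
    return text if len(text) <= limit else text[:limit] + "..."


def build_log_record(
    parent_agent: str,
    child_agent: str,
    input_text: str,
    output_text: str,
    tool_calls: list[str],
    tokens_used: int,
    started_at: float,  # value of time.monotonic() taken before the call
    stop_reason: str = "complete",
    run_id: Optional[str] = None,
) -> dict:
    """Assemble one structured record for a parent-to-child agent call."""
    return {
        "run_id": run_id or uuid.uuid4().hex,
        "parent_agent": parent_agent,
        "child_agent": child_agent,
        "input_summary": summarize(input_text),
        "output_summary": summarize(output_text),
        "tool_calls": tool_calls,
        "tokens_used": tokens_used,
        "duration_ms": int((time.monotonic() - started_at) * 1000),
        "stop_reason": stop_reason,
    }


def emit(record: dict) -> None:
    """Write the record as one JSON line; swap print for a real log sink."""
    print(json.dumps(record))
```

Passing the same `run_id` through every nested call is what lets you reconstruct which agent called which after the fact.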

Minimize "Game of Telephone"

Output Management

• Direct filesystem writes — Subagents write to storage, pass references back

• Artifact systems — Structured outputs persist independently

• Lightweight handoffs — Don't copy large outputs through conversation history
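The reference-passing idea can be sketched with a tiny file-backed store. `ArtifactStore`, the `artifact://` reference scheme, and `research_subagent` are all invented for this example and are not part of ACP or any SDK.

```python
import hashlib
from pathlib import Path


class ArtifactStore:
    """Hypothetical file-backed store that hands out short references."""

    def __init__(self, root: Path):
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def put(self, content: str) -> str:
        """Persist content and return a reference derived from its hash."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        (self.root / f"{digest}.txt").write_text(content, encoding="utf-8")
        return f"artifact://{digest}"

    def get(self, ref: str) -> str:
        """Load the content behind a reference returned by put()."""
        digest = ref.removeprefix("artifact://")
        return (self.root / f"{digest}.txt").read_text(encoding="utf-8")


def research_subagent(store: ArtifactStore, query: str) -> dict:
    """Write the full report to storage; return only a summary and a reference."""
    report = f"Full report on {query}\n" + "detail line\n" * 500
    return {
        "summary": f"Report on {query} ({len(report)} chars)",
        "artifact": store.put(report),
    }
```

The orchestrator's context only ever holds the short summary and the reference; whichever agent needs the full report fetches it from the store directly, so nothing is paraphrased along the way.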

Quick Reference

When to Use Each Pattern

Agents	Pattern	Example
1	Single agent loop	Research assistant, coding agent
2-3	Chain or router	Classify → specialize → respond
3-5	Parallel specialists	Multi-source research, consensus
5+	Hierarchical orchestration	Complex research, document analysis

Token Economics

System Type	Relative Token Usage
Chat (baseline)	1×
Single agent	~4×
Multi-agent	~15×
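The multipliers in the table translate into a quick back-of-the-envelope estimate. The function names and the price parameter are placeholders; plug in your own baseline and your provider's actual rate.

```python
# Relative token usage from the table above (chat = 1x baseline).
MULTIPLIERS = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimate_tokens(chat_baseline_tokens: int, system_type: str) -> int:
    """Scale a chat-sized token count by the system type's multiplier."""
    return chat_baseline_tokens * MULTIPLIERS[system_type]


def estimate_cost(
    chat_baseline_tokens: int,
    system_type: str,
    usd_per_million_tokens: float,  # hypothetical blended price
) -> float:
    """Rough cost in USD for one task at the given price per million tokens."""
    tokens = estimate_tokens(chat_baseline_tokens, system_type)
    return tokens / 1_000_000 * usd_per_million_tokens
```

For example, a task that would take about 2,000 tokens as a chat exchange comes to roughly 8,000 tokens for a single agent and 30,000 for a multi-agent run, which is why the task's value has to justify the architecture.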

Summary

Building Multi-Agent Systems

• Start with single agents — Only add agents when you prove you need them

• Use standard protocols — ACP makes agents composable network services

• Design contracts first — Define inputs, outputs, and failure modes

• Isolate permissions — Separate readers, executors, and approvers

• Log everything — Observability matters more than cleverness

• Scale effort to complexity — Simple queries don't need 10 subagents

This guide draws on the Agent Communication Protocol (ACP) and Anthropic's engineering blog post "How we built our multi-agent research system" (June 2025).
