Photo by Markus Spiske on Unsplash

Most multi-agent system failures are architecture problems disguised as model problems. Teams throw better prompts at agents that keep dropping context, retrying tasks, or producing contradictory outputs, when the real issue is that they picked the wrong coordination pattern for their workload. Google, O’Reilly, and LangChain all published multi-agent architecture guides in the past twelve months, and each one converges on the same uncomfortable conclusion: the pattern you choose matters more than the model you run.

Research papers on multi-agent systems jumped from 820 in 2024 to over 2,500 in 2025, according to O’Reilly’s radar report. Gartner documented a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. The academic interest is there. The production maturity is not. This post pulls the practical decision framework out of the noise.

Related: Multi-Agent Orchestration: How AI Agents Work Together

Google’s Eight Design Patterns, Ranked by When You Actually Need Them

Google published a guide to multi-agent design patterns built around their Agent Development Kit (ADK). Eight patterns, each with sample code. The list is comprehensive, but not every pattern deserves equal attention. Here is how they stack up in production use.

The Three Patterns That Handle 80% of Use Cases

Sequential Pipeline is the starting point for most teams. Agents are arranged like an assembly line: Agent A’s output feeds Agent B, whose output feeds Agent C. No branching, no parallelism. It works when your tasks have strict dependencies. A compliance review pipeline, for example, where a document extraction agent passes structured data to a regulation-matching agent, which passes flagged clauses to a risk-scoring agent. Google’s guide calls this pattern “linear, deterministic, and refreshingly easy to debug because you always know exactly where the data came from.”
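The compliance-review example above can be sketched in a few lines. The agent functions here are hypothetical stand-ins (in a real system each stage would call an LLM); the point is the shape: each stage's output is the next stage's only input.

```python
from typing import Callable

# Hypothetical agent stages; a real system would back each with an LLM call.
def extract(doc: str) -> dict:
    """Document-extraction agent: pull structured clauses from raw text."""
    return {"clauses": [c.strip() for c in doc.split(".") if c.strip()]}

def match_regulations(data: dict) -> dict:
    """Regulation-matching agent: flag clauses that mention retention."""
    data["flagged"] = [c for c in data["clauses"] if "retention" in c.lower()]
    return data

def score_risk(data: dict) -> dict:
    """Risk-scoring agent: crude score based on the number of flagged clauses."""
    data["risk"] = min(len(data["flagged"]) * 0.5, 1.0)
    return data

def run_pipeline(doc: str, stages: list[Callable]) -> dict:
    """Assembly line: each stage's output feeds the next. No branching."""
    result: object = doc
    for stage in stages:
        result = stage(result)
    return result

report = run_pipeline(
    "Data retention is unlimited. Access logs kept.",
    [extract, match_regulations, score_risk],
)
```

Debugging is exactly as easy as the Google quote suggests: any bad value can be traced to the single stage that produced it.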

Coordinator/Router adds a decision layer. One agent receives all incoming requests and dispatches them to specialists. A customer service system routes billing questions to one agent, technical issues to another, and account management to a third. The coordinator maintains session context and synthesizes results. This pattern works well when you know your task categories upfront but not which one any given input will require.
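A minimal router sketch, assuming the customer-service example above. The `classify` function is a keyword stand-in for what would really be an LLM classification step, and the handler names are invented for illustration.

```python
# Hypothetical specialist handlers keyed by category.
def handle_billing(q: str) -> str: return f"billing: {q}"
def handle_technical(q: str) -> str: return f"technical: {q}"
def handle_account(q: str) -> str: return f"account: {q}"

ROUTES = {
    "billing": handle_billing,
    "technical": handle_technical,
    "account": handle_account,
}

def classify(query: str) -> str:
    """Stand-in for the coordinator's LLM classification step."""
    q = query.lower()
    if "invoice" in q or "charge" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical"
    return "account"

def route(query: str, session: dict) -> str:
    """Coordinator: classify, dispatch to one specialist, keep session context."""
    category = classify(query)
    session.setdefault("history", []).append(category)
    return ROUTES[category](query)

session: dict = {}
answer = route("Why was my card charged twice?", session)
```

Note that the routing table is closed: this pattern only works because every category can be enumerated upfront.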

Parallel Fan-Out sends the same input to multiple agents simultaneously. A research system queries academic databases, news sources, and internal knowledge bases at the same time, then an aggregator merges the results. Use this when your subtasks are independent and the bottleneck is latency, not compute.
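Fan-out maps naturally onto `asyncio.gather`: the same query goes to every agent concurrently, and total latency is the slowest agent rather than the sum. The source functions below are hypothetical; real ones would issue network calls.

```python
import asyncio

# Hypothetical sources; real agents would issue network or database calls here.
async def query_academic(q: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "academic", "hits": [f"paper on {q}"]}

async def query_news(q: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "news", "hits": [f"article on {q}"]}

async def query_internal(q: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "internal", "hits": []}

async def fan_out(q: str) -> list[str]:
    """Same input to every agent at once; aggregate once all have returned."""
    results = await asyncio.gather(
        query_academic(q), query_news(q), query_internal(q)
    )
    return [hit for r in results for hit in r["hits"]]

hits = asyncio.run(fan_out("agent handoffs"))
```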

The Patterns You Should Postpone

The remaining five patterns (hierarchical delegation, consensus-based voting, competitive evaluation, human-in-the-loop checkpoints, and dynamic agent spawning) solve real problems, but they add coordination complexity that most teams are not ready for. Hierarchical delegation, where a supervisor manages sub-supervisors who manage workers, makes sense when you have 10+ agents with distinct authority levels. Consensus voting, where multiple agents independently solve the same problem and a judge picks the best output, makes sense when accuracy matters more than cost. For most production systems in 2026, these are optimizations, not starting points.

Supervisor vs Swarm: LangChain’s Benchmarking Data

LangChain published benchmarking results comparing two of the most common multi-agent patterns in LangGraph: supervisor and swarm. The numbers settle a debate that has mostly been driven by intuition.

How They Differ

In the supervisor pattern, a central agent receives every request, decomposes it into subtasks, delegates each subtask to a specialized worker agent, and synthesizes the final response. Workers never talk to each other or respond to the user directly. Everything flows through the supervisor.
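A minimal supervisor sketch, with invented worker functions and a hard-coded plan standing in for LLM-driven task decomposition:

```python
# Hypothetical workers; a real system would back each with an LLM plus tools.
def search_worker(task: str) -> str: return f"results for {task}"
def summarize_worker(task: str) -> str: return f"summary of {task}"

WORKERS = {"search": search_worker, "summarize": summarize_worker}

def supervisor(request: str) -> str:
    """Decompose, delegate, synthesize: everything flows through here."""
    subtasks = [("search", request), ("summarize", request)]  # stand-in for LLM planning
    outputs = [WORKERS[name](task) for name, task in subtasks]
    # Translation overhead: the supervisor reprocesses every worker's output
    # before the user sees anything. Workers never answer directly.
    return " | ".join(outputs)

response = supervisor("GPU pricing")
```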

In the swarm pattern, every agent can hand off control to any other agent in the group. There is no central coordinator. When an agent decides its part is done, it passes the conversation to whichever agent it thinks should go next. The active agent responds directly to the user.
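The swarm's control flow can be sketched as a handoff loop: each hypothetical agent either returns a final answer or names the agent that should go next, with no coordinator in between.

```python
# Minimal swarm sketch: each (hypothetical) agent either answers or hands off.
def researcher(msg: str, state: dict) -> dict:
    state["notes"] = f"notes on {msg}"
    return {"handoff": "writer"}         # pass control sideways, no coordinator

def writer(msg: str, state: dict) -> dict:
    return {"answer": f"summary: {state['notes']}"}  # active agent replies directly

AGENTS = {"researcher": researcher, "writer": writer}

def run_swarm(msg: str, start: str = "researcher", max_hops: int = 5) -> str:
    state: dict = {}
    current = start
    for _ in range(max_hops):
        out = AGENTS[current](msg, state)
        if "answer" in out:
            return out["answer"]         # whoever holds control answers the user
        current = out["handoff"]
    raise RuntimeError("handoff loop exceeded max_hops")

reply = run_swarm("token overhead")
```

The `max_hops` cap is not incidental: without a coordinator, a runaway handoff loop is the swarm's characteristic failure mode.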

The Performance Gap

The supervisor pattern consistently uses more tokens than the swarm. LangChain’s analysis traces this to the “translation overhead”: since sub-agents in the supervisor pattern cannot respond to the user directly, the supervisor must reprocess and reformat every worker’s output. The swarm pattern, by cutting out this intermediary, achieved roughly a 40% reduction in end-to-end response time while also reducing the number of LLM calls per query.

That does not make swarm universally better. The supervisor pattern provides stronger guarantees around output quality, error recovery, and task completion verification. If you need to guarantee that all subtasks finish before returning a result, or if you need a single point of audit for compliance, the supervisor’s overhead is the price of control.

The Decision Rule

Use supervisor when the workflow has a defined goal and you need reliability: report generation, multi-step data analysis, regulated processes. Use swarm when the path to a solution is unknown and agents need to dynamically discover who should handle what: open-ended research, creative brainstorming, exploratory troubleshooting. LangChain’s recommendation is blunt: start from your goals and constraints, not from which pattern sounds more elegant.

Related: AI Agent Frameworks Compared: LangGraph, CrewAI, AutoGen

Why Hybrid Patterns Dominate Production Systems

Neither pure supervisor nor pure swarm matches how most production systems actually work. O’Reilly’s architecture guide describes the pattern that experienced teams converge on: a small number of fast specialist agents operate in parallel, while a slower, more deliberate orchestrator periodically aggregates results, checks assumptions, and decides whether the system should continue or stop.

What Hybrid Looks Like in Practice

Consider an enterprise due diligence system. Three specialist agents run in parallel: one pulls financial filings, one scans regulatory databases, one searches news archives. Each operates independently with its own tools and context. A fourth agent, the orchestrator, runs on a longer cycle. Every 30 seconds, it reviews what the specialists have found, checks for contradictions between sources, identifies gaps that need additional research, and decides whether the overall analysis has enough coverage to produce a report.

This is not supervisor (the orchestrator does not dispatch every task) and not swarm (the specialists do not hand off to each other). It is a hybrid that balances throughput with quality control. The specialists maximize speed because they run in parallel without waiting for permission. The orchestrator keeps errors from compounding because it periodically validates the aggregate state.
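A sketch of the due diligence example, assuming invented agent names and with the orchestrator's 30-second cycle compressed into loop ticks so the example runs instantly:

```python
# Hybrid sketch: hypothetical specialists add findings every tick, while the
# orchestrator reviews the aggregate state on a slower cycle.
def filings_agent(state: dict) -> None:
    state["findings"].append(("filings", "revenue up"))

def regs_agent(state: dict) -> None:
    state["findings"].append(("regs", "no sanctions"))

def news_agent(state: dict) -> None:
    state["findings"].append(("news", "CEO change"))

def orchestrator(state: dict) -> bool:
    """Check coverage; return True once every required source has reported."""
    sources = {src for src, _ in state["findings"]}
    return {"filings", "regs", "news"} <= sources

def run_hybrid(max_ticks: int = 10, review_every: int = 3) -> tuple[dict, int]:
    state: dict = {"findings": []}
    for tick in range(1, max_ticks + 1):
        # Specialists run every tick, independently, without asking permission.
        for agent in (filings_agent, regs_agent, news_agent):
            agent(state)
        # Orchestrator runs on a longer cycle (every 3rd tick here; ~30s in
        # the production example) and decides whether to stop.
        if tick % review_every == 0 and orchestrator(state):
            return state, tick
    return state, max_ticks

state, stopped_at = run_hybrid()
```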

The Prompting Fallacy

O’Reilly’s guide names a pattern it calls the “prompting fallacy”: when agents underperform, most teams instinctively reach for better prompts. Rewrite the system prompt. Add more few-shot examples. Tweak the temperature. The guide’s position is direct: you cannot prompt your way out of a system-level failure. If three agents are producing contradictory results, the fix is usually a coordination change (add a validation step, change the information flow, introduce a reconciliation agent), not a prompt change.

This matches what Google’s scaling principles emphasize: most “agent failures” are coordination and context-transfer issues at handoff points, not model capability failures.

The Four Questions That Pick Your Pattern

Synthesize all three guides and a decision framework emerges. Answer these four questions and the architecture pattern follows.

1. Are Your Subtasks Independent?

If yes, use parallel fan-out. If subtasks have strict dependencies (output of A is required input for B), use a sequential pipeline. If some are independent and some depend on earlier results, you need a hybrid.

2. Do You Know Your Task Categories Upfront?

If you can enumerate every type of request your system will handle, use a coordinator/router. If requests are open-ended and you cannot predict what capabilities will be needed, a swarm handles the ambiguity better because agents can dynamically hand off based on runtime conditions.

3. How Important Is Auditability?

Regulated industries (finance, healthcare, legal) typically need a supervisor pattern because every decision flows through a single point that can be logged, reviewed, and audited. Swarm patterns, where control passes unpredictably between agents, are harder to audit but faster to iterate on.

4. How Many Agents Do You Actually Need?

Google’s research and the multi-agent trap analysis show performance degradation of 39-70% when teams add agents beyond what the task requires. Before choosing a multi-agent pattern, verify that a single agent with good tool access cannot solve the problem. O’Reilly’s guide frames it as an “agents-as-teams” mindset: most production failures are coordination problems, not capability problems. Adding more agents to a coordination problem makes it worse.
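The four questions can be collapsed into a small decision function. The precedence below (Q4 first, then Q3, Q1, Q2) is my assumption; the guides answer the questions independently rather than in a fixed order, so treat this as one possible encoding.

```python
# One possible encoding of the four-question framework as a priority-ordered
# decision tree. The ordering is an assumption, not something the guides fix.
def pick_pattern(subtasks_independent: bool,
                 categories_known: bool,
                 audit_critical: bool,
                 single_agent_suffices: bool) -> str:
    if single_agent_suffices:            # Q4: fewest agents always wins
        return "single agent with tools"
    if audit_critical:                   # Q3: one loggable control point
        return "supervisor"
    if subtasks_independent:             # Q1: no inter-task dependencies
        return "parallel fan-out"
    if categories_known:                 # Q2: enumerable request types
        return "coordinator/router"
    return "swarm or hybrid"             # open-ended, dependent subtasks

choice = pick_pattern(subtasks_independent=True, categories_known=False,
                      audit_critical=False, single_agent_suffices=False)
```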

Related: The Multi-Agent Trap: When Adding More AI Agents Makes Everything Worse

Protocol Layer: A2A and MCP as Architecture Enablers

The architecture pattern you choose constrains which communication protocols work. Google’s Agent-to-Agent (A2A) protocol and Anthropic’s Model Context Protocol (MCP) are not competing standards; they solve different layers of the stack.

MCP standardizes how agents connect to tools and data sources. A2A standardizes how agents communicate with each other. A supervisor pattern needs A2A (or something like it) for the supervisor to delegate tasks and collect results from workers. Every pattern needs MCP (or equivalent) for agents to access external tools and databases.

The practical implication: if you are building a multi-agent system in 2026, your architecture needs to account for both inter-agent communication and agent-to-tool communication as separate concerns. Conflating them is a common source of complexity that does not need to exist.
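One way to keep the two concerns separate is to give them separate interfaces. The sketch below uses hypothetical protocols, not the real A2A or MCP SDKs: the supervisor only ever sees the agent channel, the worker only ever sees the tool connector, and neither can reach across layers.

```python
from typing import Protocol

# Hypothetical interfaces (not the real A2A or MCP SDKs) sketching the
# separation: inter-agent traffic and agent-to-tool traffic never mix.
class AgentChannel(Protocol):
    """A2A-style layer: how agents talk to each other."""
    def send_task(self, agent_id: str, task: dict) -> dict: ...

class ToolConnector(Protocol):
    """MCP-style layer: how an agent reaches tools and data sources."""
    def call_tool(self, tool_name: str, args: dict) -> dict: ...

class Worker:
    """Talks to tools through a ToolConnector, never to agents directly."""
    def __init__(self, tools: ToolConnector):
        self.tools = tools
    def run(self, task: dict) -> dict:
        return self.tools.call_tool("search", {"q": task["query"]})

class Supervisor:
    """Talks to agents through an AgentChannel, never to tools directly."""
    def __init__(self, channel: AgentChannel):
        self.channel = channel
    def delegate(self, agent_id: str, task: dict) -> dict:
        return self.channel.send_task(agent_id, task)

# Toy implementations so the sketch runs end to end.
class InMemoryChannel:
    def __init__(self, workers: dict):
        self.workers = workers
    def send_task(self, agent_id: str, task: dict) -> dict:
        return self.workers[agent_id].run(task)

class FakeTools:
    def call_tool(self, tool_name: str, args: dict) -> dict:
        return {"tool": tool_name, **args}

supervisor = Supervisor(InMemoryChannel({"researcher": Worker(FakeTools())}))
result = supervisor.delegate("researcher", {"query": "MCP"})
```

Because the layers are separate types, swapping the in-memory channel for a real A2A transport, or the fake tools for a real MCP client, would not touch the agent logic.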

Related: MCP and A2A: The Protocol Layer for AI Agents

Frequently Asked Questions

What is the best multi-agent architecture pattern?

There is no single best pattern. Sequential pipelines work for tasks with strict dependencies, coordinator/router patterns handle known task categories, parallel fan-out maximizes throughput for independent tasks, and hybrid patterns that combine parallel specialists with a periodic orchestrator dominate most production systems. The right pattern depends on whether your subtasks are independent, whether you know task categories upfront, and how important auditability is.

What is the difference between supervisor and swarm multi-agent patterns?

In the supervisor pattern, a central agent delegates all tasks and synthesizes all results. Workers never respond directly to users. In the swarm pattern, any agent can hand off to any other agent, and the active agent responds directly. LangChain’s benchmarks show swarm cuts end-to-end response time by roughly 40% and reduces LLM calls per query, but supervisor provides stronger guarantees for reliability, error recovery, and auditability.

How many agents should a multi-agent system have?

As few as possible. Google’s research shows 39-70% performance degradation when teams add agents beyond what the task requires. Before designing a multi-agent system, verify that a single agent with good tool access cannot solve the problem. Most production failures are coordination problems, not capability problems, and adding more agents to a coordination problem makes it worse.

What is the prompting fallacy in multi-agent systems?

The prompting fallacy is the instinct to fix multi-agent system failures by rewriting prompts. O’Reilly’s architecture guide argues that you cannot prompt your way out of a system-level failure. When agents produce contradictory results or drop context, the fix is usually a coordination change (adding validation steps, changing information flow), not a prompt change.

What is the difference between A2A and MCP protocols for multi-agent systems?

A2A (Agent-to-Agent) standardizes how agents communicate with each other. MCP (Model Context Protocol) standardizes how agents connect to tools and data sources. They solve different layers: A2A handles inter-agent communication while MCP handles agent-to-tool communication. Most multi-agent architectures need both.