Sixteen Claude instances, running in parallel for two weeks across nearly 2,000 sessions, produced a 100,000-line C compiler, written in Rust, that compiles the Linux kernel on x86, ARM, and RISC-V. Total cost: under $20,000. That project was not a product demo. It was Anthropic’s stress test for Agent Teams, the multi-agent orchestration feature that shipped with Claude Opus 4.6 on February 5, 2026.
Agent Teams turn Claude Code from a single-session coding assistant into a coordination layer for parallel AI work. You describe a task. A lead session breaks it down, spawns teammates, and distributes the pieces. Each teammate runs in its own context window, claims tasks from a shared list, and messages other teammates directly. No human routing required.
How Agent Teams Actually Work
The architecture is straightforward: one lead, multiple teammates, a shared task list, and a mailbox system for inter-agent communication. But the details matter more than the diagram.
The Lead-Teammate Model
When you ask Claude Code to create an agent team, the session you are in becomes the lead. The lead’s job is coordination: spawning teammates, assigning tasks, synthesizing results, and deciding when work is done. Teammates are independent Claude Code instances, each with their own context window, tool access, and permissions. They load the same project context (CLAUDE.md, MCP servers, skills) but do not inherit the lead’s conversation history.
This separation is intentional. A teammate researching your authentication module does not need to know about the unrelated database migration the lead discussed three turns ago. Clean context means better focus.
The lead can assign tasks explicitly (“give the login refactor to teammate-2”) or let teammates self-claim from the shared list. Task claiming uses file locking to prevent race conditions when multiple teammates grab the same item. Tasks also support dependencies: a teammate cannot start the API integration until the schema migration task is marked complete.
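The claiming mechanics can be pictured as lock-guarded reads and writes against a shared file. The sketch below is hypothetical (the file name, JSON layout, and function are illustrations, not Claude Code's actual implementation), but it shows why a lock plus a dependency check prevents both double-claiming and premature starts:

```python
import fcntl
import json

TASKS_FILE = "team_tasks.json"  # hypothetical shared task list


def claim_next_task(teammate_id):
    """Atomically claim the first unblocked, unclaimed task.

    Holding an exclusive flock on the task file means two teammates
    can never claim the same task, even if they race for it."""
    with open(TASKS_FILE, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block until we hold the lock
        tasks = json.load(f)
        done = {t["id"] for t in tasks if t["status"] == "done"}
        for t in tasks:
            deps_met = all(d in done for d in t.get("deps", []))
            if t["status"] == "open" and deps_met:
                t["status"] = "claimed"
                t["owner"] = teammate_id
                f.seek(0)
                f.truncate()
                json.dump(tasks, f)
                return t
        return None  # nothing claimable right now
    # lock is released when the file closes
```

A task whose dependencies are unmet is simply skipped, so "API integration depends on schema migration" falls out of the same loop.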
Communication Without Bottlenecks
The critical difference between Agent Teams and subagents is communication topology. Subagents report results back to the main agent. That is it. They cannot talk to each other, share intermediate findings, or challenge each other’s conclusions. The main agent is the bottleneck.
Agent Teams remove that constraint. Teammates message each other directly, broadcast to the entire team, and read the shared task list to understand what everyone else is working on. When a teammate finishes, it automatically notifies the lead. When a task completes, dependent tasks unblock without manual intervention.
This matters for debugging. If teammate-1 finds a race condition in the event loop and teammate-2 is investigating memory leaks in the same module, they can share findings in real time instead of waiting for the lead to relay information back and forth.
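One way to picture the mailbox model is a file-per-message scheme: each teammate owns an inbox directory, sending drops a uniquely named file into it, and broadcast is just a send to every other member. This is a hypothetical sketch (directory layout and function names are invented), not Claude Code's actual transport:

```python
import json
import os
import time
import uuid

MAIL_ROOT = "team_mail"  # hypothetical on-disk mailbox layout


def send(sender, recipient, body):
    """Drop a message file into the recipient's mailbox directory.

    Writing a uniquely named file is effectively atomic, so no lock
    is needed and senders never block each other."""
    inbox = os.path.join(MAIL_ROOT, recipient)
    os.makedirs(inbox, exist_ok=True)
    msg = {"from": sender, "body": body, "ts": time.time()}
    with open(os.path.join(inbox, f"{uuid.uuid4().hex}.json"), "w") as f:
        json.dump(msg, f)


def broadcast(sender, team, body):
    """Send the same message to every other teammate."""
    for member in team:
        if member != sender:
            send(sender, member, body)


def read_inbox(recipient):
    """Read and consume all pending messages for a teammate."""
    inbox = os.path.join(MAIL_ROOT, recipient)
    if not os.path.isdir(inbox):
        return []
    msgs = []
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        with open(path) as f:
            msgs.append(json.load(f))
        os.remove(path)  # consume the message
    return msgs
```

In this picture, teammate-1 sharing its race-condition finding with teammate-2 is one `send` call, with no round trip through the lead.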
Display Modes and Direct Interaction
Agent Teams support two display modes. In-process mode runs all teammates inside your main terminal. Use Shift+Up/Down to select a teammate and type to message them directly. Split-pane mode gives each teammate its own tmux or iTerm2 pane, so you can watch everyone’s output simultaneously.
You can also toggle the lead into delegate mode (Shift+Tab), which restricts it to coordination-only tools: spawning, messaging, shutting down teammates, and managing tasks. This prevents the lead from implementing tasks itself instead of distributing them.
The C Compiler: What 16 Agents Actually Did
Anthropic researcher Nicholas Carlini did not just test Agent Teams on toy problems. He ran 16 parallel Claude instances for two weeks to build a C compiler from scratch in Rust, with no internet access and no human-written code.
The Numbers
The project consumed 2 billion input tokens and generated 140 million output tokens across nearly 2,000 Claude Code sessions. The resulting compiler, called claudes-c-compiler, is open source on GitHub and targets x86 (64-bit and 32-bit), ARM, and RISC-V. It uses SSA-based intermediate representation and depends only on the Rust standard library.
What it compiles: Linux 6.9 (booting kernel), PostgreSQL (all 237 regression tests pass), SQLite, Redis, QEMU, FFmpeg, GNU coreutils, CPython, and over 150 additional projects. It achieves a 99% pass rate on the GCC torture test suite.
Coordination Lessons
The agents coordinated through Git, not through Agent Teams’ built-in task system. Each agent claimed tasks by writing files to a current_tasks/ directory. Git synchronization prevented duplicate work. Merge conflicts were handled autonomously.
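A file-per-task claim like this can be sketched with exclusive file creation: creating the marker succeeds for exactly one agent, and pushing the commit publishes the claim to every other clone. The directory name matches the experiment's writeup; everything else here is an illustrative assumption:

```python
import os

CLAIM_DIR = "current_tasks"  # directory name from the experiment


def try_claim(task_name, agent_id):
    """Claim a task by exclusively creating its marker file.

    O_EXCL makes creation fail if the file already exists, so only
    one agent can win the claim; committing and pushing the file
    then publishes the claim to the other Git clones."""
    os.makedirs(CLAIM_DIR, exist_ok=True)
    path = os.path.join(CLAIM_DIR, f"{task_name}.txt")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent got here first
    with os.fdopen(fd, "w") as f:
        f.write(agent_id + "\n")
    return True
```

The same idea extends across machines because Git refuses to push a conflicting version of the claim file, forcing the losing agent to pull and pick another task.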
Three lessons from the experiment stand out:
Test quality trumps everything. Carlini noted: “Claude will work autonomously to solve whatever problem I give it. So it is important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem.” The agents were only as good as the tests they optimized against.
Context pollution kills productivity. Agents that dumped verbose output into their context windows degraded faster. Concise logging and aggregate statistics, rather than raw test output, kept context windows clean and agents effective.
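The aggregation idea is simple to sketch: collapse a raw test log into counts plus a handful of failure names before it ever reaches an agent's context window. The line format below is a hypothetical example, not the experiment's actual harness:

```python
from collections import Counter


def summarize_results(lines):
    """Collapse raw test output into a one-line summary so the full
    log never enters an agent's context window."""
    counts = Counter()
    failures = []
    for line in lines:
        if line.startswith("PASS"):
            counts["pass"] += 1
        elif line.startswith("FAIL"):
            counts["fail"] += 1
            failures.append(line.split(maxsplit=1)[1])
    head = failures[:5]  # only the first few names, never full logs
    return f"{counts['pass']} passed, {counts['fail']} failed; first failures: {head}"
```

An agent that sees "9,847 passed, 3 failed" plus three test names stays effective far longer than one that ingests ten thousand lines of output.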
Time blindness is real. Left alone, individual agents would spend hours on test runs that should have taken minutes. The fixes: a --fast mode that used deterministic sampling per agent, explicit time budgets, and compiler oracles to partition work when monolithic tasks (like full kernel compilation) blocked progress.
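An explicit time budget can be as simple as a hard timeout around the test command, returning a short verdict the agent can act on instead of waiting indefinitely. A minimal sketch (the function and return shape are assumptions, not the experiment's tooling):

```python
import subprocess


def run_with_budget(cmd, seconds):
    """Run a command under a hard time budget.

    Returns a short (verdict, returncode) pair the agent can act on,
    rather than letting a slow test run consume hours."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return ("ok" if proc.returncode == 0 else "failed", proc.returncode)
    except subprocess.TimeoutExpired:
        return ("timeout", None)  # the process was killed at the budget
```

A "timeout" verdict is itself a useful signal: it tells the agent to split the task or switch to a faster test mode instead of retrying the same slow run.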
Agent Teams vs. Subagents vs. Everything Else
The multi-agent coding space got crowded fast. Understanding where Agent Teams fit requires comparing them to three alternatives.
Subagents: Same Session, Less Overhead
Subagents run inside your current Claude Code session. They get their own context window, execute a focused task, and return a summary to the main agent. They cannot message each other. They cannot self-coordinate.
Use subagents when you need a quick researcher or validator: “Go check if this API returns pagination headers” or “Grep the codebase for deprecated function calls.” The token cost is lower because only the summary returns to the main context.
Use Agent Teams when the work requires coordination: parallel debugging with competing hypotheses, cross-layer changes (frontend, backend, tests), or tasks where teammates need to challenge each other’s findings.
GPT-5.3-Codex: Interactive Steering
OpenAI’s approach with GPT-5.3-Codex is philosophically different. Instead of multiple agents coordinating autonomously, Codex gives you a single agent you steer interactively. You can redirect it mid-execution without breaking context. The Codex app manages multiple agents from a single interface, but those agents do not talk to each other.
Agent Teams bet on autonomous coordination. Codex bets on human-in-the-loop steering. The right choice depends on whether you trust the agents to make good decisions without you watching.
Third-Party Orchestrators: Claude Squad, Superset
Tools like Claude Squad and Superset add orchestration on top of existing coding agents. They use tmux and Git worktrees to isolate parallel sessions, and they work across models (Claude, Codex, Aider, OpenCode). But they lack native inter-agent messaging and shared task lists. Agents run in parallel without knowing about each other.
Agent Teams’ advantage is native coordination. The disadvantage: they only work with Claude, and they are still experimental.
When Agent Teams Are Worth the Token Cost
Agent Teams use significantly more tokens than a single session. Each teammate is a separate Claude instance with its own context window. For a team of four working for an hour, you can easily spend 10x what a single session would cost.
The overhead is worth it in four scenarios:
Parallel code review with distinct lenses. Spawn three reviewers: one for security, one for performance, one for test coverage. They do not overlap because each applies a different filter. The lead synthesizes findings.
Debugging with competing hypotheses. Five teammates investigating five theories about why the app crashes, actively trying to disprove each other. Sequential investigation suffers from anchoring bias. Parallel adversarial investigation finds the actual root cause faster.
New module development. Each teammate owns a different file or component. No coordination needed because they are working on independent pieces. The lead assembles the final result.
Cross-layer refactoring. Frontend, backend, database schema, and tests all need to change together. Each teammate owns a layer. They communicate about interface contracts.
The overhead is not worth it for sequential tasks, same-file edits, or anything where agents would constantly step on each other. If you are editing a single function, a single session with subagents is faster and cheaper.
Setting Up Your First Agent Team
Enable Agent Teams by adding this to your settings.json or environment:
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
Then describe the task and team structure in natural language:
Create an agent team to refactor the authentication module.
Spawn three teammates:
- One for the token service (src/auth/tokens/)
- One for session management (src/auth/sessions/)
- One for integration tests (tests/auth/)
Require plan approval before any changes.
The lead creates the team, spawns teammates, and starts coordinating. Each teammate reads your project’s CLAUDE.md for context. The “require plan approval” instruction keeps teammates in read-only plan mode until the lead approves their approach.
Two practical tips from early adopters. First, give teammates enough context in the spawn prompt. They do not inherit the lead’s conversation history, so include file paths, constraints, and goals explicitly. Second, set explicit quality gates using hooks: the TeammateIdle hook runs when a teammate finishes, and TaskCompleted runs when a task is marked done. Both can reject work and send feedback, keeping teammates working until quality standards are met.
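A quality gate can be a small script that reruns the test suite and rejects the work when it fails. The sketch below assumes the common Claude Code hook convention, where a nonzero exit code rejects and stderr is returned as feedback; the exact payload a TeammateIdle or TaskCompleted hook receives is not shown here, so treat this as an illustration:

```python
import subprocess
import sys


def gate(test_cmd):
    """Quality-gate check: return (exit_code, feedback).

    Assumed convention: exit code 2 rejects the completed work, and
    the feedback string is sent back to the teammate via stderr."""
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return 2, "Tests failing, task not done:\n" + result.stdout[-2000:]
    return 0, ""


# In an actual TaskCompleted hook script you would call, for example:
#   code, feedback = gate(["pytest", "-q"])
#   if feedback:
#       print(feedback, file=sys.stderr)
#   sys.exit(code)
```

Because the feedback flows back to the teammate rather than the human, the team keeps iterating until the gate passes.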
Limitations You Should Know About
Agent Teams are experimental and carry real constraints. /resume and /rewind do not restore in-process teammates. You can only run one team per session. Teammates cannot spawn their own teams (no nesting). Split-pane mode requires tmux or iTerm2 and does not work in VS Code’s integrated terminal or Windows Terminal.
Task status can lag: teammates sometimes forget to mark tasks complete, which blocks dependent work. The workaround is telling the lead to nudge stalled teammates. And cleanup is the lead’s responsibility. Teammates should not run cleanup because their team context may not resolve correctly.
These are real limitations that will matter on day one. They also explain why Anthropic labels this a “research preview” rather than a production feature.
Frequently Asked Questions
What are Claude Opus 4.6 Agent Teams?
Agent Teams are an experimental feature in Claude Code that lets you coordinate multiple Claude instances working in parallel on a shared codebase. A lead session spawns teammates, assigns tasks, and synthesizes results, while teammates work independently with their own context windows and communicate directly with each other.
How do Agent Teams differ from Claude Code subagents?
Subagents run inside a single session and can only report results back to the main agent. Agent Teams teammates run as independent Claude instances that message each other directly, share a task list, and self-coordinate. Agent Teams cost more tokens but enable true multi-agent collaboration, while subagents are better for quick, focused tasks.
How much do Agent Teams cost compared to a single Claude Code session?
Agent Teams use significantly more tokens because each teammate is a separate Claude instance with its own context window. For Anthropic’s C compiler stress test, 16 agents consumed 2 billion input tokens and 140 million output tokens over two weeks, costing under $20,000. For typical development tasks with 3-4 teammates, expect roughly 5-10x the token cost of a single session.
What are the best use cases for Claude Agent Teams?
Agent Teams work best for parallel code review with distinct criteria (security, performance, tests), debugging with competing hypotheses, developing independent modules simultaneously, and cross-layer refactoring that spans frontend, backend, and tests. They are not cost-effective for sequential tasks, same-file edits, or simple operations.
How do I enable Claude Code Agent Teams?
Set the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS environment variable to 1 in your shell or add it to your Claude Code settings.json. Then describe the task and team structure in natural language. Agent Teams are currently a research preview and disabled by default.
