Anthropic just shipped agent teams as an experimental feature in Claude Code. The idea: instead of one agent working through tasks sequentially, you split work across multiple independent Claude Code instances that coordinate with each other.
I’ve just started using it. It’s useful for certain tasks, rough around the edges for others. This post covers what agent teams are, how to set them up, when they’re worth the token cost, and where they fall short.
What are agent teams?
Claude Code already had subagents - helper agents that run within your session, do some work, and report results back. Agent teams are different. Each teammate is a fully independent Claude Code instance with its own context window. They communicate directly with each other through a messaging system and coordinate through a shared task list.
The architecture has four parts:
- Team lead: your main Claude Code session. It creates the team, spawns teammates, and synthesises results
- Teammates: separate Claude instances that each work on assigned tasks
- Shared task list: a work queue with pending, in-progress, and completed states. Tasks can have dependencies
- Mailbox: direct messaging between any agents. Teammates can message each other, not just report back to the lead
This is the key difference from subagents. Subagents report to the main agent and never talk to each other. Agent teams let teammates share findings, challenge each other’s conclusions, and self-coordinate.
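To make the coordination model concrete, here is a minimal sketch of the task list and mailbox primitives in Python. All names and fields are hypothetical illustrations of the concepts described above; the real implementation is internal to Claude Code:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class Task:
    id: str
    description: str
    status: Status = Status.PENDING
    depends_on: list[str] = field(default_factory=list)  # blocked until these complete

@dataclass
class Message:
    sender: str     # any agent: the lead or a teammate
    recipient: str  # direct messaging, so teammate-to-teammate is allowed
    body: str

def ready_tasks(tasks: dict[str, Task]) -> list[Task]:
    """Pending tasks whose dependencies have all completed."""
    return [
        t for t in tasks.values()
        if t.status is Status.PENDING
        and all(tasks[d].status is Status.COMPLETED for d in t.depends_on)
    ]
```

The dependency check is what lets teammates self-coordinate: anyone can pull the next unblocked task without asking the lead.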
Setting it up
Agent teams are experimental and disabled by default. Enable them by adding this to your settings.json:
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
Or export it in your shell:
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
That’s it. No extra dependencies for the default in-process mode.
Starting a team
You don’t need special syntax. Just describe the task and the team structure you want:
Create an agent team to review the authentication system.
Spawn three teammates:
- Security reviewer: audit for vulnerabilities
- Performance analyst: profile response times
- Test coverage checker: verify edge cases
Have them share findings through the task list.
Claude creates the team, spawns the teammates, and they start working. Each teammate picks up tasks from the shared list and reports findings. The lead synthesises everything at the end.
Display modes
There are two ways to see what your team is doing:
In-process mode (default): all teammates run inside your main terminal. Use Shift+Down to cycle through them and type to message any teammate directly. Works in any terminal.
Split pane mode: each teammate gets its own terminal pane. You see everyone’s output simultaneously and click into any pane to interact. This requires tmux or iTerm2.
Configure it in settings.json:
{
  "teammateMode": "tmux"
}
Or per-session:
claude --teammate-mode in-process
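Both settings can live in the same settings.json. A combined example, using only the keys shown above (the schema is experimental and may change):

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  },
  "teammateMode": "in-process"
}
```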
A note on split panes
I tried split pane mode and couldn’t get it working reliably. The tmux integration has known issues - the docs themselves note it “traditionally works best on macOS” and suggest using tmux -CC in iTerm2 as the entry point. It’s also not supported in VS Code’s integrated terminal, Windows Terminal, or Ghostty.
For now, in-process mode works fine. Shift+Down to cycle through teammates is adequate, and you can use Ctrl+T to toggle the task list view.
When agent teams are worth it
Agent teams burn through tokens. A 3-teammate team uses roughly 3-4x the tokens of single-session work. Each teammate has its own full context window, and they all run simultaneously. If you're on a metered plan, you'll feel it, so be deliberate about when you reach for them.
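As a back-of-envelope sketch of why the multiplier lands where it does (all numbers here are illustrative assumptions, not measurements):

```python
# Rough cost model: each teammate does about a session's worth of work,
# plus extra tokens for inter-agent messaging and the lead's synthesis.
def team_token_estimate(single_session_tokens: int,
                        teammates: int,
                        coordination_overhead: float = 0.25) -> int:
    """Estimate total tokens for an agent team.

    coordination_overhead is an assumed fraction covering mailbox
    traffic and the lead's synthesis pass.
    """
    return round(single_session_tokens * teammates * (1 + coordination_overhead))

solo = 200_000                       # tokens a single session might use
team = team_token_estimate(solo, 3)  # 750,000: within the 3-4x range
```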
Strong use cases:
Parallel code review with different lenses. A single reviewer tends to fixate on one type of issue at a time. Three reviewers - security, performance, test coverage - each give thorough attention to their domain simultaneously:
Create an agent team to review PR #142. Spawn three reviewers:
- One focused on security implications
- One checking performance impact
- One validating test coverage
Debugging with competing hypotheses. This is where agent teams really shine. When the root cause is unclear, a single agent finds one plausible explanation and stops looking. Multiple agents investigating different theories - and explicitly trying to disprove each other - converge on the real answer faster:
Users report the app crashes on login. Spawn 3 teammates
to investigate different hypotheses. Have them talk to each
other to challenge each other's findings.
The debate structure fights anchoring bias. Once a single agent latches onto a theory, subsequent investigation is biased toward confirming it. Multiple independent investigators are harder to fool.
New features with independent modules. When each teammate can own a separate piece (frontend component, API endpoint, database migration) without stepping on each other’s files.
Cross-layer coordination. Changes spanning frontend, backend, and tests, each owned by a different teammate who understands their layer.
When to skip them
Sequential tasks. If step 2 depends on step 1, parallelism doesn’t help.
Same-file edits. Two teammates editing the same file leads to overwrites. There’s no merge resolution.
Simple tasks. Coordination overhead exceeds the benefit. A single session is faster for anything that doesn’t benefit from parallel exploration.
Tight dependencies. If teammates need to constantly wait on each other, you’re paying 3x tokens for sequential work with extra messaging overhead.
For these cases, subagents or a single session are more effective and cheaper.
Subagents vs agent teams
The decision comes down to one question: do the workers need to talk to each other?
| | Subagents | Agent teams |
|---|---|---|
| Context | Own window, results return to caller | Fully independent |
| Communication | Report back to main agent only | Message each other directly |
| Coordination | Main agent manages everything | Shared task list, self-coordination |
| Best for | Focused tasks where only the result matters | Complex work requiring discussion |
| Token cost | Lower | 3-4x higher |
Subagents are the right default. Reach for agent teams when you specifically need the inter-agent communication.
Practical tips
Give teammates enough context. They load CLAUDE.md and project config automatically, but they don’t inherit the lead’s conversation history. Include specifics in the spawn prompt:
Spawn a security reviewer with the prompt: "Review the auth
module at src/auth/ for vulnerabilities. Focus on token handling,
session management, and input validation. The app uses JWT tokens
stored in httpOnly cookies."
Start with 3-5 teammates. That balances parallel work with manageable coordination. Aim for 5-6 tasks per teammate. Scale up only when the work benefits from it.
Require plan approval for risky tasks. You can force teammates into plan mode before they implement anything:
Spawn an architect teammate to refactor the auth module.
Require plan approval before they make changes.
The teammate plans, sends it to the lead for approval, and only implements after getting the green light. The lead makes approval decisions autonomously - tell it your criteria up front (“only approve plans that include test coverage”).
Watch for the lead doing work itself. Sometimes the lead starts implementing instead of waiting for teammates. If you notice this:
Wait for your teammates to complete their tasks before proceeding
Clean up when done. Shut down teammates first, then have the lead clean up the team:
Ask all teammates to shut down, then clean up the team
Known limitations
This is experimental. Current rough edges:
- No session resumption for in-process teammates. /resume and /rewind don't restore them. After resuming, the lead may try to message teammates that no longer exist
- Task status can lag. Teammates sometimes forget to mark tasks complete, blocking dependent tasks. You may need to nudge manually
- Spin-up and spin-down are slow. It takes a noticeable amount of time to get the team running, and shutting down is equally sluggish - teammates finish their current operation before stopping
- One team per session. Clean up before starting a new one
- No nested teams. Teammates can’t spawn their own teams
- Split panes require tmux or iTerm2. Not supported in VS Code terminal, Windows Terminal, or Ghostty
The bigger picture
Agent teams are part of a broader shift in how Claude Code handles parallelism. Subagents handle focused delegation within a session. Git worktrees let you run multiple Claude Code sessions manually. Agent teams add automated coordination on top.
The pattern emerging so far: subagents for quick, contained tasks (run tests, review a file, research a question). Agent teams for parallel work where the agents need to share findings. Single sessions for everything else.
The way the team lead orchestrates and assigns tasks is the most interesting part. It breaks work into pieces, delegates to the right teammate, and synthesises findings. You can see the bones of something useful here - it just needs iterating. The task dependency system, plan approval workflow, and direct messaging between agents are solid foundations.
The UX needs work though. Navigating between teammates in in-process mode is clunky - Shift+Down to cycle through them works but it’s not exactly intuitive. Split panes would solve this but they don’t work reliably yet. Session resumption is broken. You need to actively manage the team to prevent wasted effort.
Where I can see this getting really powerful is spec-driven development. Right now you describe a team and tasks manually, but the bigger win is having the team lead generate a full spec from a one-liner. You give it a sentence, it breaks it into a spec, spins up teammates, and they build it out. A lot of activity from a simple starting point.
Cursor’s team attempted something similar - they claimed to have built a web browser using AI agents running unattended over several days, writing 3M+ lines of Rust. The reality was less impressive: developers found heavy reliance on existing Servo components, code that didn’t compile, and a JS interpreter that was manually included rather than generated. It’s a cautionary tale about overselling multi-agent output, but the underlying idea - teams of agents building from a high-level prompt - is one direction the tooling is moving in. The difference will be in honesty about what the agents produce and how much human steering they need.
But for the right task - parallel debugging, multi-perspective review, independent module development - agent teams are already a real time saver despite the token cost.