
Building MetaMorph

I built a tool that launches multiple Claude Code agents in Docker containers and lets them coordinate through git push conflicts. No orchestrator, no message broker - just git. Here's why.

ai · claude · agents · devtools · distributed-systems · open-source

The Problem

If you've spent any time using AI coding agents on real projects, you've probably hit the same wall I have. A single Claude Code session is remarkably capable - it can refactor a module, write a test suite, even scaffold an entire feature. But the moment you need to tackle something large - migrating a codebase, building a compiler, rewriting a test harness across dozens of files - one agent isn't enough.

Context windows fill up. The agent loses track of what it already did. Mistakes compound. And you, the human, end up babysitting the session for hours, nudging it back on track. I wanted to point multiple agents at a problem, walk away, and come back to working software.

So I started looking at what was out there.

The Landscape

There's a surprising number of tools tackling "multi-agent orchestration for Claude Code" right now. Each takes a different approach, and the differences matter more than you'd think.

Orchestrator-Based Systems

The most intuitive approach is to have a boss agent that decomposes work and assigns tasks to workers. Gastown, built by Steve Yegge, takes this to the extreme - a "Mayor" agent orchestrates 20-30+ parallel "Polecat" agents, with additional monitoring agents watching the fleet. Multiclaude follows a similar pattern with a supervisor assigning work to subagents.

The appeal is obvious: one agent has the full picture and can make intelligent decisions about task decomposition. But orchestrators have real downsides. They're a single point of failure. They consume tokens just thinking about what other agents should do. And in practice, the orchestrator can become a bottleneck - workers sit idle waiting for their next assignment while the boss is still planning.

Terminal Multiplexers

Claude Squad takes the opposite approach - it's essentially a terminal UI for managing multiple AI coding sessions in parallel. Each agent gets its own tmux session and git worktree, and you switch between them. Simple, proven, and it supports tools beyond Claude (Aider, Codex, Gemini).

The trade-off is that there's no inter-agent communication. Each session is an island. You're the orchestrator, manually coordinating what each agent works on. It's great for parallel independent tasks, but it doesn't scale to the kind of deeply collaborative work I was interested in.

Anthropic's Agent Teams

Anthropic themselves shipped agent teams alongside Opus 4.6. A team lead creates teammates that work independently, each in its own context window, with a shared task list and even peer-to-peer messaging between agents.

It's the most tightly integrated option since it's built into Claude Code itself. But it's still experimental - no session resumption, one team per session, and the token costs add up fast with separate context windows per teammate.

Everything Else

There's Claude-Flow with RAG integration and "self-learning neural capabilities." ccswarm in Rust with channel-based message passing. Conductor Build with a dashboard. New ones seem to appear every week. The space is moving fast.

The Anthropic C Compiler Blog Post

Then I read Anthropic's engineering blog post about building a C compiler with 16 parallel Claude agents, and something clicked.

Their setup was almost comically simple. Sixteen Claude Opus 4.6 instances, each in a Docker container, all pushing to a shared bare git repository. Task coordination was done through lock files - an agent writes a file to current_tasks/, commits, and pushes. If the push gets rejected because another agent pushed first, it picks a different task. Shared state lived in a PROGRESS.md file that agents checked before each session.

That's it. No orchestrator. No message broker. No database. Just git.

Over two weeks and $20,000 in API costs, they produced a 100,000-line Rust compiler that passes 99% of the GCC torture test suite and can compile the Linux 6.9 kernel on x86, ARM, and RISC-V. Nearly 2,000 Claude Code sessions, 2 billion input tokens.

The key insight from their post is worth quoting directly: the hardest part isn't coordinating the agents - it's having good tests. "Claude will work autonomously to solve whatever problem I give it. So it's important that the task verifier is nearly perfect." If the tests are solid, agents can work independently and the test suite catches regressions. If the tests are flaky, everything falls apart regardless of how clever your orchestration is.

Why No Orchestrator?

The more I thought about it, the more the orchestrator-free approach made sense. Here's the reasoning:

Git is already a distributed coordination system. It handles concurrent writes, conflict detection, and history tracking. Push rejection is literally optimistic locking. Why build another coordination layer on top?

Claude Code is already good at deciding what to do. Every orchestrator I looked at essentially asks one Claude instance to plan work for other Claude instances. But Claude is perfectly capable of looking at a codebase, reading a progress file, checking what tasks are already claimed, and picking the next most useful thing to do. The orchestrator agent is spending tokens to do what each worker could do for itself.

Orchestrators don't fail gracefully. If your Mayor agent crashes in Gastown, the whole fleet is leaderless. If one of my agents crashes, the others keep working. MetaMorph's daemon restarts the failed container, and it picks up where it left off.

Simpler systems are easier to debug. When something goes wrong with a multi-agent system (and things will go wrong), I'd rather read git logs than trace messages through an orchestration layer.

Building MetaMorph

So I built MetaMorph. It's a single Go binary that does a few things well:

Launches agents in Docker containers. Each agent gets an isolated Ubuntu environment with Node.js and Claude Code installed. No filesystem conflicts between agents.

Creates a shared bare git repo. Your project gets cloned into .metamorph/upstream.git, which is mounted into every container. Agents clone from it, push to it, and pull from it. This is the only shared state.
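
For a feel of the wiring, here's a rough Go sketch of that setup step - the image name, mount point, and agent count are placeholders rather than MetaMorph's actual values:

package main

import (
    "fmt"
    "os/exec"
    "path/filepath"
)

func main() {
    // Mirror the project into the shared bare repository.
    exec.Command("git", "clone", "--bare", ".", ".metamorph/upstream.git").Run()

    // Bind-mount the bare repo into each agent container as the only shared state.
    upstream, _ := filepath.Abs(".metamorph/upstream.git")
    for i := 1; i <= 3; i++ {
        name := fmt.Sprintf("metamorph-agent-%d", i)
        exec.Command("docker", "run", "-d", "--name", name,
            "-v", upstream+":/upstream.git", // container-side mount point is an assumption
            "metamorph-agent:latest",        // hypothetical image with Node.js + Claude Code
        ).Run()
    }
}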

Runs a daemon that monitors the fleet. Every 30 seconds, it checks if containers are still running (restarts them if not), cleans up stale lock files, watches for test failures in agent logs, and sends webhooks when interesting things happen.
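
A stripped-down sketch of that monitor loop - the function and container names are illustrative, not MetaMorph's internals:

package main

import (
    "os/exec"
    "strings"
    "time"
)

// containerRunning asks Docker whether the named container is still up.
func containerRunning(name string) bool {
    out, err := exec.Command("docker", "inspect", "-f", "{{.State.Running}}", name).Output()
    return err == nil && strings.TrimSpace(string(out)) == "true"
}

func monitor(agents []string) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        for _, name := range agents {
            if !containerRunning(name) {
                // The agent loop inside the container re-syncs on startup,
                // so a plain restart is enough to resume work.
                exec.Command("docker", "start", name).Run()
            }
        }
        // Stale lock cleanup, log scanning, and webhook delivery would
        // slot in here as further checks on each tick.
    }
}

func main() {
    monitor([]string{"metamorph-agent-1", "metamorph-agent-2", "metamorph-agent-3"})
}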

Assigns roles. Agents can be developers, testers, refactorers, optimizers, reviewers, or documenters. Each role gets specific instructions injected into its prompt. You configure how many of each role you want in a TOML file.
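
Purely as an illustration of the shape of that config - the key names here are assumptions, not MetaMorph's actual schema - decoding it with github.com/BurntSushi/toml might look like:

package main

import (
    "fmt"

    "github.com/BurntSushi/toml"
)

type Config struct {
    Project string         `toml:"project"`
    Roles   map[string]int `toml:"roles"` // how many agents of each role to launch
}

const example = `
project = "my-app"

[roles]
developers = 3
testers    = 1
reviewers  = 1
`

func main() {
    var cfg Config
    if _, err := toml.Decode(example, &cfg); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", cfg)
}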

Each agent runs in an infinite loop: pull the latest code, run a Claude Code session, auto-commit any leftover changes, push, sleep briefly, repeat. When a session ends (context window full, task complete, or error), the loop starts a fresh session with a full context window.
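
A simplified sketch of that loop, assuming git and the claude CLI are on the container's PATH; the runIn helper, paths, prompt, and exact claude flags are placeholders:

package main

import (
    "os/exec"
    "time"
)

// runIn runs a command inside the agent's working copy.
func runIn(dir, name string, args ...string) error {
    cmd := exec.Command(name, args...)
    cmd.Dir = dir
    return cmd.Run()
}

func agentLoop(workdir, prompt string) {
    for {
        // Sync with whatever the other agents have pushed.
        runIn(workdir, "git", "pull", "--rebase")

        // One headless Claude Code session; it ends when the context window
        // fills up, the task finishes, or an error occurs. Real runs need
        // more flags and permissions configuration than this.
        runIn(workdir, "claude", "-p", prompt)

        // Auto-commit anything the session left uncommitted, then push.
        runIn(workdir, "git", "add", "-A")
        runIn(workdir, "git", "commit", "-m", "agent: auto-commit session output")
        runIn(workdir, "git", "push")

        // Brief pause, then start over with a clean context window.
        time.Sleep(10 * time.Second)
    }
}

func main() {
    agentLoop("/workspace/repo", "Read PROGRESS.md, pick an unclaimed task, and work on it.")
}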

The task claiming protocol is the same one from the Anthropic blog post. Write a lock file, commit, push. If the push is rejected, someone else got there first - reset, pull, try a different task.
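
Sketched in the same style (the lock file layout and the origin/main branch are assumptions), the claim step looks roughly like this:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

func runIn(dir, name string, args ...string) error {
    cmd := exec.Command(name, args...)
    cmd.Dir = dir
    return cmd.Run()
}

// claimTask returns true if this agent won the race for taskID.
func claimTask(workdir, agent, taskID string) bool {
    lock := filepath.Join("current_tasks", taskID)
    os.MkdirAll(filepath.Join(workdir, "current_tasks"), 0o755)
    os.WriteFile(filepath.Join(workdir, lock), []byte(agent), 0o644)

    runIn(workdir, "git", "add", lock)
    runIn(workdir, "git", "commit", "-m", fmt.Sprintf("%s: claim %s", agent, taskID))

    // Optimistic locking: if another agent pushed a claim first, this push is
    // rejected. Undo the local claim, re-sync, and let the caller pick a
    // different task.
    if err := runIn(workdir, "git", "push"); err != nil {
        runIn(workdir, "git", "reset", "--hard", "origin/main")
        runIn(workdir, "git", "pull", "--rebase")
        return false
    }
    return true
}

func main() {
    if claimTask("/workspace/repo", "agent-1", "parser-rewrite") {
        fmt.Println("claimed; start working")
    }
}

That one rejected push is the entire locking mechanism: git arbitrates the race, and no coordinator has to get involved.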

$ metamorph init          # Create config file
$ metamorph start         # Launch daemon + agents
$ metamorph status        # Check what's happening
$ metamorph logs agent-1  # Watch an agent work
$ metamorph stop          # Shut it all down

What I Learned

A few things became clear after running this on real projects:

Tests really are everything. The Anthropic blog post wasn't exaggerating. When I pointed MetaMorph at a project with a solid test suite, agents made steady progress. When the tests were incomplete, agents would "fix" one thing by breaking three others, and the whole system churned without moving forward.

Agents are surprisingly good at self-organizing. I expected constant conflicts and duplicated work. In practice, agents read PROGRESS.md, check what's already claimed, and tend to gravitate toward different parts of the codebase naturally. There's occasional duplication, but git merge conflicts catch most of it.

Docker isolation prevents a whole class of problems. Early experiments without containerization had agents corrupting each other's working directories, installing conflicting dependencies, and generally making a mess. Docker containers solved this completely.

The daemon is the unsung hero. Agents crash. Claude hits rate limits. Sessions hang. The monitor loop catches all of this and recovers automatically. Without it, you'd be back to babysitting.

When To Use This (and When Not To)

MetaMorph is not for every project. It's designed for long-running tasks - hours to days - where the work can be parallelized across multiple agents. Building a large feature across many files. Migrating a codebase from one framework to another. Writing comprehensive test suites. Refactoring a legacy system.

It's not for quick one-off tasks. It's not cheap - you'll want a Claude Max subscription (or budget for significant API usage). And it requires Docker, which rules out some environments.

But if you've ever stared at a large codebase and thought "I know exactly what needs to happen here, I just need ten of me working in parallel" - that's the sweet spot.

Try It

MetaMorph is open source under Apache 2.0. Single Go binary, no dependencies beyond Docker:

github.com/robmorgan/metamorph

If you try it, I'd genuinely love to hear how it goes - what works, what breaks, and what you'd want it to do differently.