A few days ago I wrote about running 20 AI agents in parallel from my home directory. That was ~2,500 lines of bash scripts, tmux sessions, and git worktrees held together with duct tape and ambition.
That system no longer exists. In 8 days, it turned into 40,000 lines of TypeScript, 17 plugins, 3,288 tests, and a proper open-source platform — and the agents built it themselves.
We're open-sourcing it today: Agent Orchestrator.

Why build this
Most people get the AI coding agent problem wrong. The agents can code. That's not the bottleneck. You are.
You spawn five tasks, go grab coffee, come back 20 minutes later and now you're just refreshing GitHub tabs — waiting for PRs, checking CI, reading review comments. Congratulations, you've automated engineering and replaced it with project management. Bad project management.
Agent Orchestrator removes you from that loop. It tracks every session, watches CI, forwards review comments back to agents, and only pings you when something actually needs a human decision. Once that bottleneck — your attention — goes away, things start compounding fast.
A dashboard shows you status. An orchestrator agent looks at all your workstreams and tells you: "This PR is blocking three other tasks, this CI failure is a flaky test, and this review comment is the one that actually matters." That's the difference.

The other thing that matters: plug anything in. Different agent runtime? Different issue tracker? Different notification channel? Swap it. The orchestrator doesn't care if you use Claude Code or Aider, tmux or Docker, GitHub or Linear. Eight plugin slots, all replaceable.
What happened
The bash scripts worked fine for me. 20 agents, worktrees, CI failure forwarding. But they were fragile in the way bash scripts always are — adding a new notification channel meant editing five files, supporting Aider instead of Claude Code meant rewriting half the system. No plugin architecture, no tests, no way for anyone else to use it.
So I pointed 30 agents at the problem and let them rebuild it from scratch.

What they shipped:
| Metric | Value |
|---|---|
| TypeScript lines of code | ~40,000 |
| Test cases (unit + integration) | 3,288 |
| Plugin packages | 17 |
| PRs merged | 61 of 102 created |
| Commits (all branches) | 722 |
| PRs created by AI sessions | 86 (84%) |
| Peak concurrent agents | 30 |
| AI co-authored commits | 100% |
Every commit has a Co-Authored-By git trailer identifying which AI model wrote it. No ambiguity about what humans did vs what agents did.
The timeline
People see "40K lines in 8 days" and assume I went into a cave. I have a day job. This was maybe ~3 days of actual focused work spread across 8 days, with agents filling the gaps.
| Date | Day | Main Commits | Branch Commits | Lines Added | PRs Merged | What I Was Doing |
|---|---|---|---|---|---|---|
| Feb 13 | Fri | 23 | 157 | +30,070 | 1 | Evening → overnight |
| Feb 14 | Sat | 25 | 112 | +5,599 | 27 | Full day (weekend) |
| Feb 15 | Sun | 11 | 63 | +4,779 | 5 | Half day |
| Feb 16 | Mon | 11 | 68 | +3,575 | 8 | Around day job |
| Feb 17 | Tue | 1 | 94 | +9,512 | 4 | Evening only (blog day) |
| Feb 18 | Wed | 10 | 71 | +2,921 | 11 | Around day job + evening |
| Feb 19 | Thu | 9 | 91 | +3,990 | 4 | Around day job |
| Feb 20 | Fri | 1 | 0 | — | 1 | Report day |
| Total | | 91 | 656 | +76,454 | 61 | |

Hourly patterns (IST)
- Fri Feb 13: Started 8:41 PM → coded through midnight → 157 branch commits
- Sat Feb 14: Midnight to 5 AM, break, 11 AM to midnight → 112 branch commits
- Sun Feb 15: Midnight to noon, gap, evening burst → 63 branch commits
- Mon Feb 16: Midnight burst, gaps during work hours, evening session → 68 commits
- Tue Feb 17: Midnight burst, big daytime gap, evening architecture refactor → 94 commits
- Wed Feb 18: Midnight to 8 AM push, daytime gap, afternoon review → 71 commits
- Thu Feb 19: Agents running most of day, review in evening → 91 commits
- Fri Feb 20: Merged yesterday's work → 0 branch commits
The pattern was pretty simple: set up sessions before bed, agents work overnight, review and merge in the morning before work, set up new sessions, repeat. Three intense focus periods — Friday night into Saturday, Tuesday evening, and Wednesday morning — account for most of it.
Saturday Feb 14
The standout day. 27 PRs merged. The entire platform shipped: core services, CLI, web dashboard, all 17 plugins, npm publishing. I was reviewing and merging PRs faster than I could read them, but every PR had passed CI and automated code review first. The agents were doing the quality work — I was just doing triage and architecture decisions.
Which models did what
Every commit tracks the model via git trailers:
| Model | Co-Authored Commits |
|---|---|
| Claude Opus 4.6 | 512 |
| Claude Sonnet 4.5 | 373 |
| Claude Sonnet 4.6 | 124 |
| Claude Opus 4.5 | 4 |
| Total trailers | 1,013 |
Totals exceed 722 commits because some commits were written by one model and reviewed/fixed by another. Opus 4.6 handled the hard stuff — complex architecture, cross-package integrations. Sonnet handled volume — plugin implementations, tests, docs.
The review loop
The agents didn't just write code and throw it over the wall. There was a real review cycle:
- Agent creates PR and pushes code
- Cursor Bugbot automatically reviews and posts inline comments
- Agent reads comments, fixes the code, pushes again
- Bugbot re-reviews
| Reviewer | Reviews | Inline Comments | Comment Share |
|---|---|---|---|
| Cursor Bugbot (automated) | 377 | 700 | 69% |
| AI agents | 316 | 303 | 30% |
| Humans | 13 | 13 | 1% |

700 automated code review comments. Bugbot caught real stuff — shell injection via exec(), path traversal, unclosed intervals, missing null checks. The agents fixed ~68% immediately, explained away ~7% as intentional, and deferred ~4% to future PRs.
The ao-58 story
PR #125, a dashboard redesign. It went through 12 CI failure→fix cycles. Each time, the agent got the failure output, diagnosed the issue (type errors, lint failures, test regressions), and pushed a fix. No human touched it.
12 rounds. Zero human intervention. Shipped clean.
| Branch | CI Failures | CI Successes | What Happened |
|---|---|---|---|
| session/ao-58 | 12 | 28 | Dashboard redesign — 12 rounds of CI fixes |
| session/ao-52 | 7 | 76 | Self-improvement system |
| feat/EVENT-REACTIONS-ARCH | 3 | 17 | Reactions architecture |
| fix/spawn-status-transition | 2 | 50 | Status transitions |
All 41 CI failures across 9 branches were eventually self-corrected by agents. Overall CI success rate: 84.6%.

Architecture
Plugin system with 8 slots:
| Slot | Plugins | Purpose |
|---|---|---|
| Runtime | tmux, process | Where agents execute |
| Agent | claude-code, aider, codex, opencode | Which AI coding agent |
| Workspace | worktree, clone | How code is isolated |
| Tracker | github, linear | Where issues come from |
| SCM | github | PR creation and enrichment |
| Notifier | desktop, slack, composio, webhook | How humans get notified |
| Terminal | iterm2, web | How you observe agents |
| Lifecycle | (core) | Reactions and status transitions |
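A minimal sketch of what a swap-anything slot registry could look like in TypeScript. The `Slot` names mirror the table above, but the `Plugin` interface and `PluginRegistry` class are illustrative assumptions, not the actual agent-orchestrator API:

```typescript
// Hypothetical sketch: each of the eight slots holds exactly one plugin,
// and registering a new one swaps out the old. Names are illustrative.
type Slot =
  | "runtime" | "agent" | "workspace" | "tracker"
  | "scm" | "notifier" | "terminal" | "lifecycle";

interface Plugin {
  readonly name: string;
  readonly slot: Slot;
}

class PluginRegistry {
  private plugins = new Map<Slot, Plugin>();

  register(plugin: Plugin): void {
    // Last registration wins: swapping tmux for process is one line.
    this.plugins.set(plugin.slot, plugin);
  }

  resolve(slot: Slot): Plugin {
    const p = this.plugins.get(slot);
    if (!p) throw new Error(`no plugin registered for slot "${slot}"`);
    return p;
  }
}

const registry = new PluginRegistry();
registry.register({ name: "tmux", slot: "runtime" });
registry.register({ name: "process", slot: "runtime" }); // swap runtimes freely
console.log(registry.resolve("runtime").name); // "process"
```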
Session lifecycle:

- Tracker pulls an issue (GitHub or Linear)
- Workspace creates an isolated worktree or clone
- Runtime starts a tmux session or process
- Agent (Claude Code, Aider, etc.) receives the task and works autonomously
- Terminal lets you observe live via iTerm2 or the web dashboard
- SCM creates PRs and enriches them with context
- Reactions automatically re-spawn agents on CI failures or review comments
- Notifier pings you only when human judgment is needed
Don't use tmux? Use the process runtime. Don't use GitHub? Use Linear for tracking. Don't use Claude Code? Plug in Aider or Codex. Swap any piece.
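The lifecycle above can be sketched as a small state machine. The status names and transitions here are assumptions inferred from the steps, not the orchestrator's real schema:

```typescript
// Hedged sketch of session statuses and which transitions are legal.
// All names are assumptions based on the lifecycle steps above.
type SessionStatus =
  | "queued"       // tracker pulled an issue
  | "working"      // agent running in its workspace
  | "pr_open"      // SCM created a PR
  | "ci_failed"    // a reaction will re-spawn the agent
  | "needs_human"  // notifier pings you
  | "merged";

const transitions: Record<SessionStatus, SessionStatus[]> = {
  queued: ["working"],
  working: ["pr_open", "needs_human"],
  pr_open: ["ci_failed", "needs_human", "merged"],
  ci_failed: ["working"], // agent re-spawned to fix CI
  needs_human: ["working", "merged"],
  merged: [], // terminal state
};

function canTransition(from: SessionStatus, to: SessionStatus): boolean {
  return transitions[from].includes(to);
}
```

The key property is that `ci_failed` loops back to `working` without a human in the path, which is what lets a PR survive many CI rounds unattended.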
Activity detection
One of the trickier problems: figuring out what an agent is actually doing without asking it.
Claude Code writes structured JSONL event files during every session. Instead of relying on agents to self-report (they lie, or at least get confused), the orchestrator reads these files directly:
- Is the agent actively generating tokens?
- Is it waiting for tool execution?
- Is it idle?
- Has it finished?
The agent-claude-code plugin knows how to parse Claude's session files. A future agent-aider plugin would read Aider's equivalent. Same approach as the bash scripts, but properly abstracted now.
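As a rough illustration, here is how reading and classifying those JSONL events might look. The event shape (`type`, `timestamp`) and the 60-second idle threshold are assumptions for the sketch, not Claude Code's actual log format:

```typescript
// Hedged sketch: classify an agent's state from its session event log.
import * as fs from "node:fs";

type AgentState = "generating" | "waiting_on_tool" | "idle" | "finished";

interface SessionEvent {
  type: string;      // e.g. "assistant", "tool_use", "result" (assumed)
  timestamp: number; // epoch ms (assumed)
}

function classify(events: SessionEvent[], now: number, idleMs = 60_000): AgentState {
  const last = events[events.length - 1];
  if (!last) return "idle";
  if (last.type === "result") return "finished";
  if (now - last.timestamp > idleMs) return "idle"; // no recent activity
  if (last.type === "tool_use") return "waiting_on_tool";
  return "generating";
}

function readSession(path: string): SessionEvent[] {
  // Each line of the JSONL file is one event.
  return fs
    .readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}
```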
Web dashboard
Next.js 15, Server-Sent Events for real-time updates. No polling.
- Attention zones — sessions grouped by what needs your attention (failing CI, awaiting review, running fine)
- Live terminal — xterm.js embedded in the browser, showing the agent's actual terminal output in real time
- Session detail — current file being edited, recent commits, PR status, CI status
- Config discovery — automatically finds your `ao.config.yaml` and shows available sessions
The dashboard itself went through a major redesign (the ao-58 story above), done entirely by an agent.
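For a sense of the push model, here is a minimal SSE endpoint sketch in Node/TypeScript. The `/events` path and the event payload are hypothetical, not the dashboard's real API:

```typescript
// Hedged sketch: server pushes session updates over SSE, so the
// browser never polls. Endpoint and payload shape are assumptions.
import * as http from "node:http";

// Format one dashboard update as an SSE frame.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const server = http.createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404).end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  // Demo: push a status update every 3s. A real server would emit
  // only when a session's state actually changes.
  const timer = setInterval(() => {
    res.write(sseFrame("session", { id: "ao-58", status: "ci_running" }));
  }, 3000);
  req.on("close", () => clearInterval(timer));
});

// server.listen(3001);
```

On the browser side, `new EventSource("/events")` with an `addEventListener("session", ...)` handler would receive these frames.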

Getting started
```bash
# Clone
git clone https://github.com/ComposioHQ/agent-orchestrator.git
cd agent-orchestrator

# Setup
pnpm install
pnpm build

# Initialize a project
ao init --tracker github --agent claude-code --runtime tmux

# Start a session from a GitHub issue
ao start --issue 42
```
The `ao init` command generates an `ao.config.yaml`:

```yaml
tracker:
  plugin: github
  owner: your-org
  repo: your-repo

agent:
  plugin: claude-code

runtime:
  plugin: tmux

workspace:
  plugin: worktree

scm:
  plugin: github

notifier:
  plugin: desktop
```
Reactions
The most useful feature. Automated responses to GitHub events:
```yaml
reactions:
  ci_failed:
    action: spawn_agent
    prompt: "CI failed on this PR. Read the failure logs and fix the issues."

  changes_requested:
    action: spawn_agent
    prompt: "Review comments have been posted. Address each comment and push fixes."

  approved:
    action: notify
    channel: slack
    message: "PR approved and ready to merge."
```
CI fails? Agent picks it up. Reviewer requests changes? Agent reads the comments and fixes the code. PR approved? You get a Slack notification. This is how those 41 CI failures got self-corrected — the reactions system just forwarded failures back to agents automatically.
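Under the hood, a reactions table like that could dispatch on incoming webhook events roughly like this. The `ReactionAction` union and `dispatch` function are an illustrative sketch, not the actual implementation:

```typescript
// Hedged sketch: map event names to configured reactions and dispatch.
// Keys mirror the YAML config; everything else is an assumption.
type ReactionAction =
  | { action: "spawn_agent"; prompt: string }
  | { action: "notify"; channel: string; message: string };

const reactions: Record<string, ReactionAction> = {
  ci_failed: {
    action: "spawn_agent",
    prompt: "CI failed on this PR. Read the failure logs and fix the issues.",
  },
  approved: {
    action: "notify",
    channel: "slack",
    message: "PR approved and ready to merge.",
  },
};

function dispatch(event: string): string {
  const reaction = reactions[event];
  if (!reaction) return `no reaction configured for "${event}"`;
  switch (reaction.action) {
    case "spawn_agent":
      // A real implementation would start a session with this prompt.
      return `spawn agent: ${reaction.prompt}`;
    case "notify":
      return `notify ${reaction.channel}: ${reaction.message}`;
  }
}
```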
The meta part
I had 30 concurrent agents working on Agent Orchestrator. They were building the TypeScript replacement while I was using the bash-script version to manage them. The thing being built was the thing managing its own construction.
What I actually did:
- Architecture decisions (plugin slots, config schema, session lifecycle)
- Spawning sessions and assigning issues
- Reviewing PRs (mostly architecture, not line-by-line)
- Resolving cross-agent conflicts (two agents editing the same file)
- Judgment calls (reject this approach, try that one)
What agents did:
- All implementation (40K lines of TypeScript)
- All tests (3,288 test cases)
- All PR creation (86 of 102 PRs)
- All review comment fixes
- All CI failure resolution
I never committed directly to a feature branch. Every line of code went through a PR.
Why not just use Claude Code's native teams?
Claude Code now has built-in team coordination. Fair question.
- Multi-agent, not multi-model — Native teams are Claude Code instances coordinating within one session. The orchestrator manages sessions across repos, across projects, across agent types.
- Persistent sessions — Sessions survive terminal crashes, SSH disconnects, laptop reboots. Agents resume where they left off.
- External integrations — GitHub reactions, Linear tickets, Slack notifications, webhook triggers. It lives in the CI/CD ecosystem, not just in a terminal.
- Scale — 30 concurrent agents across 40 worktrees, with a dashboard showing what each one is doing.
- Plugin architecture — Swap any component. Mix and match.
The self-improving loop
Every agent session generates signal. Which prompts led to clean PRs? Which ones spiraled into 12 CI failure cycles? Which patterns caused merge conflicts? Which review comments were real bugs vs style nitpicks?
Most agent setups throw this signal away. You run an agent, it finishes, you move on. Next session starts from zero.
Agent Orchestrator has a self-improvement system (ao-52 — itself built by an agent) that logs performance, tracks session outcomes, and runs retrospectives. It learns which tasks succeed on the first try and which need tighter guardrails. Which prompts produce better code. Which CI failures are flaky vs real.
Agents build features → orchestrator observes what worked → adjusts how it manages future sessions → agents build better features. The system doesn't just run agents. It gets better at running them over time.
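To make that concrete, here is one hypothetical shape such outcome tracking could take. The `SessionOutcome` fields and the guardrail heuristic are assumptions for illustration, not ao-52's actual schema:

```typescript
// Illustrative sketch: aggregate per-session signal and flag sessions
// that needed many CI round-trips as candidates for tighter guardrails.
interface SessionOutcome {
  sessionId: string;
  ciFailures: number;
  merged: boolean;
}

function needsGuardrails(outcomes: SessionOutcome[], threshold = 5): string[] {
  return outcomes
    .filter((o) => o.ciFailures >= threshold)
    .map((o) => o.sessionId);
}

const history: SessionOutcome[] = [
  { sessionId: "ao-58", ciFailures: 12, merged: true },
  { sessionId: "ao-52", ciFailures: 7, merged: true },
  { sessionId: "ao-60", ciFailures: 1, merged: true },
];
console.log(needsGuardrails(history)); // ["ao-58", "ao-52"]
```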
And since the orchestrator was built by the agents it orchestrates, and those agents wrote the self-improvement system that now makes them more effective... yeah. It's recursive. The tool is improving itself through the agents it manages.
I think this is why orchestration matters more than any individual agent improvement. The ceiling isn't "how good is Claude Code at TypeScript." It's "how good can a system get at deploying, observing, and improving dozens of agents working in parallel." That ceiling is much higher. And it rises every time the loop runs.
What's next
The orchestrator works. The bottleneck has shifted to how tightly the human-agent loop can run. Three things I care about most:
Talk to your agents from anywhere. Right now you need to be at your desk. That's dumb. You should be able to message the orchestrator from Telegram or Slack — check status, approve a merge, redirect an agent — while you're on a walk. The human doesn't need to be in front of a terminal.
Tighter mid-session feedback. Agents drift. They start solving the wrong problem, over-engineer a simple fix, go down rabbit holes. The orchestrator needs to check agent work against the original intent and inject course corrections before they've burned 20 minutes going the wrong direction. Not just reacting to CI failures after the fact.
Automatic escalation. Agent can't solve something? Escalate to orchestrator. Orchestrator needs judgment? Escalate to you. Right now these handoffs are manual. They should be automatic — you only see things that genuinely need a human decision. Everything else resolves itself.
Beyond that: a reconciler for automatic conflict resolution between parallel agents, auto-rebase for long-running branches, Docker/K8s runtimes for cloud deployments, and a plugin marketplace for community contributions.
Try it
The repo is live: github.com/ComposioHQ/agent-orchestrator
Full metrics report: metrics-v1
If you're running parallel AI agents with bash scripts and hope — we were there a week ago. This is what comes next.
Use ao to build ao
You can use Agent Orchestrator to contribute to Agent Orchestrator. That's literally how it was built.
```bash
ao init --tracker github --agent claude-code --runtime tmux
ao start --issue 42
```
Pick an issue, point an agent at it, let it work. The repo has solid test coverage (3,288 cases), CI that catches real issues, and Bugbot reviewing every PR automatically. Your agent gets the same feedback loop ours did.
We're looking for:
- New plugins — agent runtimes, trackers, notifiers, terminal integrations
- Docker/K8s runtime — run agents in containers instead of tmux
- Reconciler — automatic conflict detection between parallel agents
- Better escalation — smarter rules for when agents should ask for help
I'm building Agent Orchestrator and the integrations layer at Composio, and we're hiring: jobs.ashbyhq.com/composio.
Right now, 30 Claude Code processes are running, building the next set of features. The orchestrator is orchestrating its own improvement. Turtles all the way down.