I haven't written a line of production code myself in weeks.
The agents do that now. They pick up GitHub issues, create branches, write code, run tests, open PRs. I verify their test plans and architecture, then merge the work. When I disagree, I go back and chat with the agent to align it better with the objectives.
This isn't a flex about productivity. It's an observation about what happened to my job. I used to be a software engineer. Now I'm something else. I set intent, verify output, and improve the system that does the actual work. The work itself moved to machines.
And this isn't unique to coding. Every form of knowledge work is going through the same transformation right now. Most people just haven't noticed yet.
What Knowledge Work Actually Is
Strip away the job titles and industry jargon, and all knowledge work follows the same loop:
- Understand what needs to be done
- Research how to do it
- Execute the work
- Verify the output is correct
- Iterate until it's good enough
Writing a legal brief. Designing a marketing campaign. Debugging a production outage. Analyzing a dataset. Building a financial model. The domains change, the loop doesn't.
For the past few decades, the valuable part of knowledge work was assumed to be steps 2 and 3. Research and execution. That's where the expertise lived. Lawyers bill for research hours. Engineers are valued for implementation skill. Analysts are hired for their ability to crunch numbers and build models.
Steps 1 and 4, understanding the problem and verifying the solution, were considered the easy parts. Just setup and cleanup around the "real" work.
That's inverting now. Completely.
The Inversion
AI agents can now handle steps 2 and 3 at a level that ranges from "good enough" to "better than most humans" across a surprisingly wide range of knowledge work. They research. They execute. They do it fast and they do it at scale.
What they can't do well is steps 1 and 4. They can't judge what's worth doing. They can't tell you whether the output actually solves the right problem. They don't know what "good" looks like in context.
The parts of knowledge work that used to be "just overhead" (deciding what to do, verifying it was done right) are now the entire job. The parts that used to be "the real work" (research and execution) are increasingly handled by agents.
This isn't theoretical. I live it every day.
When I sit down in the morning, I don't think "what code should I write?" I think "what should the system build today, and how will I know if it did it right?" I write intent, not code. I define what success looks like, not how to get there. Then I point agents at it and spend my time reviewing what comes back.
The same pattern works for every knowledge domain I've tried it on. And I don't mean hypothetically. Here's what the last three weeks of my life actually looked like.
Financial analysis. A friend had been using my credit card for over a year. Three different Chase cards, one of which was reported lost and reissued with a new number mid-cycle. When he asked how much he owed me, I told my agent to figure it out. It parsed 12 months of Chase PDF statements, identified that three card numbers were actually the same account (card replaced after a lost/stolen report), mapped every transaction to the right person, then cross-referenced against Venmo payment history, wire transfers, bank deposits, and checks across two currencies, converting at the actual exchange rate on each transaction date. Twenty minutes later, there was a URL. Hundreds of transactions parsed, tens of thousands in charges identified, repayments cross-referenced, final balance calculated to the cent. Every line item traceable to a source document. I verified the numbers, sent the link, done. No spreadsheet, no argument, no back-and-forth.
Same pattern for everything else that month. Figuring out my actual net worth across accounts in two countries. Building an investment plan. Tax strategy across US and Indian jurisdictions. Mattress research for a new apartment. Car comparison for my family. Each one: the agent did the research and built a sourced decision page at a URL. I verified and decided. The pattern is identical every time. Only the domain changes.
Writing and content. I've published 4 blog posts in two weeks, each with inline SVG diagrams. The agent built all of it — the writing, the visuals, the formatting. My job was knowing what was worth explaining and whether the explanation was clear.
The most interesting example: a shoutout post thanking my team. This went through 15 iterations. The first drafts were corporate LinkedIn slop — bullet-point lists of names with descriptions. I kept pushing: "Write it as one connected piece, not sections glued together." "Facts over feelings." "No pray emoji." Each round of feedback became a permanent rule in a writing voice guide, so the agent never makes the same mistake twice. The final post weaves 8 names into a flowing narrative. It reads like one person talking, because the judgment about what sounds human came from me, and the execution came from the agent.
Trading. I built an AI trading system in a single session. The agent designed the strategy, I set the risk parameters — max position sizes, daily loss limits, drawdown circuit breakers, no margin. Six cron jobs run autonomously on weekdays. My role: I chose which stocks to trade, I verified each signal made sense, and I'm building evals to track which setups work. The agent does the scanning, analysis, and execution. I do the judgment about what risks to take.
In every case, the pattern is the same. The human provides judgment and verification. The agent provides research and execution. And the examples above aren't cherry-picked highlights from a demo. This is literally what every week looks like now.
Why "Just Ask an Agent" Is the Wrong Framing
Here's where most people's intuition breaks down. They hear "use AI for knowledge work" and they think: open ChatGPT, type a prompt, get an answer. That works for simple questions. It doesn't work for anything complex.
The reason is that complex knowledge work isn't a single prompt-response cycle. It's a system of interconnected tasks, each requiring context from the others, each producing artifacts that feed into the next step.
Writing a blog post isn't one task. It's: research the topic, find supporting examples, understand the audience, draft an outline, write sections, create diagrams, review for coherence, edit for voice, add links, verify facts. A single prompt can't hold all of that. The context window overflows. The quality degrades. The output becomes generic.
The solution isn't a better prompt. It's a better architecture.
When a task is too complex for a single agent, you don't need a smarter agent. You need a parent agent that breaks the task into subtasks and manages child agents that each handle one piece. The parent holds the big picture. The children hold focused context. The human holds judgment over the whole thing.
This is exactly how Agent Orchestrator works for coding. One orchestrator agent manages 16+ child coding agents. Each child works on one issue in isolation. The orchestrator routes CI failures, sequences merges, kills stuck agents. The human reviews PRs and sets priorities.
Agent Orchestrator itself was built this way. 16 parallel agents, 747 commits, 40,000 lines of TypeScript, 3,288 passing tests, 8 days from first commit to open source launch. The agents wrote the code. The orchestrator managed the agents. I reviewed PRs and set direction.
But the pattern isn't limited to code. It applies to any complex knowledge work.
The Three Things That Remain
If agents handle research and execution, what's left for humans? Three things:
1. Judgment: Deciding What to Achieve
Someone has to decide what's worth doing. Not "how to implement feature X" but "should we build feature X at all?" Not "write a blog post about Y" but "is Y worth writing about, and what angle matters?"
This is the part that requires understanding context that doesn't fit in a prompt. Company strategy. User pain points from conversations you had at dinner. The gut feeling that something is off about a metric everyone else is celebrating. The taste to know when a draft is good versus when it's just correct.
Agents are getting better at execution every month. They're not getting better at knowing what matters. That requires lived experience, values, and skin in the game.
2. Verification: Confirming It Was Done Right
"The code passes tests" isn't the same as "the code solves the right problem." "The report has no factual errors" isn't the same as "the report tells the right story." "The email is grammatically correct" isn't the same as "this is the right thing to say to this person right now."
Verification requires the same contextual judgment as deciding what to do. You need to know what good looks like. You need to catch subtle misalignments between what was requested and what was delivered. You need to notice when something technically satisfies the spec but misses the point.
This is where most people underestimate the difficulty. Reading a PR is fast. Reading a PR well is a skill. Knowing whether a marketing campaign will land with your audience requires understanding your audience. Knowing whether a financial model's assumptions are reasonable requires domain expertise.
Verification is not the easy part. It's the hard part that we used to do unconsciously because it was embedded in the execution.
3. Evals: Figuring Out How Well It Was Done
This is the meta-skill. Judgment tells you what to do. Verification tells you if it was done. Evals tell you how well the system is performing over time and where to improve it.
In software, this looks like: tracking merge success rates, measuring how often agent PRs need human fixes, identifying which types of issues agents handle well versus poorly. In marketing, it might be: which AI-drafted campaigns outperform human-drafted ones, what kinds of copy the agent consistently gets wrong, where human editing adds the most value.
Evals are what you figure out as you go. You can't design them upfront because you don't know what failure modes will emerge until the system is running. The agent writes code that passes tests but has subtle architectural problems. The agent drafts emails that are technically correct but tonally off. The agent creates marketing copy that hits all the brief requirements but somehow feels generic.
Each failure mode becomes an eval. Over time, your eval suite becomes the accumulated wisdom of working with agents. It's the system's immune memory.
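A minimal sketch of what "each failure mode becomes an eval" can look like in practice. The rules below are illustrative stand-ins drawn from the voice-guide examples earlier, not a real eval framework:

```python
# A growing registry of checks, one per observed failure mode.
evals = []

def eval_rule(fn):
    """Register a check; each one encodes a past failure the human diagnosed."""
    evals.append(fn)
    return fn

@eval_rule
def no_pray_emoji(draft: str) -> bool:
    return "\U0001F64F" not in draft  # the "no pray emoji" rule

@eval_rule
def no_stats_dump(draft: str) -> bool:
    # Crude proxy for "facts over feelings, but don't dump numbers":
    # flag drafts that are disproportionately digits.
    digits = sum(c.isdigit() for c in draft)
    return digits < len(draft) * 0.2

def run_evals(draft: str) -> list[str]:
    """Return the names of failed checks; empty list means the draft passes."""
    return [fn.__name__ for fn in evals if not fn(draft)]

failures = run_evals("Shipped 747 commits, 40,000 lines, 3,288 tests! \U0001F64F")
```

Each new failure mode is one more decorated function; the suite only ever grows, which is what makes it function as accumulated memory.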
The Recursive Pattern: Agents Managing Agents
Here's where it gets interesting. When the volume of work exceeds what one agent can handle, you don't hire more humans. You add more agents and put an agent in charge of them.
This sounds circular, but it's not. It's the same pattern at every level:
Level 0: Human does everything. You write the code, review the code, run the tests, fix the bugs, deploy.
Level 1: Human + agent. You tell the agent what to build. It writes code. You review and iterate.
Level 2: Human + orchestrator + agents. You tell the orchestrator what to build. It spawns agents, assigns tasks, routes failures. You review the final output.
Level 3: Human + orchestrators + agents + sub-agents. The orchestrator's child agents can themselves spawn sub-agents for complex subtasks. You set high-level intent and verify outcomes.
Each level pushes the human further from execution and closer to pure judgment.
This is already how Agent Orchestrator works. The system itself handles the scaffolding: issue assignment, worktree isolation, CI routing, merge sequencing, conflict detection, stuck agent recovery. The orchestrator agent doesn't do any of that. It's infrastructure, not intelligence. Everything that can be automated outside the agent is automated outside the agent, so the orchestrator can focus purely on judgment calls. Each child agent gets a GitHub issue, an isolated git worktree, and a CLI widget manager that optimizes its tooling. The child has full autonomy within its scope. The system handles the plumbing. The agent handles the thinking.
The human's role is level 2 or 3 depending on complexity. Set the intent, verify the output (review PRs), improve the system (tune prompts, add guardrails, fix recurring failure modes).
If It Fails, Teach It
The most common objection to using agents for knowledge work is "but it gets things wrong." Yes. It does. So do humans. The difference is what happens after the failure.
When a human employee makes a mistake, you explain what went wrong, they learn, and hopefully they don't repeat it. The same pattern works for agents, but the teaching mechanism is different.
With agents, you don't teach through conversation. You teach through structure:
Prompts become specifications. When an agent writes a bad PR description, you don't tell it "write better PR descriptions." You update the prompt template to include the format you want. The fix is permanent and applies to every future agent. When my growth engine drafted replies that were too promotional, I didn't just fix each reply. I wrote a voice guide: "No stats dumps. Lead with insight. Ask questions when they're describing the problem." That guide now governs every future draft across every session.
Guardrails become automated. When an agent introduces a breaking change, you don't just catch it in review and move on. You add a CI check that prevents that class of error. Now no agent (or human) can make that mistake again. When my trading agent needed risk limits, I didn't rely on it "being careful." I hard-coded circuit breakers: 3% daily loss limit, 10% drawdown kill switch, max 3 positions, mandatory stop losses. Structural, not behavioral.
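A sketch of what "structural, not behavioral" means for the trading limits above. The numbers come from the post; the class and function names are hypothetical, not a real trading API:

```python
from dataclasses import dataclass

@dataclass
class RiskLimits:
    max_daily_loss_pct: float = 3.0   # halt trading for the day past this loss
    max_drawdown_pct: float = 10.0    # kill switch on total drawdown
    max_positions: int = 3            # never hold more than this many positions
    require_stop_loss: bool = True    # every order must carry a stop

def order_allowed(limits: RiskLimits, daily_loss_pct: float,
                  drawdown_pct: float, open_positions: int,
                  has_stop: bool) -> bool:
    """Structural gate run before every order; the agent cannot bypass it."""
    if daily_loss_pct >= limits.max_daily_loss_pct:
        return False
    if drawdown_pct >= limits.max_drawdown_pct:
        return False
    if open_positions >= limits.max_positions:
        return False
    if limits.require_stop_loss and not has_stop:
        return False
    return True

limits = RiskLimits()
```

The guardrail lives outside the agent's reasoning entirely: however the model is feeling about a trade, the gate runs first.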
Failure modes become evals. When you notice the agent consistently gets something wrong, you don't just fix each instance. You build a test for it. "Does the output have property X?" becomes a check that runs on every output. When my blog writing agent kept mixing unrelated facts in the same sentence (launch stats next to product descriptions), that became a permanent rule: "Every sentence in a paragraph must be about the same thing." A coherence check that runs on every draft, forever.
Over time, your system builds up an immune response. Each failure makes every future agent better. This doesn't happen by magic. It happens because you, the human, exercise judgment about what went wrong and encode that judgment into the system.
What This Looks Like Across an Organization
I've been talking about individual knowledge work, but the pattern scales to teams and organizations. In fact, the organizational version is where this gets transformative.
Consider a typical tech company's knowledge work surface:
Engineering. Issues come in, code goes out. Agents handle implementation. Humans handle architecture decisions, code review, and system design. The orchestrator manages parallel work streams, CI, and merge ordering.
Customer support. Tickets come in, resolutions go out. Agents handle known issue resolution, documentation lookup, and first-response drafting. Humans handle escalations, edge cases, and empathy where it matters.
Growth and marketing. Opportunities surface, content goes out. Agents handle discovery, competitive research, content drafting, and distribution. Humans handle strategy, brand voice judgment, and relationship building.
Hiring. Job descriptions, candidate sourcing, initial screening. Agents handle pipeline mechanics. Humans handle cultural fit assessment, offer decisions, and selling candidates on the vision.
Every department has the same structure: agents handle the research-and-execution loop, humans handle judgment-and-verification. The departments don't need different AI strategies. They need the same pattern applied to their specific domain.
The enabling technology for this isn't a chatbot. It's an orchestration layer that can manage agent swarms across different domains while maintaining the judgment-and-verification loop with the right humans.
Case Study: Using Agents to Promote the Thing Agents Built
This is where the recursion gets real. Agent Orchestrator was built by 16 parallel coding agents in 8 days. 747 commits, 40,000 lines of TypeScript, 3,288 passing tests. The agents wrote it and the agents tested it. I set intent and reviewed PRs.
Then I needed to promote it. And I thought: why would I do growth manually when I just proved the thesis by building the product with agents?
So I built an automated growth engine. Here's what it does:
Every 45 minutes, a cron job fires. An agent searches X for conversations about multi-agent systems, parallel coding, orchestration, developer tooling. It pulls back raw candidates. A filtering pipeline scores them on relevance, recency, and reach. The top candidates get enriched with full thread context, so the agent understands the entire conversation before drafting a reply.
Then the agent drafts a reply for each high-value opportunity. Not a template. A genuine, thread-aware response that engages with what the person actually said, adds an insight from our experience building AO, and naturally mentions the repo where relevant.
My job? I open my phone, see a tweet link and a draft reply, and decide: post it or skip it. That's judgment. The agent did the research (finding the right conversations), the execution (drafting replies that sound human and add value), and I do the verification (is this reply actually good? does it belong in this conversation?).
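A rough sketch of the discover-score-enrich-draft loop described above. Every name and every scoring weight here is an illustrative assumption, not the real pipeline:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    relevance: float  # 0..1, topical match to multi-agent systems, tooling, etc.
    recency: float    # 0..1, newer conversations score higher
    reach: float      # 0..1, audience size of the thread

def score(c: Candidate) -> float:
    # Weighted blend of the three signals; weights are assumed, not measured.
    return 0.5 * c.relevance + 0.3 * c.recency + 0.2 * c.reach

def top_candidates(raw: list[Candidate], k: int = 3) -> list[Candidate]:
    """Filtering stage: keep only the highest-scoring conversations."""
    return sorted(raw, key=score, reverse=True)[:k]

raw = [
    Candidate("x.com/a", 0.9, 0.8, 0.4),
    Candidate("x.com/b", 0.2, 0.9, 0.9),
    Candidate("x.com/c", 0.7, 0.5, 0.6),
]
shortlist = top_candidates(raw, k=2)
# The shortlist would then be enriched with full thread context, drafted
# against the voice guide, and queued for a human post-or-skip decision.
```

The human never sees the raw candidates, only the enriched drafts; that is what keeps the review step down to seconds per reply.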
The results: 50+ opportunities scored, 30 drafts generated, 5 posted. My time per reply: about 10 seconds. Read the draft, decide it's good, post it. Or skip.
Nobody could tell the replies were agent-drafted. A third of the ones we posted got genuine engagement back: people asking follow-up questions about worktree isolation, debating architectural decisions, sharing their own setups. The conversations were the kind I would have wanted to have anyway. The agent just found them for me and drafted the opening.
The teaching loop was the most interesting part. Early drafts were too promotional — stats dumps, sales language. I gave feedback: "Lead with insight, not numbers." That feedback became permanent rules in a voice guide. Every future draft improved. The growth pipeline also extracted product insights as a side effect — common pain points from 50+ conversations became roadmap inputs.
This is the thesis demonstrated end to end. Agents built the product. Agents wrote about it. Agents promoted it. My job across all three phases was identical: judgment, verification, evals.
The Vision Doc as Alignment Mechanism
Here's a practical insight from running agent swarms: the quality of output is directly proportional to the quality of the input specification.
When I give an agent a one-line task description, I get generic output. When I give it a detailed vision doc that explains what we're trying to achieve, why it matters, what good looks like, and what constraints exist, the output is dramatically better.
This scales to organizations. If every person and every team has a well-written vision doc, agent swarms can execute against it with minimal drift. The vision doc becomes the alignment mechanism, the thing that keeps 50 parallel agents pointed in the same direction.
Think of it this way: the vision doc is the prompt at scale. A good prompt to a single agent produces good output. A good vision doc to an agent swarm produces coordinated, aligned output across many parallel workstreams.
This is why I think every organization should be writing thorough vision docs right now. Not as a corporate governance exercise. As infrastructure. The vision doc is to agent swarms what AGENTS.md is to a coding agent: the context that makes autonomous operation possible.
The Compounding Effect
Everything I've described is already happening with today's technology. Models will get better. Agents will get more capable. Orchestration frameworks will get more sophisticated.
The transformation compounds in three ways:
Models improve. Each generation of language models handles more complex tasks with fewer errors. Tasks that require human verification today will be agent-verified tomorrow. The human retreats further toward pure judgment.
Agents improve. Better tool use, longer context windows, more reliable execution. The failure modes that require human intervention today get automated away. The teaching loop (agent fails → human diagnoses → fix becomes structural) means the system accumulates capability over time.
Orchestration improves. Better coordination between agents, smarter task decomposition, more efficient resource allocation. The overhead of managing agent swarms decreases, making it practical to apply them to increasingly fine-grained tasks.
Each layer improving makes the other layers more effective. Better models mean agents fail less. Agents failing less means orchestrators spend less time on error recovery and more on optimization. Better orchestration means you can run more agents on more tasks, generating more learning signal to improve the whole system.
This is why I believe the transformation is inevitable and irreversible. It's not about any single capability leap. It's about a compounding loop where every improvement feeds into the next.
What to Do About It
If you're a knowledge worker reading this, the practical advice is simple:
Stop doing work agents can do. Every hour you spend on research, drafting, data analysis, or implementation is an hour you could have spent on judgment, verification, or system improvement. Start with one task. Delegate it to an agent. Review the output. Fix what's wrong. Repeat.
Start building your eval muscle. The skill of the future isn't execution. It's knowing what good looks like. Practice reading agent output critically. Build your sense for what's subtly wrong versus what's just different from how you'd do it. These are different things, and learning to distinguish them is the meta-skill.
Write your vision doc. Not a mission statement. A real specification of what you're trying to achieve, what constraints matter, what good looks like. Make it detailed enough that an agent swarm could execute against it. If you can't write it clearly, you don't understand it clearly enough, and that's valuable to discover.
Think in hierarchies. When a task is too complex for one agent, don't try to write a better prompt. Break it into subtasks and put a parent agent in charge. This is the fundamental design pattern of agent-native knowledge work. If you internalize it, you can apply it everywhere.
If you're an organization, the advice is even simpler: every department should be leveraging agent swarms. The technology is here. The patterns are proven. The teams that adopt this now will have a compounding advantage over those that wait. Not because agents are magic, but because the judgment-verification-eval loop improves with every cycle, and getting more cycles in earlier means you're further ahead.
This Is Already Normal
The strangest thing about living in this mode is how quickly it becomes normal. Three weeks ago, I was writing code. Now I review code that 16 agents wrote. Three weeks ago, I was manually parsing bank statements. Now I tell an agent what I need, it builds me a live dashboard, and I verify the output.
In February alone: 4 blog posts published, each with inline SVG diagrams. An investment plan with allocation charts and tax analysis. A mattress research page. A credit card reconciliation across 305 transactions. An automated trading system running 6 daily crons. A growth engine that discovered 50+ conversations and generated 30 draft replies. A product learnings document synthesized from community feedback. A team shoutout post refined through 15 iterations of voice calibration. An architecture doc for deploying AI agents across an engineering org.
All of this was built, researched, drafted, and operated by agents. None of it was written by me in the traditional sense. All of it was judged, verified, and directed by me.
The work didn't go away. It transformed. I still spend the same hours. But those hours are spent on judgment, not execution. On verification, not research. On improving the system, not feeding it inputs.
Knowledge work used to mean doing the work. Now it means knowing what work to do and whether it was done well. That's a fundamental change, and it's already here for anyone willing to work with agents instead of alongside them.
The people who are well-versed with agents already live this way. For everyone else, the gap is closing fast. Models get better every quarter. Orchestration frameworks get more accessible every month. The barrier to entry drops continuously.
Judgment, verification, and evals. That's what knowledge work looks like now. And that's all it's going to look like as models, agents, and orchestration frameworks continue to improve.
The nature of all knowledge work has fundamentally changed. The question isn't whether to adapt. It's how quickly you can.
I wrote this post the way I write everything now. I described what I wanted to say to my agent, told it to explore our chat history for real examples, and let it build the first draft. It pulled specific numbers from financial analyses it ran, referenced conversations it had with me about voice calibration, cited the growth pipeline it operates. I gave feedback, iterated, and verified the result. The thesis is mine. The examples are real. The execution was not. That's the whole point.
If you want to try orchestrating agent swarms yourself: Agent Orchestrator is open source.
We're hiring people who think this way at Composio.