Loop Engineering: Stop Prompting Your Coding Agents, Start Designing the Loops That Prompt Them

For the last two years the skill everyone chased was prompt engineering: coax a better answer out of the model by phrasing the request just right. That skill is quietly becoming obsolete — not because prompts stopped mattering, but because the highest-leverage work moved up a level. The question is no longer "what do I ask the agent?" It's "what system decides what to ask the agent, runs it, and checks the result — without me in the chair?"

That system has a name now. Cobus Greyling's loop-engineering project lays it out as a discipline, and the framing comes straight from the people building these tools. As Peter Steinberger puts it: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." Boris Cherny, who heads Claude Code at Anthropic, says it even more bluntly: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."¹

This post walks through what a loop actually is, the building blocks it's made of, the patterns people run in production, and how to roll one out without lighting your token budget on fire.

What a loop actually is

A loop is a standing system that repeatedly discovers work, does it, and verifies it — on a cadence, without a human typing each prompt. Strip it down and a single cycle looks like this:

A schedule or automation fires on a cadence or an event (every 15 minutes, every 6 hours, on every tag).
A triage step decides whether there's anything worth doing right now.
State is read from durable memory so the loop knows what it already handled.
An isolated worktree is created so the work happens in a safe sandbox, not on your main branch.
An implementer agent does the actual work.
A verifier agent runs the tests and gates the result.
Connectors (Git, your ticket system, CI) tie it into the real world.
A human gate catches anything risky or ambiguous.
The result commits, opens a PR, or escalates — and state is written back for the next cycle.
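
The cycle above can be sketched in a few lines of Python. This is a minimal illustration, not code from the loop-engineering project: the `Loop` class, its field names, and the outcome strings are all hypothetical, the agents are plain callables, and the worktree and connector steps are left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleResult:
    task: str
    outcome: str  # "skipped", "committed", or "escalated"


@dataclass
class Loop:
    # All four callables are hypothetical stand-ins for real agents and checks.
    discover: Callable[[], list[str]]   # triage: what is worth doing right now?
    implement: Callable[[str], str]     # implementer agent: returns a proposed change
    verify: Callable[[str], bool]       # verifier agent: True if the tests pass
    is_risky: Callable[[str], bool]     # human gate: True if a person must decide
    handled: set[str] = field(default_factory=set)  # in-memory stand-in for durable state

    def run_cycle(self) -> list[CycleResult]:
        results = []
        for task in self.discover():
            if task in self.handled:
                # State says a previous cycle already dealt with this task.
                results.append(CycleResult(task, "skipped"))
                continue
            change = self.implement(task)
            if self.is_risky(task) or not self.verify(change):
                results.append(CycleResult(task, "escalated"))
            else:
                results.append(CycleResult(task, "committed"))
            # Write state back so the next cycle does not repeat the work.
            self.handled.add(task)
        return results


loop = Loop(
    discover=lambda: ["fix-typo", "rotate-keys"],
    implement=lambda task: f"patch for {task}",
    verify=lambda change: True,
    is_risky=lambda task: task == "rotate-keys",
)
print([r.outcome for r in loop.run_cycle()])  # ['committed', 'escalated']
print([r.outcome for r in loop.run_cycle()])  # ['skipped', 'skipped']
```

The second call is the point of the exercise: because state is written back, the same tasks are skipped instead of redone.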

The important shift is in who holds the initiative. In a chat, you are the scheduler, the triager, and the verifier. In a loop, those roles get externalized into the system so it can run while you sleep.

The building blocks

Loops are assembled from six reusable parts. If you use Claude Code, Cursor, Codex, or similar tools, most of these will already be familiar — loop engineering is mostly about wiring them together deliberately.

Automations / scheduling — the heartbeat. Discovers and triages tasks on a cadence.
Worktrees — isolated git working directories that let agents run in parallel without stepping on each other or your main branch.
Skills — persistent, reusable project knowledge the agent can load on demand instead of re-explaining your conventions every run.
Plugins & connectors — integrations with external tools, typically over MCP (Model Context Protocol), so the loop can touch Git, tickets, CI, and more.
Sub-agents — the maker/checker split. One agent implements, a separate one verifies. Separation of duties, applied to AI.
Memory / state — a durable spine that lives outside the conversation so the loop remembers what it did across runs and compactions.
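
Of these blocks, worktrees are the easiest to show concretely, because they map onto a real git feature. The sketch below builds `git worktree add` and `git worktree remove` commands for a task. The helper names, the `loop/<task>` branch naming, and the sibling-directory layout are assumptions made for illustration, not conventions taken from the project.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, task_id: str) -> Path:
    # Assumed layout: each worktree is a sibling directory of the main checkout.
    return repo.parent / f"{repo.name}-wt-{task_id}"


def worktree_add_cmd(repo: Path, task_id: str, base: str = "main") -> list[str]:
    # git worktree add -b <new-branch> <path> <commit-ish>
    return [
        "git", "-C", str(repo), "worktree", "add",
        "-b", f"loop/{task_id}",
        str(worktree_path(repo, task_id)),
        base,
    ]


def worktree_remove_cmd(repo: Path, task_id: str) -> list[str]:
    # --force discards uncommitted changes left behind by a failed run.
    return [
        "git", "-C", str(repo), "worktree", "remove", "--force",
        str(worktree_path(repo, task_id)),
    ]


def run(cmd: list[str]) -> None:
    # Raises CalledProcessError if git exits non-zero.
    subprocess.run(cmd, check=True)
```

A loop would call `run(worktree_add_cmd(repo, task_id))` before handing the directory to the implementer agent and `run(worktree_remove_cmd(repo, task_id))` once the result is committed or escalated. Worktrees keep runs from colliding in space; the final entry in the list, memory, is what connects them in time.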

That last one is the quiet hero. A chat forgets; a loop must not. State is what turns a one-shot agent run into something that can run every day for a month and not repeat itself.
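
As a rough sketch of what durable state can look like, the example below keeps a list of handled task IDs in a JSON file and refuses to record the same task twice. The file name, the `handled` key, and the function names are hypothetical; the project's own state files may be shaped differently.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    # A missing file means this is the loop's first run.
    if not path.exists():
        return {"handled": []}
    return json.loads(path.read_text())


def save_state(path: Path, state: dict) -> None:
    # Write to a temporary file and swap it in, so a crash mid-write
    # cannot leave a half-written state file behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def mark_handled(path: Path, task_id: str) -> bool:
    """Record a task as done. Returns False if it was already recorded."""
    state = load_state(path)
    if task_id in state["handled"]:
        return False
    state["handled"].append(task_id)
    save_state(path, state)
    return True
```

Because the file lives outside the conversation, a run that starts with an empty context window can still call `load_state` and see what earlier runs did.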

Seven patterns people actually run

The repo catalogs production-tested patterns.¹ They're worth knowing by name because they double as a menu — most teams start by cloning one:

Pattern	Cadence	Cost	What it does
Daily Triage	2h–1d	Low	Reports on new issues and PRs each morning
PR Babysitter	5–15m	High	Continuously monitors and nudges open pull requests
CI Sweeper	5–15m	Very High	Detects failed builds and attempts remediation
Dependency Sweeper	6h–1d	Medium	Automates dependency patching
Changelog Drafter	1d or on tag	Low	Generates release notes
Post-Merge Cleanup	6h–1d	Low	Runs off-peak maintenance tasks
Issue Triage	2h–1d	Low	Categorizes and routes incoming issues

Notice the cost column. The cheap loops (triage, changelogs, cleanup) run on a slow cadence and mostly read. The expensive ones (PR babysitter, CI sweeper) run every few minutes and spin up sub-agents that write and re-run tests. Your bill is runs per day times tokens per run, so tightening a cadence from daily to every five minutes multiplies it by 288.
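
The arithmetic behind the cost column is simple enough to write down. In the sketch below the function names are hypothetical and the per-run token counts are made-up placeholders, not measurements from the project; only the multiplication is the point.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(cadence_minutes: float) -> float:
    """How many times a loop fires in 24 hours at a fixed cadence."""
    return MINUTES_PER_DAY / cadence_minutes


def daily_tokens(cadence_minutes: float, tokens_per_run: int) -> float:
    """Daily token spend: runs per day times tokens per run."""
    return runs_per_day(cadence_minutes) * tokens_per_run


# Placeholder numbers, chosen only to show the spread between patterns.
daily_triage = daily_tokens(cadence_minutes=MINUTES_PER_DAY, tokens_per_run=20_000)
ci_sweeper = daily_tokens(cadence_minutes=5, tokens_per_run=50_000)

print(f"{daily_triage:,.0f}")  # 20,000
print(f"{ci_sweeper:,.0f}")    # 14,400,000
```

Under these assumed numbers the five-minute loop costs 720 times as much per day as the daily one, with the cadence alone contributing a factor of 288.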

Rolling one out without getting burned

The single best idea in the project is the three-level rollout. You don't hand an autonomous agent your main branch on day one. You earn each level of trust:

L1 — Report. Read-only. The loop tells you what it would do; a human does everything. This is where you validate that the triage is even correct.
L2 — Assisted fixes. The loop proposes concrete changes but gates them behind human approval. You're reviewing diffs, not writing them.
L3 — Unattended. Fully autonomous, but only on an allowlist of actions you've explicitly deemed safe. Everything else still escalates.
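
One way to picture the three levels is as a single decision function that every proposed action passes through. This is an illustrative sketch under assumed names (`Level`, `decide`, and the returned strings are hypothetical), not the project's implementation.

```python
from enum import IntEnum


class Level(IntEnum):
    REPORT = 1      # L1: read-only, the loop only describes what it would do
    ASSISTED = 2    # L2: the loop proposes a change, a human approves it
    UNATTENDED = 3  # L3: the loop applies allowlisted actions on its own


def decide(level: Level, action: str, allowlist: frozenset[str]) -> str:
    """Return what the loop may do with a proposed action at a given level."""
    if level == Level.REPORT:
        return "report"
    if level == Level.ASSISTED:
        return "propose"
    # At L3 only explicitly allowlisted actions run without a human.
    if action in allowlist:
        return "apply"
    return "escalate"


safe_actions = frozenset({"bump-patch-dependency", "fix-lint"})
print(decide(Level.REPORT, "fix-lint", safe_actions))          # report
print(decide(Level.ASSISTED, "fix-lint", safe_actions))        # propose
print(decide(Level.UNATTENDED, "fix-lint", safe_actions))      # apply
print(decide(Level.UNATTENDED, "rotate-keys", safe_actions))   # escalate
```

Keeping the allowlist as data rather than logic means graduating an action to L3 is a one-line, reviewable change.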

The tooling supports this progression directly. loop-init scaffolds a project with budget and state files; loop-cost estimates your token spend for a given pattern and level before you run it; loop-audit scores how complete and production-ready your setup is (a "Loop Ready" score out of 100); and loop-sync detects drift between your declared state and your actual loop.

The honest risks

Loop engineering is powerful precisely because it removes you from the moment-to-moment loop — which is also exactly what makes it dangerous. The project is refreshingly direct about the failure modes:

Token costs escalate non-linearly. Each level of sub-agent fan-out multiplies the number of agent runs per cycle, so a sub-agent that spawns sub-agents on a 5-minute cadence can quietly run up a serious bill. Estimate before you deploy.
Verification is still your job. The verifier agent gates the obvious failures. Subtle correctness is still on a human — the loop just makes it easy to forget that.
Comprehension debt. When loops ship changes nobody reviewed, your understanding of your own codebase erodes. You end up maintaining code you never read.
Loops aren't reproducible across people. Two engineers running the identical loop can get different results, because the loop's behavior depends on the context and state each person feeds it.
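
The first risk in the list is worth a quick calculation. The sketch below counts agent runs when every agent spawns a fixed number of sub-agents down to a fixed depth. The uniform fan-out model and the function names are simplifying assumptions for illustration; real loops will branch unevenly.

```python
def agent_runs_per_cycle(fan_out: int, depth: int) -> int:
    """Total agents in one cycle: 1 root plus fan_out**k agents at each level k."""
    return sum(fan_out ** level for level in range(depth + 1))


def daily_agent_runs(fan_out: int, depth: int, cadence_minutes: int) -> int:
    """Agent runs per day for a loop that fires every cadence_minutes."""
    cycles_per_day = (24 * 60) // cadence_minutes
    return cycles_per_day * agent_runs_per_cycle(fan_out, depth)


print(agent_runs_per_cycle(fan_out=3, depth=2))                 # 13
print(daily_agent_runs(fan_out=3, depth=2, cadence_minutes=5))  # 3744
print(daily_agent_runs(fan_out=3, depth=3, cadence_minutes=5))  # 11520
```

Under this model, allowing one extra level of delegation roughly triples the daily run count, which is why the growth reads as non-linear on the bill.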

None of these are reasons not to build loops. They're reasons to start at L1, keep a human at the gate, and graduate deliberately.

The bigger shift

The through-line here mirrors something I've written about before: the leverage is in what you inject into the system, not in the model's cleverness. A good prompt makes one answer better. A good loop makes a thousand answers happen while you're doing something else — and puts the guardrails, the state, and the verification where they belong, in the system rather than in your head.

Prompt engineering was about talking to the model. Loop engineering is about building the thing that talks to the model for you. If you're spending your days re-typing variations of the same request to a coding agent, that's the tell: there's a loop waiting to be written.

The framework, tooling, patterns, and starter kits described here come from Cobus Greyling's open-source loop-engineering project. It's worth a read for the clone-and-run starter kits and the failure-modes catalog alone.

References

¹ Cobus Greyling, loop-engineering (MIT), https://github.com/cobusgreyling/loop-engineering. Verified against the repository README: it defines the six building blocks, the seven production patterns with their cadence/cost, the L1–L3 rollout levels, the loop-init/loop-audit/loop-cost/loop-sync tooling, and reproduces the Peter Steinberger and Boris Cherny quotes cited above.