When the Math Flips: Two Schools of Loop Engineering, and the One Thing They Agree On

I recently wrote about loop engineering — the idea that the highest-leverage AI work has moved up a level, from prompting an agent to building the systems that prompt it. That piece leaned on Cobus Greyling's open-source loop-engineering project, which is fairly maximalist about it. The framing that anchors it comes from Boris Cherny, who runs Claude Code at Anthropic: "My job is to write loops."¹

Then I read a second take — CodeRabbit's essay on loop engineering by Hendrik Krack — and it stopped me, because it agrees with almost every mechanical detail of the first and then quietly refuses to agree with its conclusion. The two pieces describe the same machine and draw opposite lessons about when to switch it on.

That disagreement is the interesting part. So this post isn't a third summary of loop engineering — it's a map of where these two accounts converge and where they collide, and what I think the collision teaches.

Where they agree: the machine is settled

Start with the common ground, because it's substantial.

Both frame loop engineering as the next rung on a ladder. Krack lays out the lineage explicitly: prompt engineering (2023) → context management → harness engineering (late 2025) → loop engineering (2026), each step trading human intervention for autonomy. His one-line distinction is the cleanest I've seen: "Prompts serve as isolated instructions... Loops are recursive goals."² A prompt gets one answer and stops. A loop keeps going until the goal is met.
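Here's a minimal Python sketch of that distinction, under stated assumptions: `agent` stands in for any model call, and `goal_met` and `max_iterations` are hypothetical names of mine, not anything from Krack's essay. The iteration cap is also my addition, so the sketch can't spin forever.

```python
from typing import Callable

Agent = Callable[[str], str]  # hypothetical stand-in for a model call


def run_prompt(agent: Agent, instruction: str) -> str:
    """A prompt: one instruction, one answer, then stop."""
    return agent(instruction)


def run_loop(
    agent: Agent,
    goal: str,
    goal_met: Callable[[str], bool],
    max_iterations: int = 10,
) -> tuple[str, int]:
    """A loop: keep calling the agent until goal_met accepts the result.

    Returns the accepted result and the number of attempts it took.
    """
    result = ""
    for attempt in range(1, max_iterations + 1):
        # Feed the previous attempt back in so each pass can build on it.
        result = agent(f"Goal: {goal}\nPrevious attempt: {result}")
        if goal_met(result):
            return result, attempt
    raise RuntimeError(f"goal not met after {max_iterations} attempts")
```

Everything interesting lives in `goal_met`: the loop is only as good as the check that decides when to stop.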

More strikingly, both accounts reach for the same architecture — the six building blocks, which both ultimately trace to Addy Osmani:

Automations — scheduled discovery and triage of work
Worktrees — isolated sandboxes so parallel agents don't collide
Skills — documented conventions so the agent stops re-deriving your project
Plugins / connectors — MCP integrations into trackers, databases, Slack
Sub-agents — a separate agent that checks the work
State — durable memory (often just a markdown log) that survives context resets
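
To make the last item concrete, here is a minimal sketch of state as a markdown log. The file layout (one bullet per entry) and the function names are my own assumptions, not taken from either source.

```python
from pathlib import Path


def append_state(log_path: Path, entry: str) -> None:
    """Append one bullet to the markdown log, creating the file if needed."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"- {entry}\n")


def load_state(log_path: Path) -> list[str]:
    """Read the log back, e.g. at the start of a fresh agent context."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    # Keep only bullet lines and strip the leading "- " marker.
    return [line[2:] for line in lines if line.startswith("- ")]
```

Because the log lives on disk rather than in the model's context window, a new run can call `load_state` and pick up where the last one stopped.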

If you only read for the parts list, the two articles are nearly interchangeable. The machine is settled. The argument is about what to do with it.

Collision #1: "write loops" vs. "know which loops to write"

The GitHub-project framing treats loops as the new default posture. The tone is "this is the job now": you should be designing loops the way you used to write functions.

Krack won't sign that check. The sharpest moment in the CodeRabbit piece is a caution against universal adoption:

"Loops reward a stable target... But when the conditions keep shifting, the math flips."

His claim: a loop is an investment with a payback period.² It pays off when the target is stable and success is measurable, because you amortize the build cost over many autonomous runs. But when the success criteria keep moving, you spend all your time maintaining the loop's logic — and plain manual prompting becomes the cheaper option. Loops aren't a virtue; they're a bet on stability.
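
The arithmetic behind that bet can be sketched in a few lines. This is my own toy model, not Krack's: every parameter name and every number below is hypothetical, with costs in minutes. It assumes each change to the success criteria costs a fixed amount of rework on the loop.

```python
def loop_net_savings(
    build_cost: int,
    manual_cost_per_run: int,
    loop_cost_per_run: int,
    runs: int,
    target_changes: int,
    rework_cost_per_change: int,
) -> int:
    """Minutes saved by building a loop instead of prompting by hand.

    Positive means the loop paid for itself; negative means the math flipped.
    """
    saved = runs * (manual_cost_per_run - loop_cost_per_run)
    spent = build_cost + target_changes * rework_cost_per_change
    return saved - spent


# Hypothetical numbers: 8 hours to build, 30 min by hand vs 6 min per loop run.
stable = loop_net_savings(480, 30, 6, runs=40, target_changes=0, rework_cost_per_change=120)
shifting = loop_net_savings(480, 30, 6, runs=40, target_changes=6, rework_cost_per_change=120)
print(stable, shifting)  # 480 -240
```

Same loop, same number of runs. Only the stability of the target changed, and the sign of the result changed with it.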

So the two camps don't actually disagree about loops. They disagree about the mandate. One says your job is to write loops. The other says your job is to know which loops are worth writing — and to notice when the math has flipped against you.

I find Krack's version more honest, and not just more cautious. "Write loops" is a great rallying cry and a terrible default. Most real engineering targets are only locally stable; the skill isn't automation, it's judging the stability of the target before you automate against it.

Collision #2: who is allowed to say the work is good

Both accounts agree verification matters. They disagree about how much trust a check run by another AI deserves.

The maximalist framing keeps a verifier sub-agent in the loop and reminds you that final correctness is still a human's job — the checker catches the obvious stuff. Fine. But the verifier is still the AI grading the AI.

Krack draws a harder line. In his own loop, nothing merges without passing tests and a clean review from an independent tool (CodeRabbit's, in his case — more on that bias in a second). The reason he gives is the whole ballgame:

"a signal I could trust, not Claude's opinion of its own work."

That's a real philosophical split, not a tooling preference. One view: a sub-agent checker is good enough to keep the loop honest. The other: self-assessment is structurally untrustworthy, and the only thing that makes autonomy safe is a gate the generating model doesn't control — objective tests, or a reviewer with no stake in the answer. Once a loop can approve its own work, "it passed" and "it convinced itself it passed" become indistinguishable.
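
One way to picture the stricter view in code, as a sketch under my own assumptions: the field and function names are hypothetical, and neither source publishes a gate like this. The point is only that the model's self-assessment gets recorded but never consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSignals:
    tests_passed: bool              # objective: the test suite's exit status
    independent_review_clean: bool  # a reviewer the generating model doesn't control
    model_self_assessment: str      # kept for the log, never used to decide


def may_merge(signals: GateSignals) -> bool:
    """Merge only on signals the generating model cannot influence."""
    return signals.tests_passed and signals.independent_review_clean


confident_but_failing = GateSignals(
    tests_passed=False,
    independent_review_clean=True,
    model_self_assessment="All requirements met, ready to merge.",
)
print(may_merge(confident_but_failing))  # False
```

However persuasive the self-assessment string is, it has no path into the return value.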

The obvious caveat: Krack writes for CodeRabbit, a code-review company, so "code review is the linchpin" is exactly what you'd expect him to conclude. Worth discounting for. But the underlying point survives the bias — independence of the gate matters more than which vendor supplies it. You could satisfy his principle with a test suite and a colleague's eyes and no CodeRabbit at all.

Collision #3: cost as a warning label vs. cost as the decision

The two even treat money differently. The project framing lists runaway token spend as a risk to monitor — watch your bill, sub-agents get expensive, throttle the cadence. Cost is a hazard you manage after you've decided to build.

For Krack, cost is upstream of the decision itself. The build-vs-prompt math is the choice. Sometimes the correct output of loop engineering is a loop you deliberately don't build. That reframes cost from a warning label into the actual selection criterion — and it's the same idea as collision #1 wearing a different hat.

The synthesis I'd actually operate by

Put the two together and a usable position falls out — sharper than either alone:

The architecture is a solved problem. Six components, well understood. Don't reinvent it; both sources hand you the same parts list.
The loop is a bet on target stability, not a default. Before automating, ask how often the definition of "done" moves. Stable target, measurable success → build the loop. Shifting target → keep prompting by hand and don't feel bad about it.
Autonomy is only as safe as its most independent gate. The one non-negotiable is a verifier the generating model can't talk its way past — real tests, an outside reviewer, a human at the risky steps. A loop that grades its own homework isn't automated; it's unsupervised.

There's a quieter tension underneath all this that neither piece fully resolves. The maximalist account worries about comprehension debt — ship enough unreviewed loop output and you stop understanding your own codebase. Krack's answer is essentially to substitute an objective gate for personal understanding: trust the signal, not your read of the diff. Does a strong external gate actually solve comprehension debt, or does it just make it comfortable to accumulate? I don't think either author knows yet. I don't either. But it's the right question, and it's the one that only shows up when you read the two accounts against each other.

The takeaway

"You should be writing loops" and "you should know which loops not to write" sound like opposites. They're really two halves of the same skill. The first camp is right that the machinery is here and you should learn it. The second is right that the machinery is not the point — judgment about when to deploy it, behind a gate you can actually trust, is. Learn the machine from the maximalists. Learn the restraint from the skeptics. The engineers who get both will quietly out-ship everyone still arguing about which one is correct.

References

1. Cobus Greyling, loop-engineering (MIT), https://github.com/cobusgreyling/loop-engineering. Verified against the repository README: it presents loop engineering as the successor to prompt engineering, defines the six building blocks, and reproduces the Boris Cherny quote cited here.
2. Hendrik Krack, "Loop Engineering," CodeRabbit, https://www.coderabbit.ai/blog/loop-engineering. Verified against the essay: it lays out the prompt-engineering → loop-engineering lineage, states the "recursive goals" distinction, cautions that loops reward a stable target and that "the math flips" when conditions shift, insists on an independent trust signal over the model's "opinion of its own work," and credits the six-component framework to Addy Osmani. Note: CodeRabbit sells AI code review, so its emphasis on review as the linchpin is vendor-aligned.