Friday, May 8, 2026

Andrej Karpathy Has Renamed Vibe Coding. Here’s What Engineering Leaders Need to Do About It.



On the one-year anniversary of coining “vibe coding,” Andrej Karpathy proposed replacing it with “agentic engineering.” The distinction he drew was precise: vibe coding is describing what you want and accepting what comes back. Agentic engineering is designing the system, specifying the constraints, and using AI to accelerate implementation you have already reasoned through. One is expression. The other is engineering.

Most software organizations are running both at once and calling them the same thing. That is where the expensive mistakes are coming from.

One of my development leads put it plainly, not as a policy position but as an empirical observation. In his experience, vibe-coded PRs consistently arrive missing edge case handling, error paths, and exception logic. Not because the AI forgot them; because the developer never specified them. They described an outcome, accepted what the agent produced because it looked right, and submitted it. The tests pass because they were written against the code that exists, not against the behavior the system actually requires.
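A hypothetical sketch of that failure pattern. The function and its domain are invented for illustration: the first version is what an agent plausibly returns for a happy-path prompt, the second is the same function once a developer has specified the unhappy paths.

```python
def parse_discount_vibe(code: str) -> float:
    """What an agent might produce for 'parse a discount code like SAVE20'."""
    return float(code.removeprefix("SAVE")) / 100


def parse_discount_specified(code: str) -> float:
    """The same function once the developer specifies the error paths."""
    if not code or not code.startswith("SAVE"):
        raise ValueError(f"unrecognized discount code: {code!r}")
    digits = code.removeprefix("SAVE")
    if not digits.isdigit():
        raise ValueError(f"non-numeric discount: {code!r}")
    pct = int(digits)
    if not 0 < pct <= 100:
        # The constraint only someone who reasoned about the system knew to require.
        raise ValueError(f"discount out of range: {pct}")
    return pct / 100
```

Both versions handle `"SAVE20"`. Only one has an answer for `"SAVE200"`, and the difference was never in the prompt.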

The agent didn’t make something up. The developer didn’t know what to ask for.

His response is not to reject AI coding tools. It is to require that engineers demonstrate they understand what was generated (the edge cases, the scaling assumptions, the failure modes) before the PR gets merged. If you cannot explain why the solution is designed the way it is, you didn’t design it. You accepted it.

He’s right. And the data backs him up. PR review times on heavily AI-assisted teams are up 91%, not because AI is writing worse code, but because reviewers are now responsible for reconstructing the comprehension that the developer skipped. That is a harder review, not an easier one. And it’s compounding.

What AI Did to the Roles, and What It Didn’t

There’s a common assumption among technology leaders that AI coding tools collapsed the distinction between who builds and who reviews: that the agent writes well enough that the old quality gates are a legacy of a slower era.

That assumption confuses speed with comprehension.

The developer, the tester, the architect: these roles were never primarily about producing artifacts. They were about understanding the system well enough to know when something was wrong before it became someone else’s problem. The developer who spots a race condition saw it because they understood the execution model. The tester who asks “what happens when the user does the unexpected thing?” asked it because they reasoned through the system’s behavior. The architect who recognizes that this solution works now and will break at scale recognized it because they held the whole system in their head.

These aren’t production tasks. They’re comprehension tasks. You cannot delegate comprehension to an agent.

What changed is that you can now produce a hundred lines of code without having done the thinking that a hundred lines of code used to require. The output exists. The understanding behind it may not. An engineer reviewing a vibe-coded PR is not reviewing code; they are trying to reconstruct whether the developer who submitted it actually understood what they were building.

The roles aren’t dissolving. They’re being stress-tested. The developer who designed the solution, who can explain every edge case, every failure mode, every scaling assumption, is more valuable than before. The one who accepted what the agent produced because it looked right and the tests passed is now a liability at the speed the organization is moving.

Three Failure Modes Engineering Managers Need to Watch For

These aren’t hypotheticals. They’re patterns repeating across organizations deploying AI coding tools at scale.

The green pipeline problem. A green pipeline means the code does what it was asked to do. It doesn’t mean the developer asked the right thing, or asked completely enough. A senior engineer knows to look behind the green. A manager who has stepped too far from the work cannot tell from a dashboard whether green means safe, or means fast and unexamined.

The missing path problem. The developer who doesn’t understand the system’s failure modes cannot specify them. The agent cannot surface what the developer didn’t know to require. In a production system, the happy path is where things work. The unhappy paths are where you find out what the system is actually made of. AI agents, as Karpathy noted, were purpose-built for the first 80% of an application: the implementation that flows naturally from a well-described intent. The last 20% (the edge cases, the failure recovery, the scaling constraints) requires a developer who has actually thought through the system. That 20% is where vibe-coded software consistently runs out.

The confidence calibration problem. AI-generated code reads as authoritative. The structure is clean, the naming is coherent, the comments are present. It doesn’t look like code written by someone who was uncertain, even when the underlying logic contains a bet that something will never happen. Human code carries the fingerprints of doubt: the comment that says “TODO: handle this case,” the defensive check that signals the developer was unsure. AI code often lacks these signals. Reviewers have to supply the doubt themselves. That requires judgment the reviewer can only exercise if they understand the system well enough to know what to doubt.

What Engineering Leaders Need to Do Differently

There’s a version of technical leadership that sounds sophisticated and is quietly dangerous in this environment: the manager who has stepped back from the code to focus on delivery metrics, who measures the AI program by velocity numbers and adoption rates, and who interprets a senior engineer’s insistence on deep code review as resistance to change.

That manager is optimizing for the output of the process rather than the quality of the judgment being applied to it. In a fast-moving AI environment, that is a compounding error.

Technical proximity is not micromanagement. It is not writing code or reviewing every PR. It is being close enough to the actual behavior of the systems you are accountable for that you can tell the difference between a team moving fast because they are disciplined and a team moving fast because they skipped the hard part.

The manager who cannot read a PR doesn’t need to review every one. But they do need to understand what their senior engineers look for when they do. That distinction, between “this passed the tests” and “this is right,” is not available from a summary. It is available from contact.

My organization runs three rituals that have nothing to do with status updates and everything to do with maintaining that contact.

Two hours every week in an architecture working session. Two hours every other week in sprint planning. Two hours each sprint demoing to the whole group.

The architecture sessions are where the system’s reasoning lives: not the tickets, not the documentation, but the living conversation about why things are designed the way they are and which alternatives were considered and not taken. A manager who sits in those sessions for six months builds a working model of the system that no dashboard can replicate.

Sprint planning is where the disconnects surface. We use planning poker: everyone estimates independently before the reveal. When estimates diverge sharply, the conversation that follows is almost always the most valuable one of the sprint. Not because we are negotiating a number. Because divergent estimates mean divergent mental models. Someone thinks this task is a 2. Someone else thinks it’s a 13. That gap is not a disagreement about effort. It is evidence that two people are not looking at the same problem.

Divergent estimates don’t measure complexity. They measure where your team’s understanding of the system breaks down.

The demos keep everyone honest about what was actually built versus what was intended, cross-train the team on what each person is working on, and give the manager the most important signal of all: whether the people building the system can explain what they built and why the tradeoffs they made were right.

An AI agent can produce a demo. It cannot explain its reasoning under questioning. The engineers who can are the ones you cannot afford to route around.

Karpathy’s reframe from vibe coding to agentic engineering is not a terminology update. It is a professional obligation.

The organizations that ignore AI will fall behind. The ones that vibe it will ship failure at scale. The ones that engineer it, deliberately, with comprehension at every layer, are the ones building something worth running in production.

That isn’t a productivity conversation. That is a responsible AI conversation. The code looks finished. The pipeline is green. The PR is open.

Whether it is actually ready is still a human call. Make sure your team, and you, are close enough to the work to make it.
