Jun 8, 2026 · 8 min · AI Agents

AI Didn't Replace the Engineer. It Replaced the Typist.


Writing code is automated. Engineering judgment is becoming the job, and the real risk nobody is pricing in is a broken junior pipeline.


I have lost count of how many NY Tech Week panels this week are some version of "will AI replace software engineers." It is the wrong question, and asking it on a loop hides the thing that actually changed.

The tools crossed a line this year

The coding tools crossed a line sometime in the last year. They stopped being autocomplete. Hand an agent a task today and it will read the whole codebase, stand up a test environment, run commands, fix what it just broke, and open a pull request touching a dozen files. That is junior-engineer behavior.

The numbers point the same way. McKinsey puts the productivity gain on routine coding somewhere around 20 to 45 percent [1], and Uber says roughly a tenth of its committed code now comes from autonomous agents. Headcount is following the code. Block dropped from about ten thousand people to under six thousand while its gross profit went up, and said plainly that the goal was doing more with fewer people.

And yet, the most rigorous experiment points the other way

The most rigorous experiment anyone has run on this points the other way. In mid-2025, METR ran a randomized controlled trial with 16 experienced open-source developers working on 246 real tasks in their own repositories. When AI tools were allowed, the developers took 19 percent longer to finish. They had predicted a 24 percent speedup going in, and even after the slowdown they believed they had been 20 percent faster [2].

Slower, and they could not feel it.

A follow-up this February hints that the newest tools may have flipped the result into a real speedup, perhaps 18 percent for the returning participants, though the authors caution that the sample is small and the flip preliminary [3]. The tools are improving fast. Nobody's gut, mine included, is a reliable instrument for measuring them.

Fig. 1: Perception versus measurement in METR's randomized trial of AI coding tools, showing the gap between predicted speedup, perceived speedup, and measured productivity. Sources: METR (2025); follow-up note, METR (2026).

So a machine can write working code now, with nobody typing it. That part is settled. But typing code was never the hard part of the job.

Code is local. Engineering is global.

Writing code is the local problem: make this function work, make this test pass. Engineering is the global problem. Why this architecture and not the other three. What happens when the thing serves ten million users instead of ten. Where the latency budget goes, and who gets paged at 3 a.m. when it does not hold.

Models are very good at the local problem now. They are still shaky on the global one, because the global one is not really about code. It is judgment under constraints you cannot fully type into a prompt: cost, power budgets, the business context nobody wrote down. That is most of what I teach my students at Columbia, and it is the part a clean generated function never touches.

The valuable skill moved. It used to be producing the code. Now it is deciding what to build, describing it precisely enough that a machine can run with it, and checking whether what comes back is actually correct. People skip that last part.

Verification is the new bottleneck

We talk about keeping a human in the loop like it is a courtesy we extend to the machine. It is an accountability requirement. The more an agent writes, the more work piles up on the other side: reviewing, testing, deciding whether to trust the thing before it ships. Somebody owns it when it breaks. That job grows as the agents improve, because there is simply more generated code to vouch for. If METR's developers misjudged their own speed by nearly 40 percentage points, believing they were 20 percent faster when they were actually 19 percent slower, gut feel will not audit the code either. You have to measure.

What does the verification job look like in practice? Less like reading every line, more like knowing where generated code tends to lie. The failure I keep seeing is a generated function that runs cleanly and silently drops every row with a missing field, next to generated tests that pass because they were written by the same model with the same blind spot. A machine grading its own homework.
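Here is that failure as a minimal Python sketch. Every name in it (`load_orders`, `generated_test_passes`, `count_silent_drops`, `load_orders_strict`) is hypothetical and invented for illustration, not taken from any real codebase. The first stands in for an agent-written loader. The second is the kind of test the same model tends to write. The third is a reviewer's check that every input row either comes out or is rejected loudly.

```python
# Hypothetical illustration: none of these names come from a real codebase.

def load_orders(rows):
    """Stand-in for agent-written code: 'cleans' input by keeping complete rows only."""
    return [row for row in rows if all(v is not None for v in row.values())]


def generated_test_passes():
    """The kind of test the same model tends to write: every row is complete."""
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
    return len(load_orders(rows)) == len(rows)


def count_silent_drops(loader, rows):
    """Reviewer's check: each input row must come out, or the loader must fail loudly."""
    try:
        out = loader(rows)
    except ValueError:
        return 0  # an explicit rejection is acceptable; a silent drop is not
    return len(rows) - len(out)


def load_orders_strict(rows):
    """One possible reviewed version: refuse incomplete rows instead of hiding them."""
    bad = [i for i, row in enumerate(rows) if any(v is None for v in row.values())]
    if bad:
        raise ValueError(f"rows with missing fields at positions {bad}")
    return list(rows)


messy = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
print(generated_test_passes())                        # True
print(count_silent_drops(load_orders, messy))         # 1
print(count_silent_drops(load_orders_strict, messy))  # 0
```

The generated test never feeds in a missing field, so it cannot notice the drop. Whether the reviewed version should raise, log, or quarantine bad rows is a judgment call this sketch does not settle.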

So I teach a short checklist for reviewing agent code:

  • Exercise the error paths nobody triggers.
  • Probe the boundary cases the prompt never mentioned.
  • Slow down on anything that touches money, time zones, or concurrency.
  • Never let the model's own tests be the judge of the model's own output.
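Boundary cases, error paths, and money fit in a few lines of Python. The sketch below assumes a hypothetical agent-written `split_bill` that works in floating-point dollars. `split_bill_cents` is an illustrative reviewed version, not a prescribed fix. The checks at the bottom are written independently of the code under review, in the spirit of the last item.

```python
# Hypothetical illustration: split_bill and split_bill_cents are invented names.

def split_bill(total, people):
    """Stand-in for agent-written code: split a bill evenly in dollars, rounding each share."""
    share = round(total / people, 2)
    return [share] * people


def split_bill_cents(total_cents, people):
    """One possible reviewed version: work in integer cents and hand out leftover cents."""
    if people <= 0:
        raise ValueError("people must be positive")
    base, remainder = divmod(total_cents, people)
    # The first `remainder` people pay one extra cent, so no money is lost.
    return [base + 1 if i < remainder else base for i in range(people)]


# Boundary case the prompt never mentioned: a total that does not divide evenly.
print(sum(split_bill(10.00, 3)) == 10.00)  # False: a cent has vanished
print(split_bill_cents(1000, 3))           # [334, 333, 333]

# Error path nobody triggers: zero people.
try:
    split_bill_cents(1000, 0)
except ValueError as err:
    print(err)                             # people must be positive

# Property check written independently of the code under review:
# the shares must always add back up to the total.
assert all(
    sum(split_bill_cents(total, n)) == total
    for total in (0, 1, 999, 1000, 1001)
    for n in range(1, 8)
)
```

Integer cents with an explicit remainder is one common choice; a decimal type would also work. The habit that matters is asserting a property the prompt never stated, here that the shares sum to the total.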

The Jevons argument, again

There is a plain economic reason the engineers-are-finished story runs backwards. Economists call it the Jevons paradox. When something gets cheaper to do, we do not do less of it. We do much more. Cheaper steam power in the 1860s did not curb Britain's appetite for coal. It multiplied it, because coal was suddenly worth burning in places no one had bothered with. Software is the coal. Make it cheap to build and you do not need fewer people who build it well. You find a thousand new things worth building.

The tell is right in front of us. The same labs predicting the end of software engineering are hiring software engineers as fast as they can find them.

What I actually worry about

What I actually worry about is quieter, and almost nobody is pricing it in.

We will not run short on senior engineers because AI replaced them. We will run short because we stopped making them.

You get to senior by grinding through the unglamorous entry-level work: the small fixes, the boilerplate, the reviews that come back bleeding red, slowly absorbing why systems are shaped the way they are. Let agents eat all of that and the entry-level rung disappears.

So where does the senior engineer of 2032 come from? We are automating away the training ground that produces the exact judgment we just agreed is scarce. I do not have a tidy answer. I just think that is the argument worth having at these panels, instead of the one we keep having.

Should your kid still study computer science?

Yes. Not for the syntax, which the machine has handled. For the systems thinking, the habit of checking the work, the depth that comes from knowing one domain cold. The engineer who matters ten years from now is not the fastest typist in the room. She is the one who can hold the whole system in her head and tell you which parts to trust.


References

[1] McKinsey & Company. "The economic potential of generative AI: The next productivity frontier," June 2023, with follow-up work through 2025. Reported productivity gains on routine coding tasks of roughly 20–45% across multiple controlled and field studies.

[2] METR. "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity," July 2025. Pre-registered RCT with 16 experienced open-source developers, 246 real tasks: AI access produced a 19% slowdown despite a perceived 20% speedup; participants pre-trial expected a 24% speedup.

[3] METR. "We are Changing our Developer Productivity Experiment Design," February 2026. Follow-up note showing a smaller cohort of returning participants estimated at ~18% speedup with newer-generation coding tools. Authors caveat the sample size and recommend treating the directional flip as preliminary.
