How AI reshapes the craft of software engineering, with Yoav Tzfati
Transforming "I Can't Code" into "I Ship Software" with Yoav Tzfati

Patrick McKenzie is joined by AI researcher Yoav Tzfati to discuss vibe coding: delegating software engineering work to LLMs. Yoav runs a bootcamp teaching programming novices to build full-stack web applications using AI, without them ever looking at code.

[Patrick notes: As always, there are some after-the-fact observations sprinkled into the transcript, set out in this format.]

Complex Systems now produces occasional video episodes. In addition to this site, you can access them directly on YouTube. My kids inform me that I’m supposed to tell you to like and subscribe.

Video Timestamps

(00:00) Intro
(01:57) The concept of vibe coding
(05:51) Exploring the practicalities of vibe coding
(11:46) Teaching AI-driven development to novices
(18:50) The future of AI in software engineering
(24:29) The future of learning and AI
(25:26) Timelines and model capabilities
(25:57) Flappy Bird and AI's progress
(01:23) The impact of LLMs on engineering
(28:33) The evolution of coding practices
(34:05) Monitoring and error handling with AI
(41:13) The future of software engineering roles
(48:27) Practical tips for engineers
(50:00) Wrap

Transcript:

[Patrick notes: At the time of recording this conversation with Yoav, I had not spent a material amount of time coding in the style described in this interview. I’ve now spent more than zero time, and been extremely impressed. I hope to show the Internet the fruits of that (which are more “a project” than “a company” in September).]

Patrick McKenzie: Hideho everybody. My name is Patrick McKenzie, better known as Patio11 on the Internet. I'm here with Yoav, who is an AI researcher who recently has been teaching people at a variety of skill levels this craft called "vibe coding." So Yoav, thanks for being on the program today.

Yoav Tzfati: Thanks for having me.

Defining vibe coding

Patrick McKenzie: So we'll start off with, first for people who aren't terminally online on Twitter, what does “vibe coding” mean?

Yoav Tzfati: Vibe coding is a term that was coined recently by Andrej Karpathy. The way I've been thinking about it is basically delegating your software engineering work to an AI model. I find it really funny that this is the term that we've converged on for this. It's basically replacing a human software engineer with an AI model, but it has this cute term now.

I'm not the greatest fan of this term, candidly. And you mentioned before the program that you might not be as well.

The evolution of software engineering with LLMs

Patrick McKenzie: My reason is that this is not the first time in the tech industry where we've picked an auto-minimizing term for something which is much bigger than that term implies. When discussing this with other friends who have arbitrarily high levels of skill, they've told me that they think LLMs have fundamentally reshaped the craft of software engineering. And given that very credible people are telling me very credible things about productivity boosts and the likely shape of things to come, I think my first pass would be "using LLMs in the craft of software engineering" or similar.

And we've seen this before, going back to blogging. "Blogging" was for many years an excuse for people to devalue written output that was, morally speaking, an essay, but written by someone who is not serious about writing essays. And many people who are, morally speaking, bloggers are more careful about what they call that these days. [Patrick notes: People sometimes lament the "decline of" blogging, by which they mean many of the people whose written output they enjoy are now better employed and often quite professionally successful. As I've noted previously, Substack is an argument from capitalism that an intelligent person capable of clearly articulating a POV on a narrow subject of interest to humanity should not be employed as a PM at AppAmaGooBookSoft.]

I think early career engineering professionals who might be sort of anxious about their skill level and similar do not do themselves favors by saying: “My skill level is actually crap. I am the worst engineer in the history of 19-year-olds to have ever done the craft of engineering.” I would encourage people, particularly people early in their careers, to not call it “vibe coding” with respect to themselves.

But that is me off my soapbox.

Yoav Tzfati: That is a slightly different angle than I'm coming from. I don't think people are calling themselves vibe coders as a career title or something like that. I think it's more of an activity. People see it as an activity, and I think in some ways it sets the right mindset for using LLMs to produce software because often you'll get way better results if you constrain the model less and sort of let it be creative.

There is an exploratory, almost play-like feel to this, which is sometimes present in the general craft of software engineering and sometimes less so. The people who report poor results are the people who have this big existing code base and need to make a very specific change to it. And to them, the vibes don't help at all. They just need to accomplish a specific thing. And the models are not as good at accomplishing a very specific thing that is still complicated, compared to making a whole new thing from scratch that is less constrained.

Patrick McKenzie: I think the acceptance bars on that sort of greenfield development versus brownfield development are a little bit wider. Particularly if you're doing it more to learn the domain and to test out directions a program could go in, in sort of a prototype fashion versus "no, we have a system, we need to have a bulletproof bug-free implementation of one sub-module for the system or everything comes crashing down."

Practical applications of vibe coding

For people who are having a bit of trouble visualizing this, and I know that the answer is changing on almost a week by week basis—if I'm an engineer and I sit down at a computer and I am "vibe coding," what does that activity actually look like to me? What programs am I using? What does the information flow look like?

Yoav Tzfati: Yeah, so this has several different possible forms. There is the closest-to-normal software engineering version of this, which is: you have your choice of IDE open. Perhaps VS Code, maybe Cursor. And you are looking at all the code and you have a particular change you want to make in mind, but instead of typing it out yourself, you may be prompting a model to add a function, or you even start typing it out and then let it autocomplete the rest. Or type out the function signature or whatever. Arguably, if you're looking at the code, that is not vibe coding.

There's a step above that, which is still inside one of your IDEs of choice, most likely Cursor or Windsurf over VS Code. People have not had very good success with the agent mode in VS Code yet, as far as I know. But let's say in Cursor you have the agent mode sidebar. You type in, "Oh, I want you to implement blah, blah, blah," and you mention the feature set you want rather than the implementation you want. And then the agent gets to decide how to implement it. That's a little bit more along the curve.

Even more along the curve is, you know, you're maybe still in these tools, or maybe in Claude Code or something like that. You maybe even set up a completely new project, and you plan on not looking at the code at all whatsoever, and you say, "This is the application I want." And the model then goes and makes it. And you look at the result in your browser or whatever interface it is that you're making. Then you ask for feature changes.

[Patrick notes: Claude Code, asked for a spoiler-free recap of the last few dozen commits it and I have coauthored on my summer programming project, says: “Over the past few dozen commits, we've collaboratively built the foundation for a Rails/Golang hybrid application. We started from scratch and established a complete Docker-based development environment, scaffolded a Rails application with proper structure, and created a Go service that runs alongside it.

Key accomplishments include setting up PostgreSQL, Redis, and Memcached integration, implementing health monitoring endpoints that verify all services are connected properly, and establishing clear documentation standards for future development. We also configured the infrastructure to support both web and API components running on different ports, with proper networking between containers.

The work focused heavily on creating a solid foundation with proper separation of concerns, Docker containerization for consistent development environments, and comprehensive documentation to guide future development phases.”]

Yoav continues: And then the most vibe version is you use one of these web-based tools such as V0.dev or Bolt.new or Lovable. And inside your browser, you just tell it what you want until it makes it; you get to see it and request changes. You probably have a one-click deploy button where you can put it on the Internet.

Does that answer the question?

Patrick McKenzie: It does indeed. So we're starting at something which is almost clichéd in the discourse of LLMs now. “It's just slightly more sophisticated autocomplete.”

Although if folks are thinking of standard IDE autocomplete features, this is much more sophisticated than that. They are not simply, you know, guessing from the characters that you've typed recently or from your most-used library calls. They're actually able to see the context above and below the cursor, and so they are quite good at guessing: "Okay, which library are they going to call next? What are the probable parameters they're going to send to it?" And there are patterns, either in the LLM's training set or in the context window. If you're doing a for loop, the invocation line of the for loop is very predictable. But also, if you've been programming for a while, you know that sometimes, if you've just assigned a variable X and a variable Y, everything about the for loop is preordained and it's just a matter of banging the keyboard.

And they are quite good at that in my experience.
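
(As a toy illustration, not from the conversation: once the first two lines below exist, the rest of the loop is the kind of completion these models reliably propose.)

```ts
// After these two declarations, the shape of the loop is nearly preordained:
const xs: number[] = [1, 2, 3, 4];
const ys: number[] = [];

// A completion model will typically suggest this whole loop unprompted;
// the exact body is a guess, the scaffolding is not.
for (let i = 0; i < xs.length; i++) {
  ys.push(xs[i] * 2);
}
```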

And then I have not tried the agent-based mode, but Thomas Ptacek, my erstwhile co-founder—we were at Starfighter together; he's currently at Fly.io—recently wrote an essay about agent mode, and how agent mode (I think he would say this is a fair paraphrase) fundamentally changes the craft of software engineering.

[Patrick quotes Thomas: “All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.” (I believe he thinks the mobile revolution was the most important.) ] 

Patrick continues: Agent mode is not dissimilar, say people I've talked to, to having a staff of junior programmers working for you where you might meet them before lunch, give them a number of "I would like you to bang out X, Y, and Z today" and have five, six, ten people that you farm things out to, and then you literally have to go away because the LLM does need a perceptible amount of time to think. So maybe you take a meeting, maybe you take lunch, whatever. You come back and there are five, six, ten pull requests from your staff of junior programmers. And some of them will be very "this person got exactly what I wanted from them," accept this pull request. Some of them will require a little bit of editing from you. You might not like the style, et cetera, et cetera. And then some of them are "Wow, you comprehensively misunderstood the task I was attempting to get you to do. I'll give you feedback on that misunderstanding and we'll meet back for your next pull request review in oh, 45 minutes."

Does this kind of match your experience of this?

Yoav Tzfati: Somewhat. I think that for most tasks and most people, the iteration loops are more like three minutes to seven minutes than 45 minutes. Now, with Claude Opus 4, for some tasks, the iteration loop can be a lot longer. They talked about how Opus went and worked for seven hours on a feature. I think the way most people use this is they usually have one agent running. Some power users have multiple, but they will look at its work and see the changes it's proposing, and over the course of two or three minutes, let it work and then give it feedback and so on. Maybe go back in time to a previous checkpoint.

Teaching vibe coding to novices

That said, the thing that I was optimizing for in my bootcamp—have we mentioned the bootcamp yet? We haven't mentioned the bootcamp. Oh, we should probably mention that context for people before we go too much on—

Patrick McKenzie: So you have been recently teaching people who are earlier in their engineering careers about how to use this as an accelerant to producing commercially relevant code.

Yoav Tzfati: Almost. Rather than people early in their engineering careers, I would say complete novices, people who have never coded in their life, were my target audience. I had a couple people who have coded in the past, but for the purposes of the three-day bootcamp, they were not looking at the code at all. And rather than thinking of it as sort of upskilling in engineering, I was thinking of it as providing people with the bare minimum skill that they need in order to produce working software without needing to know software.

And some of the theory here was: I actually started planning this before Claude 4 came out, and I was anticipating that Claude 4 was going to come out and just hit the threshold of making this possible. And it was perfect timing. Basically all of my students produced working full-stack web applications, which I think would not have been possible with any earlier model.

Patrick McKenzie: So I think we have a range of technical sophistication among our audience here. So finger to the wind, a working full-stack web application is something which many people would find difficult to pull off without substantial guidance immediately after a four-year undergrad degree in computer science.

[Patrick notes: Professionals who do not have the benefit of a degree whose sticker price was six figures sometimes express surprise that I graduated unable to write SQL. There was an elective offered, but it wasn’t treated as a particularly important one by the department. I learned on the job, as most do.] 

Patrick continues: And there are some people who go their entire careers without being more than minimally competent at either the backend or the frontend—the backend being this thing on the server that talks to the database and runs business logic, typically the frontend being the thing that runs in the browser [Patrick notes: or I suppose you could write apps, if you haven’t been scarred for life by downloadable software] and does fun animations and connects to remote APIs, maybe connects to the backend and similar. And so there are people who have worked in Silicon Valley and earned a nice paycheck and would say, "I am not more than minimally competent at either the backend or the frontend," depending on which I've been working on.

And so having shipped a full-stack web application within one day of encountering the notion of "there is code in the world" is kind of wild.

Yoav Tzfati: It's incredible. I was surprised to see how well it worked. My students were certainly surprised. I think that some of what is enabling this, you know, there's the trajectory of AI that is going exponential. There's also the trajectory of developer tools that has been quietly increasing over the past, you know, since software started.

Patrick McKenzie: The younger listeners to this podcast might not know this, but there was a life before Git and GitHub. And let me tell you, it was terrible.

[Patrick notes: I’ve mostly blocked Subversion from my memory, thankfully, but it was such a pain in the keister that many shops had specific engineers authorized to use it. This was an improvement on the status quo, which was not using source control. That practice was widespread until late in the 2000s, so much so that using source control was a marker of a cutting edge software practice in the early 2000s.]

Patrick continues: And there are affordances built on top of Git—you mentioned, you know, rolling back Git history to a previous checkpoint. I assume that's a Git-based affordance in Cursor?

Yoav Tzfati: No. In Cursor, they have sort of IDE checkpoints. We were not using Cursor; we were using Claude Code as the main interface to Claude. And Claude Code doesn't have that functionality, but we did use Git for that.

Patrick McKenzie: Got it. Git is also one of those forbidding high gates you need to pass through to function productively as a software engineer in industry, for example. And presumably, having attempted to teach people this before: if you can make someone minimally competent in Git in a day, without them having previous version control experience, you're doing pretty well if that is your only pedagogical goal. And so if you have people successfully being able to commit, merge, and rebase in Git at the end of 24 hours, on top of learning programming on the same day, that is, again, kind of wild to me.

[Patrick notes: Interestingly, having played with it a bit, I think the LLMs do extremely well with translating natural language into deep in the 90%s of my historical git usage, including some incantations which are not quite in my muscle memory. Even if you don't use Claude Code or similar, I strongly suggest getting Simon Willison's llm command installed. Drop in an OpenAI or Claude API key and it becomes an in-terminal answer box which is magical for e.g. looking up syntax to git or any other Linux/etc command. You can pipe output to it naturally! Or ask it to explain how to do git bisect.
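
For example (a sketch, assuming you've installed llm and configured an API key; the prompts are illustrative):

```bash
# One-time setup: pip install llm, then store a key, e.g.:
#   llm keys set openai
llm "git incantation to undo my last commit but keep the changes staged"

# Pipe command output to it naturally:
git log --oneline -n 20 | llm "summarize what changed in these commits"

# Or have it walk you through a workflow:
llm "explain step by step how to use git bisect to find a bad commit"
```
]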

Patrick continues: So this is a nice segue into one of the main underlying philosophies that went into this bootcamp.

Yoav Tzfati: I saw one of my main roles here as preparing resources for Claude such that my students are just sort of a channel for that. Such that my students will be able to just talk to Claude very naturally about what they want to happen, and Claude will then execute.

None of my students typed into their command line "git commit" or "git add" or whatever. They all had the experience of telling Claude, "Oh, that's great. I want to deploy it to the Internet," or "I like the previous version more. Can we go back?" And Claude had instructions from me on, you know, committing frequently and what to do when the user requests to go back to a checkpoint and stuff like that. What to do to squash before pushing—some technical jargon.
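
(A hypothetical sketch of what such standing instructions might look like, assuming the Claude Code convention of a CLAUDE.md file in the project root; this is illustrative, not Yoav's actual file:)

```markdown
# CLAUDE.md (illustrative excerpt)

## Version control
- Commit after every working change, with a one-line descriptive message.
- When the user asks to "go back" to an earlier version, list recent
  commits in plain language, confirm which one they mean, then use
  `git revert` or `git reset --hard` as appropriate.
- Before pushing, squash work-in-progress commits into one coherent commit.
- Never surface git jargon to the user unless they ask.
```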

Patrick McKenzie: And this is a thing that we do in the industry and have for a very long time. There are low-level technologies, and Git is not quite the lowest of the low-level technologies, but, oh my, Linus Torvalds, when he goes off, he goes off.

So there are things that one builds on top of Git, like GitHub, for example, to make things easier for early career engineers. This helps in reasoning around what it's doing, and in having social norms within a company or within an open source project that make people's use of Git cross-compatible with each other. Because if you have vastly different norms for how much work there is in a commit or how much work there is in a pull request review, there will be some friction on the engineering team.

And so instead of having an orientation where we teach people for two weeks, "this is the way we do engineering at this company" (you only have three days, and there's no company involved other than your own), you can pass that pre-orientation in a very natural fashion over to Claude. And then Claude kind of transforms people's expressed intent into, "Okay, if I was working on this hypothetical engineering team, the way that this team would prefer expressing that intent is taking the following actions in a terminal."

Yoav Tzfati: Yeah, I think that's completely correct. And also goes for any other kind of code-specific stylistic choices or best practices in your team.

That said, my problem here and my students' problems were a lot more narrow because the thing I'm going for here, and the thing I think that I sort of uniquely provided is more in the direction of personal software, empowering people to just build small bespoke applications for whatever use case they might have. I had a video producer that has never coded in his life and had a little bit of computer phobia. And he built an application that takes in a screenplay and analyzes it, splits it into shots, and generates an AI-generated storyboard for it, which I thought was incredible. He said he might actually use it for his films.

Patrick McKenzie: That is a wonderful example of computers making people's lives better, in a way which I happen to know a bit about. There are a couple of screenplay writing software programs out there that are used in industry and similar. But maybe you don't have that budget, or maybe you say, "Where's my AI storyboard feature? I want that now, rather than three years from now when one of those companies catches up to it."

That's the sort of itch that is much easier to scratch if you're a programmer. But, finger to the wind: I have an engineering degree. I've shipped a SaaS company or two in my day. That's a two-week project for me, I think, finger to the wind. And, you know, there's some level of project that is over that line and some level which is under it. And if you can tell people who might be in an allied trade, or some entirely different walk of life than programming, "hey, you know, the computer can do this for you if you spend 45 minutes describing what you want," I think that increases the amount of bespoke software in the world.

There are some people in less temperate parts of Twitter and similar who think that software as a service is dead. Everyone will write their own software every day, which is of course exactly what we found after we released word processors and then everyone wrote all their own things rather than getting other people to write things for them. I'm joking a little bit, but I do think that is an important point. [Patrick notes: Remember the first time we replaced programmers with a program? It was called “a compiler” and did not successfully end engineering employment. Rather the opposite.] 

Speaking of important points, I think while this is extremely early for people on literally day one, as you've mentioned, it likely continues through the rest of their engineering journey. And it is difficult for me to envision a world in which professional engineers are not using these tools constantly going forward. Vibe check me on that statement. Does that sound accurate to you or do you think actually no, I could imagine engineers who would just get no benefit out of this whatsoever?

Future of AI in software development

Yoav Tzfati: I think there will be a wide range of usefulness for different tasks and different contexts. But, you know, I believe that AI will become generally intelligent or superintelligent in the next less-than-a-decade, maybe less than five years, and that just straightforwardly means that eventually humans will not write software unless they are doing it for fun. That is what the trajectory looks like. I sort of expect that a very large percentage of application development will be completely done by AI within a year and a half, something like that. And then other, more complicated things: maybe systems software that is very low-level and complicated, maybe Linux kernel development or underlying C++ libraries for AI inference, stuff like that. That sort of thing I think will follow relatively shortly after, but will take a little bit more time.

Patrick McKenzie: I, broadly speaking, have longer timeframes than you, but you have probably thought about that specific question a lot more than I have, and so I don't think it's very productive for me to list reasons why I think it'll take longer. But the thing I think there would be wide agreement on is that these tools are getting so much better on an every-three-to-six-month-ish cadence. As you said, it would've been borderline to even attempt this with students at that skill level a few months ago, and then Claude 4 gets you there. If you have previously tried using LLMs to assist you in engineering, or if you're unsure "am I doing the kind of engineering that would really benefit from putting points into learning how to do this for a couple of weeks?", you might as well deploy those points now. Because if you are expecting a productivity boost from your use of LLMs and it isn't available in June 2025, just check back in October. Whatever your expenditure of points and resources and investment was in learning how to do this, the LLMs have in aggregate expended a lot more training runs on getting better at it between now and October, and suddenly, boom, your investment that was not quite where you wanted it to be now gets you some rewards that you can take advantage of.

Yoav Tzfati: Yes, I agree. That said, I think that learning this will become easier over time as well. It's never going to be "too late to get into it and you'll be behind" or something. With the models getting smarter, they will also become better at teaching you how to do this. Right. And there will be better UI affordances built on top of them and you know, better educational products like your own, built around it, et cetera, et cetera.

Patrick McKenzie: And then the models themselves and the people that are building the systems that are firing prompts off to the models will have more experience of seeing engineers over a range of skill levels on a wide variety of problems and they'll presumably tight loop that into either training runs or fine-tuning runs to make them function better out of the box, as it were.

Discussing timelines and model capabilities

Yoav Tzfati: Yeah, absolutely. And regarding timelines, I actually would be happy to talk about that if that's interesting because one of the reasons that I decided to do this is that, you know, I believe that the models are much smarter than people seem to think. And I wanted to show that, and I think I've actually done a good job. My students were able to build really impressive things in a very short timeline.

Flappy Bird and the evolution of game development

Patrick McKenzie: I remember being wrong about this, and so I will cop to being wrong about it. A few years ago when Flappy Bird came out—and Flappy Bird, for those who don't remember, was an extremely minimalistic game, which was nonetheless done well. For a while it was a bit of an Internet sensation, partly because there was a bit of a gap between how minimalistic it was and how just fun it was to play. And that gap in particular caused people to want to play it. And then Internet sensation begets being an Internet sensation.

Someone said (I can't remember who it was, and I didn't look this up beforehand, but impressionistically this would've been in the 2016 era or so, before the modern era of LLMs), "Eventually you'll be able to go to a computer and just say, 'make me a game about birds' and this is going to pop out of it." And I was like, "No, no. Just from an information theory perspective, there's not enough in the sentence 'make a game about birds' for this to pop out of it."

Today, I don't think it's even worth doing the experiment. If you say "make me a Flappy Bird clone, except it's gotta involve sharks and water," you will get a functioning Flappy Bird clone with sharks and water basically on your first try.

Yoav Tzfati: My students basically did this.

Patrick McKenzie: Yeah, of course. The natural progression of programs over the years—in my day, Hello World. These days, please make me a Flappy Bird clone.

The impact of LLMs on software engineering

If you take away nothing other than this from this episode, understand this about the LLMs. If you do not currently think LLMs are intelligent, if you think they're a fun bit of magic but won't have any major impact on the world, if you think, "okay, cool thing that you can use to generate slop for spam purposes, but it'll never transform a white collar occupation": this is two relatively experienced professionals trying to shake you by the neck and say, "No, no. We are not talking about two years from now. We're talking about today," in terms of the things they can do. And we have a variety of confidence intervals for where we'll be in two years.

I won't ask you to endorse this, but I've tried to tell people I am the guy who thinks that the bear case for LLMs is that they're only as impactful on human society as the Internet was. Simultaneously, I'm the guy who says that the Internet is the most important thing that humanity has ever accomplished, and the bull case for LLMs is much more impactful than the Internet.

And paraphrasing what you said earlier, and please tell me if this is an unfair paraphrase. You're expecting the literal end of work, at least for productivity purposes by humans, that we can offload entirely all or almost all of that to LLMs.

Future of coding and human roles

Yoav Tzfati: Yeah, basically within a few years, I expect humans, with very few exceptions, will not look at code. Maybe the only exceptions are audits of very specific code bases that actually need human auditing because they are, I don't know, relevant to AI safety, and then you have to have a human look at it because you don't trust the AI, or something like that.

Patrick McKenzie: And this prediction sounds a bit more radical than it is, I think. When we do code, we typically do it in expressive, high-level languages. Ruby, Python. What's your poison of choice?

Yoav Tzfati: It depends on the use case, but for most applications, TypeScript these days.

Patrick McKenzie: TypeScript. So TypeScript is a great example. TypeScript is built on top of JavaScript, with quite a lot of similarities, but files some of the rough edges off. JavaScript runs on a virtual machine. The virtual machine is built in lower-level (though still high-level) languages, probably C or C++. C is built on top of assembly. Assembly compiles down into binary code, and the CPU, the chip that runs in your computer that does all this stuff, interacts essentially with binary code.

Now, for that entire tool chain, there are currently people who are specialized in working in each of those elements of the tool chain. But you can go your entire career after undergrad in Silicon Valley, working in frontend engineering, backend engineering, FinTech, whatever it is that you do, and never see a single line of assembly code. And no one around you will say, "Wow, this person is not an engineer. They've never seen assembly code." (People who say that are idiots; ignore them.) You can go an entire career without seeing C. There are people who, for considered choices of their own, choose to go an entire career without ever seeing JavaScript. And, you know, partly this is an argument from specialization: humans want to accomplish a lot of things in the world and can't be expert at all the things, so we offload some of the things to programs that were previously written, or to other humans, or to society generally. Partly it's that we have these increasing layers of abstraction built on technology which we figured out a long time ago; that technology continues to improve due to the work of specialists, but you have to think about it much less than you once did, back when essentially all programmers working in the world knew assembly because that was your only option.

Yoav Tzfati: So I gave a very strong version of the claim earlier, the claim that engineers will not be looking at code anymore. I could quibble with it a little bit, but the notion that many engineers will be looking at code a lot less than they do now seems straightforwardly obvious to me, given the fullness of time. Exactly what year that comes is up for grabs. And I'm personally looking at less and less code as I go. For the most complex things that I still do, I will look at all of the code, but increasingly, for a lot of stuff, I'll just check if it works, and if it works, I won't look at the code.

So this is already happening.

Patrick McKenzie: I will say "as I get older, I look at less and less code" is something that many engineers would've reported prior to LLMs existing. Partly it's the industrial organization of software, where the more senior employees' time is more valuable, so they end up doing things that are very leveraged. A lot of that is interfacing within the engineering organization and with other stakeholders in their organization, perhaps the clients, perhaps customers, on "what should this do, anyway," and then transforming that into a bunch of prompts, if you will, for more junior engineers. Then, "Okay, here are things we need to get banged out in the next two weeks."

And then reviewing the output of engineers at a variety of levels. And one of those levels could be, you know, line-by-line code reviews. But as the person who is doing the work for you gets more senior, it is less valuable for you and for them to do line-by-line code reviews, and more valuable to talk at the level of the design document, at the level of the microservice, et cetera, et cetera. Let's have a meeting of the minds on our goals. And then success or failure will probably not be read out of an IDE. It'll be read out of things like, "Okay, well, the alerting system isn't going crazy. So that's good news. And we had a successful blue-green deploy. That's good news." And, you know, continuing on with a well-designed process for blue-green deploys: if you're getting no errors after the thing is 100% in production, then very probably you never need to go back and say, "Okay, but that code that they gave me, did it work? Really?" And so in a certain way, the future might look like a turbocharged version of the past, where as one gets more advanced in one's career, slash, the technical substrate one works on gets more advanced, you need to poke into the underlying layers less and less.

Monitoring and error handling in software

Yoav Tzfati: Yeah, I think that's a great take. I do think that, you know, investing in very good monitoring has always been important, but it might become even more important now: knowing when something is going wrong.

Patrick McKenzie: I think something that has been reported to me privately, but I'm interested if you've used it. So monitoring broadly—computer programs have logs to them. There's also metrics you can collect from a computer system or computer program or similar. Monitoring is just making these visible to typically human operators to allow the human operators to make decisions based on them. And so a thing you might do if, I don't know, a large portion of financial industry computers are down overnight is to wake people up because them being down is typically bad. And you won't know that they're down unless either someone's yelling at you or you have someone who is in some way tied to a computer system that can realize, "Okay, if the number of requests that we get per minute goes down from a hundred thousand to 40, that's not good news. Tell an engineer that and have them make a decision on whether to wake people up." That was just the context on what monitoring is.
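
(A minimal sketch of that kind of check, with hypothetical helpers; real systems usually express this as an alerting rule in a monitoring product rather than hand-rolled code:)

```ts
// Hypothetical: page a human when traffic collapses far below baseline.
// getRequestsPerMinute and pageOnCall are assumed stand-ins, not a real API.
async function checkTrafficFloor(
  getRequestsPerMinute: () => Promise<number>,
  pageOnCall: (message: string) => Promise<void>
): Promise<void> {
  const rpm = await getRequestsPerMinute();
  const floor = 10_000; // tuned from baselines; normal load is ~100,000/min
  if (rpm < floor) {
    await pageOnCall(
      `Requests/minute fell to ${rpm} (expected at least ${floor}); possible outage.`
    );
  }
}
```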

So typically, monitoring is not reporting errors on a one-by-one basis and filing them all to an intermediate software engineer for "Okay, immediately find what caused error number 3,732 that we experienced today." Typically, anomalies are interesting and individual errors less so, because there is a finite amount of attention that intermediate engineers have. But it seems like attention attached to near-human intelligence might be very abundant in the future.

The role of LLMs in code review and maintenance

And so one of my friends, who has previously done a lot of work in this area, has said one of the things they do these days is just pass every error and every error message to an LLM and see if it can identify what caused it. And if yes, add a test to the code so it doesn't happen again, plus a code change that will fix that error in perpetuity. Then engineers review the proposed code changes and batch-accept them. I don't know if his company has yet moved past the batch accept to the "Okay, computer, just figure it out and we'll tell you if we don't like it later," but I suspect that's not very far away.
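
(A minimal sketch of that loop; llm and openPullRequest are hypothetical stand-ins, not any specific product's API:)

```ts
interface ProductionError {
  message: string;
  stackTrace: string;
}

// For each production error: ask an LLM for a regression test plus a fix,
// then open a pull request for humans to batch-review, as described above.
async function triageError(
  err: ProductionError,
  llm: (prompt: string) => Promise<string>,
  openPullRequest: (title: string, body: string) => Promise<void>
): Promise<void> {
  const proposal = await llm(
    `Production error:\n${err.message}\n${err.stackTrace}\n\n` +
      "Propose (1) a failing test that reproduces this error and " +
      "(2) a patch that fixes it, formatted as a unified diff."
  );
  await openPullRequest(`Auto-triage: ${err.message.slice(0, 60)}`, proposal);
}
```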

Are you passing logs over to the LLMs yet for somewhat autonomous mode or—

Yoav Tzfati: I have not done this yet. But also I've been focused on dogfooding a lot of my content for, you know, creating these greenfield small applications. That does sound very useful to me. And I do think people will remain nervous about auto-accepting code into their code bases run by LLMs for at least a few more months.

Patrick McKenzie: I expect that is going to be a Rubicon for a number of organizations. And I apologize, I don't want to drag you into it, but one of the reasons why I somewhat fade the curves that people in the AI safety and similar communities have for impact on human life is that in some places they're not rate limited by things that the AI, or any advocate for the AI, can dream up. They're rate limited by acceptance into existing organizations, institutions, and similar. It took an awful long time—decades in some circumstances—to get people comfortable with the notion that your business's valuable data should totally sit in somebody else's building, where you won't even be allowed into the room with it anymore, but that's better for you than your current practice. That was the story of cloud computing, on which Amazon et al spent decades getting companies comfortable. Likewise, I think it will take a long time to get conservative engineers and conservative institutions over the hump of "No, really, just fire unaudited changes into your code base. What's the worst thing that can happen?"

And similarly, you know, at the more aggressive end of the spectrum, the sardonic phrase sometimes used in Silicon Valley is "cat photos." If you're only shifting cat photos around, the robustness demands on the engineering are somewhat less. Interestingly, if you look at the financial industry, and then look at the company that is most famous for sending cat photos around, the relative robustness of the cat photo company is actually much higher than the mean robustness in the financial industry. But that's neither here nor there. I will say, again from private conversations, it seems like people are already doing the error-to-LLM-to-patch-to-batch-accept cycle in production, or very close to production, at "serious companies" doing serious things.

Yoav Tzfati: Yeah. And there's a bunch of topics we can branch into, but I told my students, "Wow, this is great. You can make these web apps, put them on the Internet and you didn't look at the code once. That's awesome." A lot of them got nervous around, "Oh, will I suddenly be charged a lot of money? Or you know, will I leak user data or something like that." And what I told them is, once your application looks the way you wanted and does the thing you want it to, you should probably get a human software engineer to audit the code base, which I think is an interesting flow. You can generate hundreds of small bespoke apps and then a small fraction of them will actually become "serious." And for those, you do the auditing.

Wireframing and project management with LLMs

Patrick McKenzie: So one way to think about this is turbocharged wireframing. In the classical craft, if you're at a company and you are trying to decide what the new feature for the product looks like, you might have a team go off for a week or two with a sketch board and do "wireframes": let's get the design working here and understand the flow, the screens that the user interacts with to get through the task in front of them. And then you might go as far as having a sub-team of that team build something which is clickable, but which can't actually do the thing yet, just so it's a communication aid with the engineers who review and do the actual implementation: "Here's the thing we want you to build; here is broadly what we expect the output to be." But we haven't fully specified the output, because if a wireframe were fully specified, it would just be the program. And so here we have wireframes which are in some sense a communications aid, in some sense an exploratory tool, which are just much higher bandwidth than traditional wireframes, and which happen to be executable.

Partly it'll be a matter of tempering expectations. And you know, maybe as of June 2025, don't vibe code a bank. Your regulator might not be super happy with that decision. Maybe if you are thinking of vibe coding a bank and don't yet know that banks have regulators, ask an LLM close to you about that topic and you'll learn some things.

The future of software engineering careers

I feel less threatened about it than some people do. And for whatever reason, a combination of things I've written and my position in the community, I meet many people who are early in their careers who have said things like, "I think this—the broad increase in skill levels and increasing adoption of these tools in the craft of software engineering—is going to greatly decrease demand for engineers in the future. I think it might decrease salaries for engineers. I think the door might have closed, in a very bad way: people who are already in the industry are fine, but I'm not quite in the industry yet."

I'm broadly optimistic about all of those questions. In most worlds that I envision, the total software employment does not decrease. And my basic model and intuition for that is if you make engineers a hundred times more effective, then we don't need a hundred times fewer engineers. We just increased radically the number of value-adding projects we can go after.

And I think people might not understand this about the largest engineering firms in the world, but they have a list every quarter that they go through. The exact nomenclature differs in different places, but: here are all the things we could do if we had infinite time. And then we draw a red line somewhere down that list, where the things above the red line are projects we think we can actually staff this quarter, given our current size of the engineering team, other constraints, desire for maintenance programming, and similar. And then there are always things below the line. Almost everything is below the line. And given humanity's ability to desire new things the moment we alleviate some pocket of scarcity, I think there will always be something below the line, and so essentially always demand for engineers.

Yoav Tzfati: I do like that intuition and framing, but I find it interesting to talk about what the humans are doing. And you mentioned earlier about the role of the software engineer, of the senior engineer, sorry, compared to the junior engineer. And there's the product manager.

And the frame that I've been shifting into and that I used for this bootcamp is senior software engineers often do things that kind of leverage the work of the junior engineers to be more effective and help them make fewer mistakes. If you set up your code base such that when a junior engineer makes a mistake, there is a linter that catches that mistake or a type checker, or basically a program that you run your code through and it checks it. Then all of a sudden you're more likely to get working software out of your junior engineer.

And the framing that I took on for this bootcamp was, "Okay, I'm sort of this senior engineer and I'm going to set up a project template that has all of the necessary pieces to let Claude as the sort of junior engineer be able to work very productively and with minimal errors while my students or the people, you know, with the idea for what to build, with the idea for what they want, acting sort of like the product manager."

And this was extremely effective. I think when people see, or try to estimate, the level of capability of these models, they are looking at, "Okay, what happens if I ask the model to build this-or-that application from scratch?" And the model often gets it somewhat right, but makes some simple mistakes, and it's, you know, not very polished, and you would have to invest senior human engineering time into doing the last mile, sort of. And there's a large cost associated with engineers just reading code de novo and understanding "what is this thing doing, anyway?"

But my approach here was, "Okay, what if I did the last mile at the start?" And I set up a project with all of the necessary dependencies. For each dependency, I asked Claude, "Hey, what is everything you know about this dependency? Okay, great. Now go look at the docs on the Internet and tell me everything you missed. Now summarize everything you missed so that I can put it in the instructions for this code base." All sorts of those types of things. What backend technologies can I choose such that building out new features is extremely easy? Which is a topic that I've been excited about since, you know, before the LLMs.

And so when you say something like, "Our appetite for more software will just increase and we will build a lot more software, but we will still need humans," like, you know, humans will be X times more effective. Then I'm going, "Okay, but they're not going to be doing the same task." There's going to be humans that are acting sort of like product managers and maybe there are humans that act as sort of auditors for untrusted, LLM-generated code—code where you don't trust LLMs enough to comply with the bank regulations or whatever.

And at some point, you know, I think that once the trust barrier is passed, like we have this aligned AGI or whatever, at that point you no longer need the trusted senior software engineers, the humans, because you trust the LLM to do that. And I don't know if we ever reach that point, because it's a technical alignment question combined with a psychological and social question.

Patrick McKenzie: There are indeed a lot of things that computers can do trivially these days which society, for whatever reason, does not trust computers to do. An example I can think of off the top of my head: adding two numbers and then comparing the sum to a third number and saying, "Is it above this or not?" Again, trivial for computers to do for almost all values of the numbers. But if you are attempting to get accreditation as an accredited investor in the United States to invest in a startup, you really want the agent in the world that is doing the "I'm going to add two numbers together and compare against a target value" to be a lawyer or a CPA, because they have the magical ability under law to give you the stamp that you need. And a computer, even though it is capable of doing the math, does not.

[Patrick notes: VerifyInvestor is a beautiful small business. It’s essentially a basic data entry and upload script that wraps a small roster of lawyers who are willing to quickly log in in response to emails and click “Yes, $4.7 million is more than $1 million. I have a juris doctorate and am licensed to practice in California. Clicking this button probably just earned me $25 of the $70 that the investor or company/fund will pay for individualized legal advice.”

I am not sarcastic; that service has improved my life and was worth what I paid for it on over five occasions.] 

Practical tips for software engineers

Patrick continues: You mentioned linters, and one thing, a pro tip for anybody who is quite experienced with software engineering but hasn't used these tools yet. One of the best uses of them I've found in my limited exploration: say we already have a linter in the code base that is flagging code patterns that have caused us problems in the past, or which are aesthetically unpleasing to the team and we simply don't want. If you have an error, ask the LLM, "Could you write a linter rule that would've caught this?" And then you can visually inspect that linter rule and say, "Okay, do I like this or not? Is it making something up? Will this have sufficient coverage? Et cetera, et cetera." You can ask the LLM to run it against the code base and tell you what it gets from the coverage reports and how many instances it flags. And then you can, you know, go one step further and say, "Okay, for all the flagged instances, make me a pull request with a patch for it." And then, you know, your choice as the person who determines allocation of resources is whether you want to actually review 67 pull requests as a matter of priority today. Or you decide, "Okay, these are the sort of pull requests I would expect given this rule, and nothing's broken right now, so maybe I won't spend the time to look through all 67 of them, but I'll keep that linter rule in the future, just to flag for the benefit of other junior software engineers and junior LLMs that if your code hits this rule, try again."
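
(As a concrete illustration of what such a generated rule might look like: a minimal custom ESLint rule. The pattern being banned here, stray console.log calls, is just a stand-in for whatever bit your team:)

```js
// Minimal custom ESLint rule of the sort an LLM can draft from one bug report.
// It flags console.log calls left in committed code.
module.exports = {
  meta: {
    type: "problem",
    messages: { noConsoleLog: "Remove console.log before committing." },
    schema: [],
  },
  create(context) {
    return {
      CallExpression(node) {
        const callee = node.callee;
        if (
          callee.type === "MemberExpression" &&
          callee.object.type === "Identifier" &&
          callee.object.name === "console" &&
          callee.property.type === "Identifier" &&
          callee.property.name === "log"
        ) {
          context.report({ node, messageId: "noConsoleLog" });
        }
      },
    };
  },
};
```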

So one bit, a tiny bit of alpha there. We've been chatting for a little while. I think we could go in many, many directions with this, but where can people follow you on the Internet?

Yoav Tzfati: Probably best right now is Twitter; my handle is @yoavtzfati. And if you're interested in the resources that I made for this bootcamp and plan on maintaining (they'll probably look different in a month or whatever, whenever this is released), you can go to Code-Bloom.app.

Patrick McKenzie: Thanks very much for coming on the program, Yoav, and for the rest of you, thanks very much and we'll see you next week on Complex Systems.

Yoav Tzfati: Thanks for having me.