Ep. #9, The AI Coding Paradigm Shift with Simon Willison

The Story

This episode is Joe Russo talking with Simon Willison, and the conversation stays grounded the whole way through. Simon explains why he has become such a steady voice in AI: he mostly ignores the grand predictions and pays attention to what the models can actually do right now. That stance comes out of a long habit more than any master plan. He has been blogging since the early 2000s, first about web development and Django, and then, almost by accident, he was already paying close attention to generative AI when ChatGPT landed. He points to Stable Diffusion and GPT-3 as the setup, and says ChatGPT was basically the same underlying capability wrapped in an interface normal people could use.

From there the discussion shifts to software, which is where both of them clearly feel the ground moving. Simon says the debate over whether models are good at code is over. In his view, late 2025 was the tipping point when coding agents stopped being a novelty and became reliable enough to use every day. He says many engineers he knows now have agents writing most of their code, and that would have sounded absurd a year earlier.

But he is careful about what that means. He draws a hard line between "vibe coding" for personal, low-stakes tools and using these systems in production. If the bug only hurts you, fine. If it affects users, money, or security, the standard has to be higher. What changed for him is not that review stopped mattering, but that he no longer thinks every line must be read by a human to be responsibly shipped. He compares agent-written code to an internal service built by another team: you trust it through tests, docs, behavior, and reputation, and only dig into the guts when something goes wrong.

That opens a bigger question Joe keeps pulling on: if software teams can produce far more code than before, what breaks next? Simon's answer is basically everything around the code. Review, product design, specs, and management processes were all built for a world where implementation was the expensive part. If implementation gets cheap, teams can try more ideas, recover from mistakes faster, and probably rethink a lot of ceremony that used to make sense.

The episode ends in a more human place. They talk about beginners, mentorship, and whether the next generation can become strong engineers without the long pre-AI apprenticeship older engineers went through. Simon is hopeful, mainly because these tools remove so much early misery from learning to code. He also thinks software engineering is still hard in the same old way: someone still has to translate messy human needs into systems that work.

Main Themes

The main thread running through the episode is that AI changes the pace of software work without changing its difficulty. Simon keeps coming back to that point. Software was already hard, and the hard parts were never just typing. They were judgment, scope, security, tradeoffs, and understanding what actually matters. The tools speed up people who already have those instincts, which is why he calls them amplifiers of experience.

Another big theme is trust. Not blind trust in models, but a changing standard for what trustworthy work looks like. In the past, a tidy repo with tests and a polished readme suggested care. Now those signals are cheap. Simon says actual use matters more. A tool that has been exercised in the real world tells you more than one that merely looks finished. That same idea extends to enterprise software, where buyers do not want novelty as much as proof that something works.

They also spend time on who benefits from all this. Simon thinks experienced engineers are in a strong position because they can steer the tools well, but he does not think that means newcomers are doomed. He sees real value in AI as a patient partner that removes friction and helps people get past the first ugly months of learning. The open question is whether that kind of assisted entry produces engineers with depth over time.

Underneath all of it is a pretty plain argument: the winners here will not be the people making the loudest predictions. They will be the people paying attention to what works, where it fails, and what has to change around it.

These things are amplifiers of existing experience: if you know what you’re doing, you can run so much faster with them, but producing software is still ferociously difficult. — From the episode

Full Transcript

Source: openai 53m runtime

I'm constantly reminded as I work with these tools how hard the thing that we do is, like producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and it's still really difficult what we're trying to achieve here. There are a whole bunch of reasons I'm not scared that my career as a software engineer is over now that computers can write their own code. These things are amplifiers of existing experience. If you know what you're doing, you can run so much faster with them. Hi, I'm Joe Russo, a general partner at Heavybit, and you're listening to High Leverage, a regular series where I explore how modern software teams are scaling through better culture, tooling, and processes. High Leverage is brought to you by Heavybit, the leading investor in enterprise infrastructure software. To learn more about us, visit heavybit.com. Welcome back, everyone, to another episode of High Leverage. I'm super excited to be joined today by someone who needs no introduction, I believe, but I'll do so anyways, Simon Willison. Thanks, Simon. Great, I'm really excited to talk to you. Yeah, I just want to start before we get into it by just saying how much I've appreciated, and I'm sure many of the listeners here, all the work you've been doing the last few years providing what I think is just like a very needed and balanced perspective on what's actually happening in AI from week to week and month to month. Yeah, I realized the thing I picked for myself, I decided that my beat in all of this, my journalistic beat was going to be what can it do because there's so much to talk about around what could it do and what are people saying and all of that. And that's fine, but if you stick to just what actually works today, there's such a rich vein of information to dive into and explain to people. And that's just super useful. Nobody can complain of being updated on what the new models can actually do right now.
So that's what I've been focusing on. Right, which I think really comes through. And yeah, like you're saying, the value, there's just so much and understandable. I mean, we're in arguably the largest hype cycle of my career. Yeah. Which is not short at this point, or it's longer than I care to admit. But yeah, I think constantly finding yourself usually set between thought leaders who are, hey, this is some nirvana or doomerism on the other side. There's the position right in the middle, which very few people are positioned in, which is surprising to me. Like the middle is great. Everyone should come and hang out here with me. Yes, yes. Well, that's definitely our goal is to get more people there. Maybe then really quick, just your origin story. How did you come to occupy this position in the middle? What led you to pre-AI and then to there? Honestly, all of my influence in the AI industry comes from the fact that I have a blog and I frequently update my blog. Like, basically, I'm treating it like it's 2005, back at the sort of like the peak of tech blogging before everyone moved to short-form media and then LinkedIn and TikTok and all of those kinds of things. So I have a blog and I update my blog with what I'm thinking about and what I'm discovering and what I'm finding out is interesting. And I've been doing that, yeah, since 2002, originally for web development and CSS and JavaScript and then Python. And I co-created the Django web framework quite early in my career, so a lot of my blog was talking about Django-related stuff. And then about three years ago, it was Stable Diffusion, actually, that got me into this because I teamed up with Andy Baio. We were looking at Stable Diffusion and they released the training data. And so we went and dug into the training data and discovered that they had scraped it all from news websites and Pinterest and places, which we thought was interesting.
You know, we thought people might want to know that these models are being trained on all of this unlicensed scraped data. So we built a little explorer tool that people could use to dig into the training data and wrote a few things about it. And then ChatGPT came out, I think, two months after that. So I was already thinking about that space. And I'd been playing with GPT-3 prior to ChatGPT for about a year and a half and it was really interesting and it was very difficult to get anyone else to care about it because the way GPT-3 worked was, it was the text completion thing. You had to type "A JavaScript function that does X, Y, and Z is:" and then it would write you the JavaScript function. And that was just weird, right? It was a weird way of interacting with this technology. So I could tell there was something there, but the UI for it, the way you interacted just didn't quite fit most people's brains. And all ChatGPT was, was GPT-3 with a chat interface baked over the top. Like it was just a little, basically a little UI hack, which opened everything up because once people could actually talk to the thing in a conversation, what it could do became so obvious. So I was in the right place at the right time in that I'd already been building up a little bit of sort of material around GPT-3 and how that works. And then ChatGPT came along and I was ready to start writing about that as well. And then really I've just been following the sort of riding the wave since then, writing up the new models as they come along. I've got my ridiculous pelican riding a bicycle benchmark that I've been using, which some people appear to take seriously and I certainly don't myself. And yeah, and I've been doing annual write-ups. At the end of each year, I do a write-up of the things that we've learned about LLMs in the previous year. I've got the monthly newsletter, the weekly newsletter, all of that kind of stuff. So yeah, it's just an endless deluge of interesting things.
So the challenge for me is always carving out time to do actual work as opposed to being distracted by Mistral Medium. Mistral Medium 3.5 came out two hours ago. So, you know, every day, every day, something new happens. Yeah, I mean, we were originally scheduled to record this podcast late last week. I had some network troubles and I was just thinking in the four or five days that's passed since that original day, I mean, Poolside models came out yesterday. DeepSeek V4, I think, maybe came in the last four days. Yeah, it's incredible, even at this point in how quickly things are still changing. You know, the firm's focus, I think a lot of our audience's focus is building better software. And the name of the podcast, High Leverage, comes from, it predated LLMs, which is generally speaking, as software engineers, how can we gain more leverage out of better practices and techniques? And I assume your end of year posts from last year had some pretty heady thoughts on that, specifically around software development and applying AI to it. I mean, the key thing was last year was the year that it became indisputable that these things were amazing at code. Yes. Right, like you could just about argue at the end of 2024 that the code they wrote wasn't very good and was full of bugs and maybe it could slow you down if you didn't know what you were doing with it. That's gone now. And that's because both Anthropic and OpenAI spent basically all of 2025 training for code. Yes. Like Claude Code came out in February of that year. It quickly became apparent, if you want people to spend $200 a month on your product, code is the thing that they will pay for. And then there was that moment in November where it was Claude Opus 4.5 and GPT-5.1 came out at the same time, basically. And they were that tipping point where the coding agents got good. Like up until then, coding agents were hit and miss. Sometimes they'd be useful, sometimes they weren't.
I'd say that November was the point when a coding agent can now be a reliable partner. You can use it and know that you're going to get working code nine out of 10 times. Which is pretty great. You know, now you can use it as a daily driver. And so many of my peers are saying that, yeah, 70, 80% of the code that they produce is now written by coding agents for them. They'd have laughed at that idea a year ago. And now it's not even surprising to say that. Yeah, yeah. I think it's kind of amazing, both the paradigm. Because Claude Code launched just over a year ago now. You know, both the interactive paradigm shift from IDE-based interactions or autocomplete or whatever with these tools. And to your point also with code where it's like, okay, this certainly means I don't have to type out for loops or look up function definitions as much anymore to, you know, these new, not even the newest models anymore, but the models from late last year to, I can give it a higher level. You know, my kind of experience, one of our portfolio companies, EXE.dev, it's hosting Sandbox, the new kind of cloud for these workloads. But I was using their agent over Christmas break and just to actually, they just soft launched and I was just trying to provide some, you know, barely valuable user feedback. But it occurred to me as I was working on some site app that I'd been working on it for like two, three weeks, adding feature after feature after feature after feature. And I was like, wait, I've never actually looked at the code that this thing has put out. Like, it's still working, which was dramatically not, I And my take on vibe coding was, it's fantastic provided you understand when it can be used and when it can't. Like, personal tool for you, where if there's a bug, it hurts you, go ahead, right? If you're building software for other people, vibe coding is grossly irresponsible because it's other people's information. Other people get hurt by your stupid bugs. 
You need to have a higher level than that. And then the contrast is like agentic engineering, where you are a professional software engineer. You understand security and maintainability and operations things and performance and so forth. You're using these tools to the highest of your own ability. I'm finding the scope of challenges I can take on has gone up by a significant amount because I've got the support of these tools, but I'm still leaning on my 25 years of experience as a software engineer. And the goal is to build high-quality production systems that, if you're building lower-quality stuff faster, I think that's bad, right? I want to build higher-quality stuff faster. I want everything I'm building to be better in every way than it was before. The problem is that as the coding agents get more reliable, I'm not reviewing every line of code that they write anymore. You know, even for my production-level stuff, I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it's just going to do it right. Like, it's not going to mess it up. You have it add automated tests. You have it add documentation. You know it's going to be good, but I'm not reviewing that code. And I've got that sort of feeling of guilt where I'm like, if I haven't reviewed the code, is it really responsible for me to use this in production? The thing that really helps me was thinking back to when I've worked at larger organizations where I've been an engineering manager before and I've had my team building software, other teams are building software that my team depends on. And if another team hands over something and says, hey, this is the image resize service. Here's how to use it to resize your images. I'm not going to go and read every line of code that they wrote. I'm going to look at their documentation, and I'm going to use it to resize some images. And then I'm going to start shipping my own features.
And if I start running into problems where the image resizer thing appears to have bugs or the performance isn't good, that's when I might dig into their Git repository to see what's going on. But for the most part, I treat that as a semi-black box that I don't look at until I need to. I'm starting to treat the agents in the same way. And it still feels uncomfortable because human beings are accountable for what they do. Like a team can build a reputation. I'm like, you know what? I trust that team over there. They built good software in the past. They're not going to build something rubbish because, you know, that affects their professional reputation. Claude Code does not have a professional reputation. It can't take accountability for what it's done. But it's just proving over time and time again, it's churning out straightforward things and doing them right in the style that I like. So yeah, that's a complicated thing, right? Very complicated. Yeah, actually, it's kind of funny because in my experience with models is they're very happy to take quote-unquote accountability for what you're like, wait, did you just do this thing? And it's like, oh, I'm really sorry. I did just do that thing. Right, that's helpful nothing. And then, of course, Claude 4.7 came out and I've built trust in Claude 4.6, but now there's a new model. How long does it take me before I start trusting it not to make weird mistakes that I wasn't expecting? Yeah, I do think that framing, I was actually just literally thinking that in a large scale human engineering org, there's some piece of load-bearing code. At some point, how many humans actually looked at that piece of code, right? Like one wrote it, maybe another one, like, you know, it said plus one on the PR, you know, haphazardly, and then it shipped. And then probably neither of those people even work here anymore, right? The thing I started realizing, like, this is the problem I have with side projects as well. 
Like the whole industry is facing this. It used to be if you found a GitHub repository with a hundred commits and a good readme and like automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project. And now I can knock out a Git repository with a hundred commits and a beautiful readme and comprehensive tests of every line of code in half an hour. It looks identical to those projects that have had a great deal of care and attention. Maybe it is as good as them. I don't know. I can't tell from looking at it. Even for my own projects, I can't tell. So I realized what I value more than the quality of the tests and documentation: I want somebody to have used the thing. Like if you've got a vibe-coded thing which you've used every day for the past two weeks, that's much more valuable to me than something that you've just spat out and you've hardly even exercised. Yeah, I think one of the things you called out, and I think you're alluding to, there's prior to the kind of Opus Codex moment, I do think there was this pretty clear heuristic between vibe coding and agentic engineering where it's like, well, if it's important, then a human absolutely has to at least review it and approve it. And I had a similar, I think, epiphany to you in starting to use these things. I was like, oh, these are good enough now that it's not going to make sense to gate every single thing that comes out of these into production systems on human review, right? Absolutely, yeah. But I think you used the term load-bearing earlier, and that's super important. Like anything that I write that is security adjacent, I'm reviewing that. That is, it is not responsible to outsource that entirely to the agents. And knowing what's security adjacent and what isn't is a skill that you develop through years of software engineering. So, yeah, none of this stuff is easy.
Yeah, one of the things I think is interesting, like the history of software development, at least one lens you could view it through, that's helpful for me is removing bottlenecks, right? So one of the most important parts of cloud computing, you know, besides like, oh, making CapEx, OPEX or whatever, was that human engineers, for the most part, didn't have to wait for someone else in some other building to rack and stack a server to actually deploy to production, right? It was just like, well, now there's an API call, the server exists, you can ship the software. And that removed this big bottleneck. And I think was part of a big, you know, explosion in developer productivity tools, you know, over the course of the last, I guess, pushing 20 years now, because now how fast can your human developer move became the bottleneck, not like, we need the network architects to open up some firewall ports and they're not going to get around to it for two weeks or something. It strikes me that up until late last year, even with all the excitement around autocomplete and tools like Cursor or whatever, human review was actually this like, between development and production, this like bottleneck gateway kind of rate limiter. And if we remove that, it's, I'm curious, it feels like everything downstream potentially breaks. 100%. Or it needs to be rethought. Right, like that, the single biggest question in all of this becomes, if you can go from producing 200 lines of code a day to 2000 lines of code a day, what else breaks? Like the entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn't. Yes. And what does that mean for, and it's not just the downstream stuff, it's the upstream stuff as well. 
If you're like product designers, I saw there was a great talk by Jenny Wen, who's the design leader at Anthropic, where she's been saying, we have all of these design processes that are based around the idea that you need to get the design right, because if you hand it off to the engineers and they spend three months building the wrong thing, that's catastrophic. So you need to do your, there's this whole very extensive design process that you put in place because that design results in very expensive work. Yeah. But if it doesn't take three months to build, maybe the design process can be a whole lot riskier because the risk of you getting something wrong has been, the cost if you get something wrong has been reduced so much. Yeah, so it's fascinating because it also applies actually, I hadn't thought of it through that light, but the whole discipline of product management, like good product management and like specification and thinking things through is the cost of an engineer building the wrong thing or building the not as valuable thing. It's been so high historically, right? One of the things vibe coding in actual like professional software development, it reminds me, extreme programming had this term called spikes. I don't know if you're familiar. Oh yeah, I'm very spiky and I love spikes. Yeah, yeah. So I guess for maybe younger members of the audience, when humans wrote all the code, a spike was, okay, we have a working theory about something. Maybe the best way to figure it out is we take two days, which is a very small amount of human development time to do a thing and say, remove all the guardrails. Like, don't worry about writing [...] And tell it to write some Python and go back and forth with it while I'm out with the dog. I get a lot of coding done while I'm like cooking dinner because my laptop's over here and every five minutes I pop over and run another prompt and then I go back to other activities.
I'm so much more interruptible than I used to be. And I spent years developing my own mechanisms for being interrupted, where I would like do all of my work in GitHub issue comment threads and so I'd constantly be writing an extra comment about what I'm doing next. So if I got interrupted, I could re-read my comments, kind of like an engineer's like lab notebook and get back up to speed. I don't do that at all anymore because the need to get really deep into it and hold all of that stuff in your head is so important when you're typing out a hundred lines of code. It's way less important when you're prompting the model with the next sort of architectural direction to go in. And yeah, it's, but it's fascinating. Like I would never have thought that I'd be able to productively work on two or three projects simultaneously, which I can now do, especially. And the harder the project, the easier it is to parallelize because you can prompt something that takes 10 minutes for it to build the next segment of this thing, which gives you 10 minutes that you can go and spend on other things. It's weird, really weird way of working. One interesting area, and you already alluded to this, but one of the things I've been trying to wrap my head around, how much prior and, you know, I've been now, yeah, I've been investing now for almost eight years, but prior to that, you know, I spent the better part of, you know, 20, 25 years as a technologist. I dropped out of a PhD in pure science. I wrote a lot of code in my career and I find that really, really helps when using these tools in terms of the kind of conversations I can have with them and the kind of directions I can put them in. How important do you think for yourself or for future use, actual grounding in the underlying techniques is for being successful with these? It's such a difficult question, isn't it? Because I've got 25 years of experience. So all of that stuff is just there.
And when I look at my conversations with the agents, it's very clear to me that this is moon language for the vast majority of human beings. Like there are a whole bunch of reasons I'm not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. Like if you know what you're doing, you can run so much faster with them. But yeah, at the same time, what that means for newcomers to programming, I'm probably the last person on earth who can credibly answer that because I'm 25 years out of being that newcomer. But I'm optimistic about it. I'm seeing indications that suggest that new programmers, this is incredibly helpful there as well. And I think back to I've coached people, learned to program a bunch of times and there's that first six months of absolute hell where first you have to get a development environment set up, which is a nightmare because it breaks all the time. If you forget a semicolon, you get a weird error message and you might spend half an hour battling with that error message to try and figure out where that semicolon is. Ideally, you can have an experienced programmer around who can look over your shoulder and say, here's the missing semicolon. But most people don't have that. That's solved, right? If you've got a missing semicolon, you copy and paste the error message into ChatGPT and it tells you the answer every single time. So that initial friction, that miserable six-month learning curve where you're fighting the thing all the time, that's been smoothed down a lot. And I love that because I know so many people who always wanted to learn to program but never did. And they assumed that they weren't smart enough to learn to program. 
And it wasn't that at all, it was that nobody warned them that there's six months of miserable tedium before you get to that sort of ray of light where you can actually get something to work and it doesn't take you five hours of struggling against semicolons. And most people got frustrated and quit, right? And now those people are all learning to program. Like I've talked to a whole bunch of people who are enthusiastically vibe coding as a starting point, but through vibe coding they're getting that reinforcement that what they're doing is working. They're learning more about code. They're starting on that journey. The open question is, in three years' time, will these new vibe coders be like software engineers who you would trust to build a production system? Or will they still be just vibe coding things that don't have that sort of long-term value? And I don't know the answer to that, but I think we'll find out within three years. Yeah, yeah, I'm really fascinated. I think like you, I continue to think, even with kind of the advancements in the last six months, you know, that there's a place and a role for humans. You know, if I zoom way out, the role of a software engineer has always been to sit between like very messy, fuzzy, and continuously changing kind of business requirements and, you know, deterministic things happening on silicon, right? And that will never be easy. I'm constantly reminded as I work with these tools how hard the thing that we do is, like producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and it's still really difficult what we're trying to achieve here. Yeah, so I think that we're all still there.
And I think the interesting challenge, like you hit on is, and I think a small part of that role, certainly in areas like you said, security adjacent, is going to require the ability to like drop in and understand code. And even if it's like little small pieces, you know, I've been talking to, in my network, a lot of like very kind of senior tenured software engineers like yourself. And you know, some of them have existential questions. Like, I've been doing this my whole career. What does this mean? And I'm like, well, I think you all are kind of like pre-atomic steel in that I think we're going to have to find ways to train and mentor, you know, new people in the profession. But I think those probably will all ultimately fall short of like, oh, I had to do this for 25 years before AI. That would be pretty awful. Wouldn't it be awful if in five years' time, the only useful programmers are the ones who've had 20 years of pre-AI experience, right? That would, that is not a good situation for us to be in. Yeah, I don't think we will be. Scott Hanselman was on the show recently from Microsoft. And he and some of his colleagues wrote a really interesting article in the Communications of the ACM about potential new models of mentorship and how we could invest in upskilling junior engineers. One of the things they mentioned, which I thought was really interesting, is tools like Claude Code or similar tools should have a mode where you kind of flip the AI into the, to go back to like pair programming, the notion of the driver versus the [...] Oh nice. Yeah. Right? Or you could be in working with Claude and saying, hey, we need to add this capability to this app and Claude could be like, cool, yep, I know how to do that. We're going to do XYZ. Or even if you're bug fixing, right? Like, but you're the fingers on keyboard. I'm fascinated. What would it be like to do pair programming with an agent where you're the one typing the code? Exactly. Yeah.
That's a fascinating idea. I'm tempted to try that out and see how that works. Yeah, I feel like there could be some very interesting training modes where you would still get the benefit of the, like you said, I don't have to go three cubes over to bug some senior engineer to look for where the missing semicolon is, but I still have to physically go through the active. Oh, that's really nice because one of the best things about pair programming has always been that you've got someone else to look things up. If you're like, oh, what's the regular expression syntax for X? They can be doing that while you're typing the code and the models will do that stuff flawlessly. That's a super interesting thing. I'm a huge fan of Scott's. I hadn't realized, I just found that ACM article. I'm going to have to read that one. That looks really interesting. Yeah, it's really good. We had a really interesting chat. We'll link to the episode in the show notes. I'll tell you one thing. So Matthew Yglesias, who's a political commentator, yesterday he tweeted, "Five months in, I think I've decided I don't want to vibe code. I want professionally managed software companies to use AI coding systems to make more, better, cheaper software products that they sell to me for money." And that feels about right to me. Like, I could, I can plumb in my house if I watch enough YouTube videos on plumbing, I would rather hire a plumber. That actually, yeah, let's talk about that because this is an area where I've been in a lot of spirited discussions because, yeah, there are a lot of people, very intelligent people, by the way, and making good points, but who, you know, the kind of dots they've connected is like, oh well, because Claude can build working software, like, literally, software is dead.
I mean, I don't know, like, custom software for your local butcher shop: in the past, it probably hasn't had a big enough market to pay for the development time to build it. Yeah, because in human terms, I think, just in broad strokes, keeping the numbers simple, a kind of minimal thing is like two, three weeks of developer time. You're talking tens of thousands of dollars, just conservatively. And then maintaining it on an ongoing basis is hundreds, low to mid hundreds of thousands of dollars a year, which, yeah, is not the kind of margin the butcher shop has, historically speaking at least. Right. And so when I hear people saying like, oh yeah, software is gonna be dead because we're all gonna do our own things, I think part of what you're actually hearing is that it's been really frustrating that the only economically viable way for this platform to exist is for it to be sufficiently horizontal and cross a sufficient number of categories that the TAM, the addressable market, is large enough to support it. Yeah. But I also believe that what enterprises certainly get when they get software is something well thought through and cohesive, with working data layers and security. I just realized it's the thing that I said earlier about how I only want to use your side project if you've used it for a few weeks. The enterprise version of that is I don't want a CRM unless at least two other giant enterprises have successfully used that CRM for six months. Yes. Like, it's exactly the same mentality, right? You want solutions that are proven to work before you take a risk on them. Yeah, that post by Yglesias was great. There's even an issue I can zoom in on now, because you referenced it and I just read this last night: in The Verge there's an article by, I think his name is Nilay Patel. He's amazing. I'm a huge, huge fan of that guy, yeah. He's the editor-in-chief, I think.
Yeah, the title just struck me: "The people do not yearn for automation." Poetry, poetry, yeah. But that is the best piece of commentary I've seen about the growing AI backlash. Like, if you poll the general public on AI, those numbers are cratering. I think Nilay was saying that it's less popular than ICE now. It's incredibly negative. Yeah. Like, I think a lot of, if you don't pay attention, but yeah, I mean, it's touching or dipping below like 20% approval in some, I mean, which is wild. Especially amongst Gen Z, who are also the highest users: the people who use it the most hate it the most, which is so interesting. And yeah, Nilay's whole pitch on this is that the things we get excited about as people with software brains, like the idea that we can automate all this stuff, do not work for most people. Like, the reason home automation has not taken off beyond the nerds is that most people just don't want to be able to clap and shut their blinds or whatever. That's not exciting to mainstream humanity. Yeah, yeah, I've been working in spots, just here and there, iterating on a somewhat detailed Claude skill to basically do some, like, robotic process automation on a couple of different CRM tools. And it struck me at some point in the last few weeks as I was working on this: it's very similar to building a fairly detailed shell script back before, where you're like, oh, I have to, and yeah, I've been curious. And that's part of the reason the article resonated, because, again, it struck me. I was like, okay, well, I've been doing automation through code for decades and I'm just used to thinking in this mode of like, what are all the edge cases? Like, what do I have to account for? And you kind of keep building. I was like, I don't think quote-unquote normal people think like this, and not because they're not capable. That's just not a thing they've had to do. It's a habit that you build up over years, right?
Spotting those little frictions that you can fix and then fixing them. And once you've figured out how to do that, you can't understand why you would tolerate a friction in your life ever again if you could automate it away. But yeah, most people just don't have time in their life for that mentality because they've got all sorts of other stuff going on. Yeah. And so it's interesting, as I was thinking through that, that most people's experience with AI, I think, across the broad human population, is basically search. It's a better search, right? Where it's like, hey, I have a question I need answered. I'm going to ask you. You're going to go through what's already in the weights. Then, now with thinking models, you're probably going to do some search, and you're going to synthesize it all and come back and present it to me in precisely the lens in which I asked the question, right? But that is a wildly different mode of operation than, yeah, let's sit down and automate some work. Right. Or let's have it write me an entire piece of software from scratch in 10 minutes. Like, yeah. And meanwhile, all of the rhetoric around this stuff is horrifying, right? It's all, oh, it's so dangerous. It might destroy the world, and all of our software is going to be hacked, and no one's going to have a job ever again. Like, yeah, it is not surprising that AI has an image problem. Yeah, which is, I think, outside of the scope here, but yeah, very much an image problem, and something I think maybe they're starting to take seriously, but the people at the large players, I think, should take it very seriously. Very seriously. Like, the AI data center backlash is about people not liking AI, right? I think a great deal of that is people asking, what is something I can campaign against in my life that pushes back against this? It's the construction of these giant buildings that suck up all of the energy and so forth.
But that feels to me like the expression of that wider distaste at the moment. Yeah, the data center is, for a lot of people, the physical manifestation of AI. Exactly, exactly. You mentioned this early on, and it's one of the things I was interested in getting your take on. I do think you're right. Part of the reason we had this watershed moment is because, you know, 12 to 16 or 18 months prior, people at the foundation labs decided that, oh, coding is going to be really important. And so what do we need to do? And so you had at least three things. I mean, there's certainly the interface: Claude Code, the agentic interface, I think is critical. And the reasoning, the tool use, the RL and the impact there, I think is really fascinating. That's what OpenAI and Anthropic spent 2025 doing. All of their compute budget went into reinforcement learning against simulated software. Like, you fire up 10,000 virtual machines with Python interpreters and you generate code and you see if the code works, and you give a thumbs up if the code works and a thumbs down if the code fails. And some of the Chinese labs, I think it was Qwen, in one of the Qwen papers, they talk about firing up 10,000 virtual machines to do this, right? This is acknowledged. And I think this is why xAI and Gemini are behind: they didn't spend all of 2025 running reinforcement learning loops on code, which with hindsight is what they should have been doing. And so you now hear that both of those companies are doubling down on the coding side of things, but they're 12 months behind on what they needed to get done, I think. Yeah, and reinforcement learning as a technique you could in theory apply across a lot of domains, but software, coding, has unusually clean reward signals. It's the perfect fit for it. Yeah, like, did the code work? Yes or no? Did it, all of that.
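As a rough illustration of the "did the code work, yes or no" reward signal described here, the following is a minimal sketch, not any lab's actual training setup. The function name `binary_reward`, the toy `add` task, and the use of a local subprocess in place of a sandboxed virtual machine are all assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def binary_reward(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical reward function: 1.0 if the candidate passes the tests, else 0.0.

    A real pipeline would run this inside an isolated VM or container;
    a plain subprocess is used here only to keep the sketch self-contained.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "attempt.py"
        # The candidate and its tests run as one script: any failed assert,
        # exception, or syntax error produces a non-zero exit code.
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # code that hangs counts as a failure (thumbs down)
    return 1.0 if result.returncode == 0 else 0.0


# Toy task (an assumption for illustration): implement add(a, b).
TESTS = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

rewards = [binary_reward(candidate, TESTS) for candidate in (good, bad)]
print(rewards)  # [1.0, 0.0]
```

The point of the sketch is how cheap the check is: each verdict comes back in well under a second and needs no human judgment, which is what makes running it across thousands of machines in parallel practical.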
Like, you think about lawyers, and the way you do reinforcement learning on law is that you have to put it through a trial and see if the judge and the jury agree or disagree. And so that's like a six-month turnaround to find out if your AI-generated text was the right thing or not. That is clearly not nearly as easy as a Python script. Yeah, I find it fascinating. I think at the beginning, you know, kind of immediately post the ChatGPT moment, there was a lot of talk, especially from maximalists, about lawyers going away or these other professions going away. To be clear, I don't think any professions are going away. Well, translators might be in a rougher spot. Data entry is a very scary area for that. Yeah. Yeah. But at some of the higher levels, yeah, I find it kind of amusing that software engineering may actually be the field that is the most susceptible to, let's say, change at least, or disruption, based on these things. You mentioned Qwen. You know, there's some other models. There's Kimi, which Cursor is kind of famously using: their Composer is built on it. And, like you said, Poolside. I haven't had a chance to play with it yet, but they just, just yesterday as of this recording, released their open weight coding models. I've been following the Chinese AI labs very closely for the past year and a half because wow, wow, they put out some good stuff. And I think there's at least five competitive Chinese labs that are all putting out models that are like three to six months behind the frontier closed... This year, Gemini's leading models weren't as good at coding as OpenAI and Anthropic's leading models. So then it comes down to things like data provenance. Do you want a local model because you're avoiding sending things through APIs to these cloud models? For most cases, I think that's a false optimization. I think there's a huge amount of paranoia around that.
But if you've got a well-signed agreement with Anthropic that they don't train on your data and so forth, you can trust them to hold to that agreement. At the same time, I do a lot of work with journalists. And journalists sometimes have situations where they have to protect their sources, and there can be a government subpoena to a data center going after a journalist's sources. So journalists actually have a very credible reason to want to use local models for some of the stuff that they're working with. My own interest in local models has sort of been waxing and waning over time. About a year ago, I lost interest entirely because they were just not nearly good enough for it to be worth me using them for anything. Like, the models on my laptop were so puny in comparison to what I could get out of the big cloud providers that I just couldn't see myself using them for anything other than playing around. That's changed in the past six months. Now the models I run on my laptop, I can get real work done with, which is notable, except that it turns out real work now is throwing Claude Opus 4.7 at an incredibly difficult technical problem and letting it churn away for 10 minutes. And that, the local models can't do yet. So I go back and forth on it, just as an individual who's a sort of enthusiast in this stuff. But yeah, I do feel like there's a lot to be said, especially as you spread out more from just the normal stuff that we're doing with these models. If you're doing, like, data extraction from video, Gemini is pretty much the only player in town. You can feed Gemini an hour-long video and it can answer questions about it and pull out structured data from it. The audio models are getting stronger. A lot of what I do on the journalism side is try and extract content from PDF documents, which was almost impossible two years ago. And today, given the right combination of models and the right setup, I can get really, really good results out of them. But yeah, it's complicated, right?
There's some... I do think locking yourself into a single model is very short-sighted at this point. And I would say that, since I built an abstraction layer in Python. I've got an open source library for talking to different models. So I'm already quite invested in the idea of these abstraction layers. Yeah, I was actually going to follow up and ask, what techniques are you using, for yourself, to enable you to use different models? Like, what's your harness look like? Well, I have my own tool called LLM. It's a command-line tool and a Python package for talking to all sorts of different models. It's got a plugin system for talking to DeepSeek and Mistral and all of those kinds of things. And last year, I used it very heavily. This year, I've not been using it nearly as much because Claude Code and OpenAI Codex got so good. Like, the majority of things I wanted to do with a model from my laptop, I can pipe through Claude Code and Codex. But as I start wanting to do more with the local models, you can plug Codex and Claude Code into a local model, and it doesn't work very well, because they've got like 20,000-token system prompts that the local models aren't particularly good at handling. I've been playing a little bit with Pi, which is a much more lightweight open-source coding agent. I've got a massive upgrade to my own tool coming out actually later today, which we designed to be a better fit for working with the reasoning models and things like that. There's also the whole OpenClaw thing, which I'm fascinated by. I don't trust it, but I feel like the new hello world of working with models is building your own little OpenClaw. We call them claws now. So I've started work on my own claw on top of my own LLM framework. So that's a whole other side project that I'm getting into. But yeah, for my day-to-day driver, it was Claude Code until a week ago. It's now OpenAI Codex.
For the most part, their latest version is outstanding. And I don't trust Anthropic's Claude Code pricing. They've been messing around with things around that in ways I'm not happy with. Well, yeah, maybe that's a good segue, because that's something else I want to chat about, which is fascinating to me because it's just kind of come up. I mean, like you said, the Claude Code pricing, and whether it was an A/B test or whatever about whether or not Pro users would have access to Claude Code anymore. I think GitHub Copilot just put out... They just, GitHub Copilot and Windsurf as well, actually had per-request pricing, where the cost for a prompt that might trigger a whole bunch of tool calls was the same no matter what the prompt was. And that made sense a year ago. And today it's a terrible idea, because maybe one prompt will run for 10 minutes. So both Windsurf and GitHub Copilot have switched to the same per-token pricing that everyone else is using now. Yeah. And I think it's interesting, because there's always been some kind of chatter on whether the last few years have been, for the foundation labs and AI in general, like the early Uber era, when you could, you know, catch a black car across San Francisco for like $3 or something. Yeah. It's funny. I feel there are two tensions pulling in different directions. The first tension is that, because of coding agents, heavy users use a hundred times the tokens they used to. Like, if you're a heavy user of Claude Code, you are burning vastly more compute than if you were a heavy user of ChatGPT last year. And this is driving a lot of the price increases. But the flip side is that the open weight models, especially the ones coming out of China, sort of push pricing in the opposite direction. DeepSeek is, what, 20 times cheaper than Claude Opus, and on benchmarks it's not 20 times worse than Opus. DeepSeek, you can run it through their API, but you can also run it on your own hardware.
So hopefully the force that those open weight models have on the pricing helps counteract the obvious need for these companies, who want to do IPOs, to start actually making real revenue. But yeah, it's all very frothy on the pricing front. Yeah. Well, I don't think the timing was accidental, in that, you know, at the end of last year these new models come out that you could start doing so much more with in terms of software engineering. And then, I mean, this is like April 2026. It's this month, after all these finance teams did their Q1 review and looked at the token number and went, wait, what? Like, what's going on here? I mean, we just had two massive price hikes this week. So Opus 4.7 is priced the same as Opus 4.6, but the tokenizer is less efficient: it takes 1.4x the tokens for different things. So it's effectively a sort of invisible 40% price bump. And Anthropic will tell you it uses fewer reasoning tokens than the previous ones, but it's definitely more expensive. GPT-5.5 is double the price of GPT-5.4 over the API. Like, these are the most significant price hikes we've had since I started tracking the pricing of these things. Yeah. And so I think it will be interesting as this year plays out. For me, it feels like this is finally the moment where forward-thinking enterprises, and obviously this is the future, these tools are going to be a critical part, they're going to be used a lot, are going to have to start measuring outcomes against input costs. And I think we've been in this honeymoon period where, more than with any other new tech, at least in my career, the budget has just been wide open. Like, usually with these new things, you have to fight and claw for a little bit of budget to try out some new technology. And I feel like the last few years.
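The "invisible 40% price bump" arithmetic from the pricing discussion can be made explicit with a short sketch. It uses only the ratios quoted in the conversation (same per-token price with 1.4x the tokens for Opus 4.7, and double the per-token price for GPT-5.5); the function name is hypothetical, and the unchanged token count for GPT-5.5 is an assumption, since the conversation does not mention its tokenizer.

```python
def effective_price_change(price_ratio: float, token_ratio: float) -> float:
    """Fractional change in cost for the same workload.

    price_ratio: new per-token price divided by the old per-token price.
    token_ratio: tokens the new model needs divided by tokens the old one needed.
    Cost scales with (price per token) x (number of tokens), so the ratios multiply.
    """
    return price_ratio * token_ratio - 1.0


# Opus 4.6 -> 4.7: same list price, tokenizer produces 1.4x the tokens (from the conversation).
opus_bump = effective_price_change(price_ratio=1.0, token_ratio=1.4)

# GPT-5.4 -> 5.5: double the API price; unchanged token count is an assumption.
gpt_bump = effective_price_change(price_ratio=2.0, token_ratio=1.0)

print(f"Opus 4.6 -> 4.7: {opus_bump:+.0%}")  # Opus 4.6 -> 4.7: +40%
print(f"GPT-5.4 -> 5.5: {gpt_bump:+.0%}")    # GPT-5.4 -> 5.5: +100%
```

If a newer model really does spend fewer reasoning tokens, that would show up as a lower `token_ratio` for the output side of the bill, which is why the list price alone does not settle what a workload actually costs.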
The one thing that's certain is that we're going to have to stop having scoreboards of how many tokens people have used. Like, maybe don't do that if you don't want to blow your entire budget on wasteful token usage. The first time I saw mentions in some media of a business employing a token-maxing leaderboard, I just immediately went, oh no. There's no world in which that ends well. Yeah. It's like this is an executive who has never attempted to put a metric on a human software engineer before, like a gameable metric. And they're about to learn a very, very expensive lesson. Well, this has been so much fun chatting. I could go on forever. You know, maybe we'll have to check back in sometime in the future. Where's the best place for listeners? You know, if we have the odd one or two listeners who don't already follow you, where can they find more of your work? Basically, it's all on simonwillison.net. So that's my blog, which links to all of my other stuff. I'm present on Bluesky, Mastodon, and Twitter. I also have a Substack, which is simonw.substack.com, which is just my blog copied and pasted into a newsletter about once a week. But a lot of people subscribe to that and really appreciate it. I guess RSS feeds are not as widely spread as they were when I was blogging back in