Archived March 25, 2026
The Lead — Mar 25
HOW I AI · CLAIRE VO

How Stripe built “minions”—AI coding agents that ship 1,300 PRs weekly from Slack reactions | Steve Kaliski (Stripe engineer)

41m / March 25, 2026 / ai, technology, product / Transcript sourced from openai

Overview

This episode of How I AI features Stripe engineer Steve Kaliski explaining how Stripe uses internal AI “minions” to turn prompts from places like Slack, Google Docs, or Jira into working code changes. The conversation focuses on how agentic engineering lowers the “activation energy” of starting work, how cloud-based development environments make parallel AI workflows practical, and why strong CI and review systems matter even more as AI-generated code scales.

The episode also explores a forward-looking idea: agents as economic actors. Through a demo of an agent planning a birthday party and spending money programmatically, the discussion broadens from coding assistance to a future where agents can transact directly with services.

Key Takeaways

One of the strongest ideas in the episode is that AI’s biggest impact may not be writing code faster, but reducing organizational friction. At Stripe, engineers can trigger a minion directly from Slack with an emoji, and the agent provisions a development environment, makes changes, runs tests, and opens a PR. That means good ideas no longer have to wait until someone sits down in an IDE and manually kicks off implementation.
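
The emoji trigger described above can be sketched as a small parsing step. This is a minimal illustration, not Stripe's implementation: the `type`, `reaction`, `user`, and `item` fields follow the shape of Slack's `reaction_added` event, while the `create-minion-<repo>` naming convention, the `MinionTask` type, and the `task_from_reaction` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a reaction named "create-minion-<repo>" requests a minion for <repo>.
REACTION_PREFIX = "create-minion-"


@dataclass(frozen=True)
class MinionTask:
    repo: str          # repository the agent should work in
    channel: str       # Slack channel that holds the triggering message
    message_ts: str    # timestamp identifying the message (and its thread)
    requested_by: str  # user who added the reaction


def task_from_reaction(event: dict) -> Optional[MinionTask]:
    """Turn a Slack `reaction_added` event payload into a task, or return None."""
    if event.get("type") != "reaction_added":
        return None
    reaction = event.get("reaction", "")
    if not reaction.startswith(REACTION_PREFIX):
        return None  # an ordinary emoji, not a minion request
    item = event.get("item", {})
    if item.get("type") != "message":
        return None  # reactions on files and other items are ignored
    repo = reaction[len(REACTION_PREFIX):]
    if not repo:
        return None
    return MinionTask(
        repo=repo,
        channel=item["channel"],
        message_ts=item["ts"],
        requested_by=event["user"],
    )
```

A real dispatcher would then fetch the message text as the prompt and hand the task to whatever provisions the development environment; those steps are omitted here because the episode does not describe them in detail.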

A notable metric underscores the scale: Stripe is landing roughly 1,300 PRs per week with no human assistance beyond review. That does not mean humans are removed from the process; rather, their time shifts from boilerplate implementation toward judgment, review, and product thinking. The bottleneck moves from authoring to validation and prioritization.

Another important insight is that agentic coding depends heavily on infrastructure, not just models. Kaliski emphasizes that local laptops quickly become a constraint when running multiple worktrees and agent loops. Hosted cloud development environments are what make parallelized engineering possible. This is a key message for engineering leaders: if you want AI to materially increase throughput, invest in developer experience and virtualized environments.

The discussion also makes a subtle but important point about software safety. AI-authored code still requires the same high-confidence CI pipelines, test coverage, synthetic testing, and deployment safeguards as human-written code. In other words, the standards for safe software delivery do not change just because the author changes.

Finally, the birthday-party demo introduces a broader concept: token usage and direct payments are converging into a shared economic framework. As agents increasingly purchase access to tools and services on demand, businesses may emerge that are designed primarily for agent customers rather than human users.
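
To make the "tokens and dollars side by side" idea concrete, here is a minimal sketch that adds model usage and third-party payments into one total. Everything in it is an assumption for illustration: the per-million-token price, the payment names, and the amounts are invented, and the functions `token_cost_usd` and `run_cost_usd` are hypothetical.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def token_cost_usd(tokens: int, usd_per_million_tokens: Decimal) -> Decimal:
    """Dollar cost of a token count at a flat per-million-token price."""
    return Decimal(tokens) * usd_per_million_tokens / Decimal(1_000_000)


def run_cost_usd(
    tokens: int,
    usd_per_million_tokens: Decimal,
    payments: dict[str, Decimal],
) -> Decimal:
    """Total cost of one agent run: model usage plus every third-party payment."""
    total = token_cost_usd(tokens, usd_per_million_tokens)
    total += sum(payments.values(), Decimal("0"))
    return total.quantize(CENT)  # round to whole cents


# Illustrative numbers only; none of these prices come from the episode.
payments = {
    "venue-search": Decimal("0.25"),
    "invitation-mailing": Decimal("1.50"),
}
total = run_cost_usd(400_000, Decimal("3.00"), payments)
print(f"total: ${total}")  # prints "total: $2.95"
```

`Decimal` is used instead of floats so that cent-level amounts add up exactly, which matters once an agent issues many small payments.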

Practical Steps

If you want to apply the lessons from this episode, start by lowering the friction between idea and execution. Let teams trigger coding agents from the tools they already use, such as Slack, tickets, or docs, rather than requiring a full handoff into engineering workflows before anything starts.

Invest in cloud-based development environments that can be spun up quickly with the right code, services, and configuration already available. This is especially important if your engineers want multiple AI agents running in parallel without overloading local machines.
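
The value of isolated environments can be illustrated with a toy version that runs on one machine. In this sketch a temporary directory stands in for a hosted development environment and a thread pool stands in for the fleet; `run_in_isolated_workspace` and `run_all` are hypothetical names, and a real setup would provision remote machines rather than local directories.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_isolated_workspace(prompt: str) -> str:
    """Stand-in for 'provision an environment, run the agent, report back'."""
    with tempfile.TemporaryDirectory(prefix="minion-") as workspace:
        # A real system would check out the repository and start an agent
        # loop here; this sketch only records the prompt in its own workspace.
        notes = Path(workspace) / "PROMPT.txt"
        notes.write_text(prompt)
        return f"done: {notes.read_text()}"


def run_all(prompts: list[str], max_parallel: int = 4) -> list[str]:
    """Run every prompt in its own workspace, several at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the prompts.
        return list(pool.map(run_in_isolated_workspace, prompts))


if __name__ == "__main__":
    for line in run_all(["update docs landing page", "fix flaky test"]):
        print(line)
```

Because each task gets its own workspace, tasks cannot overwrite each other's files. That is the same property that lets many agents make separate changes at once without competing for a single laptop.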

Strengthen your CI and release processes before scaling AI coding. Specifically:

  • Improve automated test coverage.
  • Add end-to-end synthetic checks where possible.
  • Use deployment patterns such as blue-green rollouts and rollback mechanisms.
  • Treat AI-generated code as production code that needs the same safety rails as any other change.
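
The checklist above amounts to a merge gate that never looks at who wrote the change. A minimal sketch follows, assuming hypothetical field names (`tests_passed`, `synthetics_passed`, `human_approved`, `rollback_ready`) that are not taken from any real CI system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author_kind: str         # "human" or "agent"; recorded but never consulted below
    tests_passed: bool       # automated test suite is green
    synthetics_passed: bool  # end-to-end synthetic checks are green
    human_approved: bool     # a person has reviewed the change
    rollback_ready: bool     # the deployment can be rolled back


def may_merge(change: Change) -> bool:
    """Apply the same gate to every change, whoever or whatever authored it."""
    return all(
        (
            change.tests_passed,
            change.synthetics_passed,
            change.human_approved,
            change.rollback_ready,
        )
    )
```

Leaving `author_kind` out of `may_merge` is the design choice worth noting: the safety bar comes from the checks, so an agent-authored change passes or fails on exactly the same evidence as a human-authored one.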

Encourage non-engineers to experiment with agentic workflows, especially for documentation, prototypes, and lightweight product changes. If people can describe what they want in natural language, they may be able to initiate useful work without needing to code directly.

Finally, pay attention to repeatable prompting patterns. Kaliski suggests saving successful instructions, “skills,” or prompt templates so that recurring workflows can be reused rather than rediscovered each time.
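
One lightweight way to keep reusable prompts is a named template with placeholders. The sketch below uses Python's standard `string.Template`; the `SKILLS` registry, the `docs-example` skill, and its wording are hypothetical and only show the pattern.

```python
from string import Template

# Assumption: skills are stored as named templates with $placeholders.
SKILLS = {
    "docs-example": Template(
        "Update the page at $page. Add a short code example showing $goal. "
        "Run the docs checks before opening a pull request."
    ),
}


def render_skill(name: str, **fields: str) -> str:
    """Fill in a saved skill; raises KeyError if the skill or a field is missing."""
    return SKILLS[name].substitute(**fields)


prompt = render_skill(
    "docs-example",
    page="docs.stripe.com/payments/machine",
    goal="how to get started quickly",
)
print(prompt)
```

`substitute` fails loudly when a placeholder is left unfilled, which is usually preferable to sending an agent a prompt with a hole in it.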

Notable Quotes

“At Stripe, we’re landing about 1,300 PRs that have no human assistance besides review per week.” — Steve Kaliski

“The activation energy of starting work feels a lot lower.” — Steve Kaliski

“Whether the text has been written by Steve or the text has been written by Steve’s robot, you still want that CI environment that’s providing confidence that the code that’s being changed is safe.” — Steve Kaliski

Full Transcript


At Stripe, we're landing about 1,300 PRs that have no human assistance besides review per week. A lot of where our work begins is, it could be in a Google Doc as we're planning a new feature, or maybe a JIRA ticket comes in, or we're talking about something in Slack. I can click an emoji, and then the minion will sort of attempt to one-shot resolving that prompt using all the tools that are available at Stripe. When you're in larger organizations, there's so much friction that can come between a good idea and getting it into the world. Not only can I have one of these, but I could have many, many of these running in parallel in isolated environments, making isolated changes all at the same time. How are you getting all this code review done? Whether the text has been written by Steve or the text has been written by Steve's robot, you still want that CI environment that's providing confidence that the code that's being changed is safe and that as it rolls out, you're having blue-green deployments so you can roll back too. All of that is super critical independent of the nature of the authoring of it. No matter how juiced these laptops are, you get three or four worktrees in, and it starts to sound like an airplane taking off. It's no good. And so I do think on this multi-threading agentic engineering work, cloud environments and virtual environments are so important to unlock velocity. Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today we have Steve Kaliski, a software engineer at Stripe, and he's going to show us how the Stripe team deploys a bunch of minions to do their engineering work. We'll also watch an agent spend a little bit over $5 to plan a birthday party all in Claude Code. Let's get to it. This episode is brought to you by Optimizely. Most marketing teams aren't short on ideas, but what they are short on is time. 
And that's exactly what Optimizely Opal gives you back with AI agents that handle real marketing workflows. You know, like creating content and checking compliance, generating experiment variations, personalizing user experiences, analyzing pages for GEO, even tasks like approvals and reporting. It's your AI agent orchestration platform for marketing and digital teams, plugging seamlessly into the tools you already use, handling the boring busywork and keeping everything on brand. That leaves marketers with more time to do your actual job. See what Opal can automate for your team by signing up for a free enterprise agentic AI workshop with Optimizely. Find out more at optimizely.com slash howiai. Attend live and you'll get a free pair of Ray-Ban Meta AI glasses. Steve, I'm so excited to have you on How I AI because I saw the Stripe minions on the timeline. And one, exceptional branding, don't sue us. And two, I just love the idea that you and your colleagues in the team at Stripe have created, not just one agent, but minions all across the company that can help with development work. And I'm so excited for you to show us how that helps you in your day-to-day here. So welcome to How I AI. Thank you for having me. So tell me, what has been the effect that minions have had on you personally at Stripe and at the Stripe team as a whole? Sure. So, you know, for me personally, I think sort of anecdotally, I don't remember the last time I started work in the text editor. Right. So I do end up there often. But, you know, what I found is that, you know, a lot of where our work begins is, you know, it could be in a Google Doc as we're planning a new feature. Or maybe a JIRA ticket comes in or we're talking about something in Slack. And those are sort of like the more natural entry points to starting work. Right. And then you end up in a text editor when it's time to, you know, actually do the work or make the final tweak. And it's just felt very natural. 
And I think in particular, the sort of like activation energy of starting work feels a lot lower. Right. So if, you know, you're in a Slack thread and maybe there's a piece of user feedback and it's something simple like a, you know, we have to update the docs or or maybe it's something more consequential and we just want to build a prototype. I can click an emoji and like the work begins. And often the work finishes, too. You know, we at Stripe, we're landing about 1300 PRs that have no human assistance besides review per week. But at the minimum, the activation energy of like starting to write code, seeing tests pass, maybe a test fails, occurs without me even, you know, participating. And then I can jump in and I can tweak and I can kind of like have them that momentum sort of it's sort of like generative momentum, you know, that I can hop in halfway through. What I think is magical about this and I won't call Stripe a big company, but you do have a decent amount of employees and very, very large business is I love that concept of activation energy going lower because when you're in larger organizations, there's so much friction that can come between a good idea and getting it into the world. And it's not malintent, right? It's nobody's like, oh, man, I really want to slow this process down. Yeah, it's either, you know, functional. I don't have access to a technical area of expertise to actually get from here to there. It's operational. I don't know how to organize people and communicate effectively to get the next step done. Or it's just kind of like people get siloed in their day to day and don't think of new ways to get work done. And one of the things that has been so revelatory about AI for me personally is like all that just kind of goes to zero because coordination costs can go down, execution costs can go down, communication costs can go down. You just get closer to the work, which I think is the fun part we all really care about. 
So show me how you actually activate a minion. And, you know, we skipped this a little bit, what a minion is. The quick spiel of a minion. When I, as an engineer, sort of in pre-AI time, you know, want to make a modification to Stripe. Well, Stripe is a huge code base with tons of services. It can't run on my computer alone. So Stripe already has a long history of investing in great developer tooling, having hosted development environments that I can spin up that, you know, have all the code already there and services running. And I can SSH in and make modifications. And we have a ton of great CI tooling around that. So that's the context. You know, we have all that. The idea with the minion is that I can provision one of those environments seeded with a prompt. And then the minion will sort of, you know, attempt to one-shot resolving that prompt using all the tools that are available at Stripe. Right. So all of our internal documentation, our internal CI, our, you know, test data, so on and so forth. And it will loop through that in an attempt to, you know, solve that prompt. So let's go ahead and jump in and see what sort of a prototypical experience might look like. So I'm in a Slack channel. It's called Steve Kaliski Robots Dash Claire. I actually have a Steve Kaliski Robots channel that has 76 humans in it. But I do have every, it sort of is just me and my robots. And now there's some sort of, you know, like audience observing. But let's imagine that, you know, maybe I'm thinking of a new feature idea or I want to improve documentation that we have. So we have a launch coming up soon. And I want to sort of embellish the documentation. So I'll say, I have this cool idea for docs.stripe.com/payments/machine. This is our new machine-to-machine payment worker, which we'll look at later in our call. And I want to, you know, make sure the landing page really sticks and gives a good code example of how to get started quickly. Right. 
So maybe someone posted a message like that or it came in through a ticket or whatever the origin may be. All I have to do now is, you know, add a reaction, which is create minion pay server. This is a particular repository within Stripe. We get the one sec cooking from the dev box agent. And then we get a reply in here saying your minion for pay server. It's the repository for a new branch. It's created landing page code example has been created and it's going to kick off our doc service. So I can eventually preview it. Now I'm going to click follow along. So right now what it's doing is it's provisioning that development environment I was talking about earlier. Right. So this is this part isn't new. It is excellent, but it's not new. And basically it's going to spin up an instance in the cloud. It's going to apply all the configuration that's required for both me and the agent to do coding within Stripe. So this will just take a few seconds. It's going to check out that repository with a new branch, configure the local database, apply my Git config. It's going to set up a VS Code server so I could connect to it just through the web or locally. So with some extensions. So what's really great about minions is, you know, obviously there's the agent loop that's, you know, making the code modifications, but it's built on top of like a ton of incredible work that our developer productivity has done around just making it easy to get like a perfectly operating Stripe development environment for coding, which means that, you know, not only can Product itself, has that team been sort of built as a standalone team that's focused specifically on internal developer experience? Is that how it works? Yeah, we've had a developer productivity team for as long as I can remember. I think about, I mean, for six and a half years now. And you know, that team's focused on all the tools that I engage with and making them more useful, right? 
So that's all the way from, you know, how we interact with Git and version management to our text editors and our configurations there, to our development environment, and how that whole story pieces together. And you know, we, just as a product engineer on Stripe, I care deeply about our external users and them being successful at Stripe. That team cares equivalently about engineers at Stripe being successful and being able to build things quickly. And I think that's been even more accelerated by AI in the last couple of years. And then one other observation I want to make, because I think you glossed over it a little bit at the beginning, but it is so important for folks that really want to go ham on coding with AI, which is, look, all of us engineers have a MacBook Pro that weighs 8 million pounds. That can do some damage. Mine, for anybody who wants to know, its nickname is Big Boy. So whenever I need my kids to get my coding laptop, I say, can you bring me Big Boy? Because I call it San Francisco rucking when I carry two of them in my backpack. Oh my God. But, you know, no matter how juiced these laptops are, you get like three or four worktrees in, all running, and like it starts to sound like an airplane taking off. It's no good. And so I do think on this sort of like multi-threading agentic engineering work, cloud environments and virtual environments are so important to unlock velocity. And that's one place where I haven't seen enough large engineering teams invest in those environments to really unleash the power of either AI-assisted coding for their software engineers or agents in general. So if there are any CTOs, VPs of engineering listening, if you were to invest in something to really unlock growth in the next year, getting that situation locked up would be really good. Because, again, I hear so many people being like, oh, I can Claude Code everything. I can, you know, I can Codex anything. I can spin up all these worktrees. I'm fine. 
And I'm like, are you running all these local? Like, what are you doing? And so that's one thing I just want people to not miss, is the limitations of your actual machine on how multi-threaded you could be, especially in a complex codebase like Stripe's. Totally. You know, I have Slack on my phone, right? So I can even kick off one of these minions on the way to work as I'm sort of going through Slack on the subway. And then, you know, by the time I'm there, I can jump in halfway through. I think that maybe like the hyperbolic thing here is like, imagine if all engineers at a company could only work on, didn't have Git. We all had to coordinate working on the one codebase together. That would be crazy. And, you know, the equivalency here is like, imagine if I'm bounded by, you know, my agents are bounded by just what's available and can work on my computer. The 10x thing to do is, you know, be able to have 10 of them run in parallel, but also not be contingent on my, like, it's like everyone's buying a Mac Mini, right? So it doesn't fall asleep, right? It's like, there's a whole business around just the computer not falling asleep. I legitimately, first of all, I have like four Mac Minis upstairs. And one of them is just basically a laptop that doesn't close. Like I use it as a laptop that does not shut. And it's really unlocked my velocity. So, okay, we thank you for going on this side quest about virtual environments and localhost and all those things. I'm a founder, so I know most people don't start companies because they love running payroll or managing compliance. But somewhere between hiring your first employee and raising your next round, you end up in the weeds with HR, IT, and all that other stuff. That's what Rippling was built to solve. Rippling is a unified platform that lets startups run HR, payroll, IT, and finance in one system from day one. The Rippling startup stack replaces disconnected tools that don't sync with a fully connected platform. 
Over 15,000 startups, including Cursor, Clay, and Sierra, trust Rippling to scale fast without adding additional ops and HR headcount. So founders like you can keep building. Right now, venture-backed startups can get six months of Rippling startup stack for free. Head to rippling.com slash howi.ai and sign up today. That's R-I-P-P-L-I-N-G dot com slash howi.ai to sign up for six months free today. Focus on what you're building and leave the rest to Rippling. Okay, so you are now running this. You're going to, it's, you said one shot at the beginning. Really, you're trying to take one prompt and not a single reply gets you what you want, but it goes into the harness. It goes through its own loop, hits the tools it needs. And ultimately you as the end user get one response back, which is here's the successful implementation. Exactly right. So we can already see that it's identified the relevant files. It's keeping track of its own to-dos. That's something that we've codified in it to focus on. It's making changes. It's, you know, preparing the commit and so on and so forth. And ultimately sort of like taking out of the oven. We'll see a response at the end of just like, it finished. You know, like you can go ahead and look at the pull request and the sort of normal human review part continues. Let's talk about that really quickly. You said 1300 code or agent initiated PRs per week, something like that. And then humans are involved in code review. How are you getting all this code review done? Well, you could make the argument that, you know, if I'm spending less time actively writing code, I can, you know, re-center my time on reviewing the code that's being written or working with users and so on and so forth. So I think that's a big part of it. I think the other side of it, like, it comes back to that CI environment, right? So having really good test coverage, having synthetics that run to simulate end-to-end interactions with your product. 
Those all help inspire confidence in the code you're reviewing, right? So absent those, like, it'd be really difficult to look at code, especially in a huge code base, and have high confidence that it works. So, you know, again, whether the text has been written by Steve or the text has been written by Steve's robot, you still want that CI environment that's, you know, providing confidence that the code that's being changed is safe and that as it rolls out, you know, you're having sort of blue-green deployments so you can roll back too. Like, all that is super critical independent of the nature of the authoring of it. I do believe, like, if coding becomes easier and coding historically has been the bottleneck in product development, it's just gonna shift to other areas, right? So if, like, coding in effect becomes free, the review's gonna be really challenging, right? Or getting enough ideas in the first place could be a big problem or distributing them, right? So I think the attention is just gonna move around to other areas. Great. And then one other question before we go on to your next workflow, which I am so excited about. Spoiler alert. Is, are more than engineers using minions? Are you seeing product managers, designers come in? How is this going across the company and across functions? Yeah, I think, you know, part of why I like the Slack example is the entire company's in Slack, right? And, you know, to the point of activation energy, you know, even if, like, you had the text editor on your computer and I gave you the docs and whatever it may be, you know, to someone who's not an engineer, it could be really challenging or intimidating or whatever it may be. And, you know, for whether you just want, like, a proof of concept or you're going to make a docs change or whatever it may be, like, you can, you can probably write out in plain text the thing you want to occur, right? You might be writing the product brief or you might be giving design feedback. 
Like you're, you're in effect writing a prompt at some point. So being able to just click an emoji or, you know, tag the robot to spin up a minion, we're trying to see more non-engineer usage there. Yeah. Amazing. Okay, so let's go to our next workflow, which I am psyched. As somebody with a stack of Mac Minis downstairs, I am excited about. So, you know, at Stripe, we're, you know, we're thinking about AI in a few ways, right? So the demo we just showed us how we're thinking about using AI internally to accelerate our product development and engineering. The second way is, you know, thinking about how we're supporting all these businesses that are, you know, leveraging AI in their own products and how we can support their business models. And, you know, that's what things like usage-based billing. And we just announced our, our beta of our LLM token billing product. But there's a third side, which is like this sort of idea of agents as economic actors or agents that can spend money as, you know, as part of their attempt to solve a prompt Log in, drop a credit card, buy a plan. There was a machine-to-machine transaction that happened that gave micro-access to the tool for the capacity the agent needed to do the job at hand, and we see it used browser-based in parallel in postal form, and it issued those payments programmatically, accessed just what it needed, did a little offset, a Stripe climate purchase, and then got your party planned. And what I like about this is what's really interesting about this particular example is it makes it very clear the economics of doing something agentically. I like this little, you know, we got a little Stripe climate shout-out here, but it also just calls out, like, this actually does cost you in tokens whether or not your agent is doing outside transactions. So we're already operating in an economic framework, right? STEVE: Yeah. 
I think I'm on a Stripe plan here, but, you know, in general, like, people have a subscription relationship to these providers, and that costs money. And we get a certain number of tokens. And any prompt I give, even though I'm not, like, seeing the penny count move by, has an ultimate dollar cost to it, right? And, you know, maybe in the typical coding example and, you know, consuming tens of thousands, hundreds of millions of tokens, we've sort of justified the value of that, right? Because the code has business value and this has monetary value. But, like, the sort of, like, token and the currency that backs it are, like, they feel closer than ever. And, you know, whether I'm spending a penny or a dollar on a third-party service, or I'm spending, you know, tens or hundreds of thousands of tokens with the LLM, we're sort of doing a similar activity, right? Which is that we need intelligence or we need data or we need operations or we need a service to execute on that prompt and, you know, achieve some outcome. And I think it's, like, even just this view feels very provocative, and it feels early, but I think it's going to feel very natural over time to see the token and the dollars side by side. And, you know, for me, it's like, you know, I planned a birthday party for, I mean, I don't know if it's any good, but I planned a birthday party for $5.47. That doesn't seem too bad. Again, we're doing this episode in the year of our cloud 2026. Like, we're going to show the terminal-to-terminal example. And most people watching this, and again, How I AI is for everybody, super technical and not, they're going to look at this and be like, okay, but yeah, like, I'm not going to plan my birthday party in the terminal. But let's just pull that thread six months in the future, 12 months in the future. There's going to be a bunch of builders out there that are going to wrap this in a much more consumer-friendly user experience. 
And then you're going to be able to build such interesting products that can interact and transact in just a much more human way, which, again, can just solve problems in a different mindset. Yeah, and I think it would be really interesting to build a business where your primary consumer sort of wants an ephemeral interaction with you, and it doesn't necessarily require you having a dashboard or an admin panel or a landing page or, you know, all the other typical things that are really useful, you know, when a human or a business is interacting with you. And instead, you could focus on, like, just a hyper-useful single API and monetize that directly and make your, you know, audience primarily agents. I think a lot of just, like, really interesting businesses can emerge out of that opportunity. I completely agree. And then we're going to have agents identify what those businesses are, build them, transact with other agent customers, agents all the way down. Well, Steve, this was awesome. Just to recap for folks, we saw minions and how to kick off development work from Slack and the benefits of investing in developer experience. Again, VPs of engineering, just, like, carve off a DevX team and give it some love. And product managers, get out of the way. You'll get more product at the end of the day if you just give some time and effort towards developer experience. And then we got to see these machine-to-machine payments, which I think by the time the episode is live, we should be able to maybe talk about or see. So fingers crossed, this will be live by the time our episode goes live. And we showed you how to plan a, I just got to zoom in, a matcha cheesecake birthday party in New York City. STEVE: Yes. Chen Li's matcha party, April 19th, apparently. Oh, I guess I didn't pick the date. So the robot has decided that will be a good birthday. So Saturday, April 19th, 3 to 6 p.m. Sounds perfect. We planned a birthday party for $6, carbon neutral. Steve, this is awesome. 
Before I send you off, a couple lightning round questions. Sure. One, you know, we showed kind of a contrived personal use case, but what are your personal workflows for AI? The thing I've been really interested in is the sort of, like, disposability of software. And I have a four-month-old now and almost a two-and-a-half-year-old now. And the two-and-a-half-year-old keeps grabbing my phone to, like, try to change music. So I've toyed around with, like, music apps that are extremely controlled to just six songs. I have no idea how to build iOS apps, but the robot does. So I've been toying around with little engagements like that. And then I use, you know, all the AI apps sort of in the normal way, I guess, in addition. Yeah, well, if folks want to create an app like that, we just did an episode with Jesse Genet, who built a, like, minimalist YouTube for kids where it can only, like, her kids can only watch the videos that she pre-approves, and you can only swipe back and forth. You can't do any, like, no other buttons. It's very, very streamlined. So very similar to your music example. Okay, and then my last question, which got a sneak preview of a little up on this Claude example, but when AI is not listening, you know, when your minion does not one-shot, what is your prompting strategy? And you're a parent. So, like, do you gentle parent your AI? Are you like, I know you can do it? Or do you, you know, do you bribe it? Do you offer it 15 cents, carbon neutral? Like, what do you do? STEVE: This sounds crazy, but, like, I have made a concerted effort to always be polite. And I don't, I mean, like, I like sci-fi. I like alien stuff. You know, all that, like, there's this sort of, like, who knows if that's going to happen or not, but, like, I definitely don't want to be caught being rude. Even though, like, I think I've read some stuff of, like, you know, being more intense or being rude can result in better results. 
Like, I don't want to, like, I'd rather have to do a little bit extra work than have it on the record that I was mean. Because you never know. But the more serious answer is, one, asking it to explain or justify itself has helped quite a bit. And then I think in other cases, I've tried, like, in other cases where I know the right direction to go, I will start going in the right direction, and then I will ask it to look at sort of, like, the git status, to look at the diff, or, like, look at other sort of, like, breadcrumbs that I've left as, like, the directional thing to help guide it. And then, of course, like, if I'm doing the thing that's not recurring, but that I'm going to do again, I try to keep that in some skill or prompt or otherwise that I can inject back in later. Got it. So you're doing, like, the dad teaching his kid to ride a bike move where, like, your hand's on the back of it, and then you let it go. But you're like, here, this is what I want. STEVE: It didn't really hit me until you said that, but there's something really weird about raising kids at the exact same time that the robot emerges. That hadn't really clicked with me yet. So I don't know what's informing what, but they are happening at the same time. Yeah. I said something like, it's really interesting to be raising kids and literally writing, like, soul.md files into my agents. Like, I guess that's a virtuous cycle of skills. Well, Steve, this has been awesome. Where can we find you and how can we be helpful? We can learn more about the work we're doing at Stripe at stripe.dev, which is our blog. So you can learn all about, you know, some of the interesting things we're building. The demo I just showed you, you can learn more about at docs.stripe.com/payments/machine. And I guess I'll plug my Twitter, which is just at Steve Kaliski. So those three, yeah. CLAIRE: Thanks for joining How I AI. This was awesome. Awesome. Thank you so much for having me. Thanks so much for watching. 
If you enjoyed the show, please like and subscribe here on YouTube or, even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app.