Spec-driven development: The AI engineering workflow at Notion

Overview

This episode is about how AI is changing software work beyond autocomplete. Ryan Nystrom from Notion walks through three patterns he uses day to day: automating standup prep, using agents to handle coding tasks in the background, and writing specs first so agents can build from them. The thread through all of it is simple: move human effort away from repetitive coordination and toward decisions, architecture, and review.

Key Takeaways

Ryan's meeting workflow is a good example of where AI helps without needing a grand plan. He built a Notion AI custom agent that runs at 9 a.m. every day and looks back over the previous 24 hours: Slack conversation, closed Notion tasks, merged pull requests, and yesterday's meeting transcript, plus the latest CI-time metric from Honeycomb. It compiles that into a pre-read on the day's standup page and posts a link in the team's Slack channel. That turns the meeting from a round-robin status report into a discussion about decisions, risks, bugs, and next steps.
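The morning routine is essentially a map-reduce over a 24-hour window: query each source separately, then group the results into a fixed template. The Python sketch below illustrates that shape. It is not Notion's implementation. The `Item` type, the fetcher callables, and the section names are hypothetical stand-ins for the Slack, task-database, GitHub, and Honeycomb lookups the agent performs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


# Hypothetical shape of one piece of activity pulled from a source
# (a Slack message, a closed task, a merged PR, a metric reading, ...).
@dataclass
class Item:
    source: str     # e.g. "slack", "tasks", "prs", "honeycomb" (illustrative labels)
    section: str    # which pre-read section the item belongs in
    text: str
    at: datetime    # when the activity happened


# A fetcher takes a cutoff time and returns items from one source (hypothetical interface).
Fetcher = Callable[[datetime], list[Item]]

# Template sections, loosely following the ones mentioned in the episode.
SECTIONS = ["CI speed", "Decisions", "Progress", "Bugs", "Questions", "Risks"]


def build_pre_read(fetchers: dict[str, Fetcher], now: datetime,
                   lookback: timedelta = timedelta(hours=24)) -> str:
    cutoff = now - lookback
    # Map: query each source independently (in Ryan's setup, each lookup
    # is fanned out to a sub-agent; here it is a plain loop).
    items: list[Item] = []
    for fetch in fetchers.values():
        # Enforce the look-back window even if a source returns older items.
        items.extend(i for i in fetch(cutoff) if i.at >= cutoff)
    # Reduce: group everything into the fixed template, oldest first per section.
    lines = [f"Standup pre-read for {now:%Y-%m-%d}"]
    for section in SECTIONS:
        entries = sorted((i for i in items if i.section == section), key=lambda i: i.at)
        if entries:
            lines.append(f"\n{section}")
            lines.extend(f"- {i.text} ({i.source})" for i in entries)
    return "\n".join(lines)


if __name__ == "__main__":
    now = datetime(2025, 1, 8, 9, 0, tzinfo=timezone.utc)
    fetchers = {
        "prs": lambda cutoff: [Item("prs", "Progress", "Merged: cache test deps", now - timedelta(hours=3))],
        # 30 hours old, so it falls outside the 24-hour window and is dropped.
        "slack": lambda cutoff: [Item("slack", "Questions", "Run lint on every push?", now - timedelta(hours=30))],
    }
    print(build_pre_read(fetchers, now))
```

Keeping the sources as independent fetchers mirrors the "fan out, then compile" instructions: adding a new source means adding one entry, not rewriting the template.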

A big point from the conversation is that the value is not only time saved. Ryan estimates the prep saves him around 20 minutes a day, but the bigger gain is cutting context switching and mental drag. He no longer has to read through updates, copy the information, and paste it into a status board, work he calls "soul sucking." The agent does the assembly work, and he shows up ready to talk about the actual problems.

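On the coding side, Ryan describes a spec-driven workflow that sounds old and new at the same time. For Notion AI's ask mode, which bans mutating tools so the AI can only read and answer questions, he started with an empty markdown file, dictated rough thoughts into Whisper, and asked Codex to turn that transcript into a proper spec in the format of the existing specs checked into the repo. After a couple of revisions, he pointed Codex at the spec and told it to build. He says it "basically one-shotted" the feature because the spec included code pointers and a verification section: the first run took a couple of hours and produced a couple thousand lines, which he then code-reviewed and tried out himself.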
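A spec can only drive a one-shot build if its key sections are actually filled in. As a hedged sketch, assuming specs are markdown files with `## ` headings named after the items the episode lists (feature behavior, code pointers, edge cases, verification), a pre-flight check might look like this. The section names and the `missing_sections` helper are hypothetical, not Notion's actual spec format.

```python
import re

# Hypothetical section names, based on what the episode says a spec should contain.
REQUIRED_SECTIONS = ["Behavior", "Code pointers", "Edge cases", "Verification"]

# A level-2 markdown heading such as "## Edge cases" (level-3+ headings do not match).
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(spec_markdown: str) -> list[str]:
    """Return required sections that are absent or have an empty body."""
    # With one capture group, re.split yields [preamble, heading1, body1, heading2, body2, ...].
    parts = HEADING.split(spec_markdown)
    bodies = {h.strip().lower(): b.strip() for h, b in zip(parts[1::2], parts[2::2])}
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name.lower())]
```

A check like this could run over every file in the specs folder before an agent is told to build, so a spec with an empty verification section gets caught before hours of agent time are spent on it.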

That verification piece matters. Ryan argues that the engineer's job is shifting toward system design and proving correctness: if an agent cannot verify its work, that is the first problem to solve. One of the first things his team did on this project was build CLI tools that run Notion AI from the command line, so that once the tests pass, the agent can send prompts, enable or disable ask mode, read the transcripts, and check whether the feature actually works.
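Notion's CLI is internal and its interface is not described in the episode, so the Python sketch below assumes a hypothetical `run_prompt` wrapper that returns a structured transcript. It shows the kind of check an agent can run once it can drive the product itself: with ask mode on, no mutating tool may appear in the transcript. The `ToolCall` and `Transcript` types are illustrative, not a real API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    mutating: bool  # hypothetical flag: does this tool create, edit, or delete anything?


@dataclass
class Transcript:
    ask_mode: bool
    tool_calls: list[ToolCall]
    answer: str


def verify_ask_mode(t: Transcript) -> list[str]:
    """Return the problems found in one run; an empty list means it passed."""
    problems = []
    if t.ask_mode:
        # Ask mode should only read and answer, so any mutating tool call is a failure.
        problems += [f"mutating tool used in ask mode: {c.name}" for c in t.tool_calls if c.mutating]
    if not t.answer.strip():
        problems.append("no answer produced")
    return problems


def check_prompt(run_prompt: Callable[[str, bool], Transcript], prompt: str) -> dict[bool, list[str]]:
    # run_prompt is a hypothetical wrapper around the product's CLI:
    # send the prompt with ask mode on or off and return the parsed transcript.
    return {ask: verify_ask_mode(run_prompt(prompt, ask)) for ask in (True, False)}
```

With a fake `run_prompt` that edits a page even in ask mode, the ask-mode run is flagged and the normal run passes. An agent can read that list of problems and keep iterating until it is empty, which is the loop the episode describes.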

Another useful point: this is not extra process layered on top of engineering. As host Claire Vo points out, and Ryan agrees, teams were already writing design docs and debating implementation choices in meetings. The difference now is that the spec lives in version control as the source of truth and also serves as input for an agent that writes the code.

Practical Steps

Pick one repeated coordination task and automate the prep, not the decision-making. A daily standup is a good place to start.
Write the agent instructions like a description of what you would do by hand:
- look back 24 hours
- check Slack
- check merged PRs
- check closed tasks
- pull one or two key metrics
- output a brief meeting pre-read
Limit permissions. Ryan gave his agent read access to most sources and write access only to the meetings database where it posts the update.
Start coding work with a spec file in your repo. Include:
- feature behavior
- code pointers
- edge cases
- verification steps
If you're unsure how to write the spec, talk it out first, transcribe it, then have an agent turn the rough notes into the team's format.
When a feature changes, update the spec first, then ask the agent to reconcile the code to the spec.
Build or set up a verification loop early. If the agent cannot test the result, you will spend more time guessing than reviewing.
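Deny-by-default scoping is easy to express as data. This Python sketch only illustrates the "read most things, write one place" rule from the steps above. Notion configures custom-agent access through its own settings, and the resource names and the `allowed` helper here are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2  # write implies read


# Hypothetical resource names, mirroring the setup described in the episode:
# view-only on shared databases, edit access only to the meetings database,
# read access to the metric, and permission to post in the project Slack channel.
AGENT_PERMISSIONS = {
    "notion:tasks_db": Access.READ,
    "notion:projects_db": Access.READ,
    "notion:meetings_db": Access.WRITE,
    "honeycomb:ci_metrics": Access.READ,
    "slack:project_channel": Access.WRITE,
}


def allowed(resource: str, wanted: Access, perms: dict[str, Access] = AGENT_PERMISSIONS) -> bool:
    # Anything not listed gets no access at all (deny by default).
    return perms.get(resource, Access.NONE) >= wanted
```

Using an ordered enum keeps the rule "write implies read" in one comparison, and leaving unknown resources at `NONE` means a new data source has to be granted explicitly.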

Notable Quotes

Ryan Nystrom: "I didn't start with writing code. I just started with an empty markdown document."
Ryan Nystrom: "Your AI, your agent, is never going to complain when you ask it to do this five minutes before the meeting starts."
Claire Vo: "We were writing technical design documents, spec documents anyway... And now no more waiting for the meeting, no more waiting for review."

Full Transcript

Source: OpenAI, 47m runtime

One line that I've been putting in my prompts lately is, I literally don't know what I'm doing here. You got to explain it like I'm a five-year-old. I didn't start with writing code. I didn't start with anything. I just started with an empty markdown document. I actually just opened up Whisper and just started yapping about how this feature should work. I gave the Yap session to Codex, and it was like, write a spec. I then opened up Codex again, pointed it at this spec file, and I said, build it. And basically one-shotted this. I've been in software engineering for 20 plus years. We were writing these documents, and we were sitting in meetings with other engineers debating the merits of one implementation versus another. And now, no more waiting for the meeting. No more waiting for review. I'm not a CI expert. But I kind of know what I want. And so other folks were kind of like, can you just bring some of your puppy dog energy to CI and just see what we can do? Your AI, your agent, is never going to complain when you ask it to do this five minutes before the meeting starts. It is more relaxing, and it's more fun. And I feel like I'm getting more done. It's weird to have this win, win, win. They do the triangle, and they're like, pick two. And you're like, no, I'm going to pick all three. Give me the whole triangle. Give me the whole triangle. Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we have Ryan Nystrom from Notion. And he's going to show us, as an engineering manager, how you can never prep for a stand-up again. We're also going to see how you can get a background agent to write code for a fix your friend texted you. And how spec-driven development really works in a codebase at scale. Let's get to it. This episode is brought to you by WorkOS. AI has already changed how we work. 
Tools are helping teams write better code, analyze customer data, and even handle support tickets automatically. But there's a catch. These tools only work well when they have deep access to company systems. Your co-pilot needs to see your entire code base. Your chatbot needs to search across internal docs. And for enterprise buyers, that raises serious security concerns. That's why these apps face intense IT scrutiny from day one. To pass, they need secure authentication, access controls, audit logs, the whole suite of enterprise features. Building all that from scratch? It's a massive lift. That's where WorkOS comes in. WorkOS gives you drop-in APIs for enterprise features, so your app can become enterprise-ready and scale up market faster. Think of it like Stripe for enterprise features. OpenAI, Perplexity, and Cursor are already using WorkOS to move faster and meet enterprise demands. Join them and hundreds of other industry leaders at WorkOS.com. Start building today. Ryan, welcome to How I AI. I am really excited because you're going to show us, I think, start, medium, and advanced mode on some AI coding stuff. And so, before we jump in, how has AI just changed how you live your life at work? I mean, it has completely upended the way I work. I've been doing this, I did a lot of mobile iOS work in my past. And I worked the same way every day for like 12 plus years. And then the last year, I have changed IDEs, terminals, tools, like whatever, like 10 plus times. So, it's like really weird and scary to be like changing this stuff so much. But it's also, I feel that I've been doing this for a while, and I'm feeling so much joy and freshness and newness in this that I wake up every day super excited to tinker and build things. And I'm also like working faster and harder than I feel like I ever have, but in a good way. I know people are like kind of freaked out about like, everything's changing, the pace is up. But like, it's really energizing for me. 
Well, you are not alone. I think every How I AI guest has come on and said, I'm having more fun. I'm working faster and everything is different. And what I love about what you're going to show us is, it's not just the set of tools, which I think has changed or how we write code has changed. How you like run a team has changed. So, and I think for the better, you know, so I would love to see how you're using Notion AI to actually run teams differently. Yeah, so for context, I manage a team of like six, seven people. I'm like an engineering manager, technical engineering tech lead manager, whatever we call it. I manage people and I write code is like my role, which I love. And I've run a bunch of different projects here at Notion. The one I'm going to show today, we've nicknamed Afterburner. Like kind of a quick backstory is, I've been kind of vocal about our like DevX CI for a while. I've just, I worked at places where it's really slow. I've worked at places where it's really fast. And I came here and I was like, we're kind of like in between, but I feel like we're like slower than we need to be. And eventually this like caught up to me and somebody was like, could you just come fix it and like come work on it? And I'm not like an infra expert. I'm not a CI expert, but I kind of know what I want. And I think also most importantly, the group that I manage and the org that I work in, we're a little notorious for being like really fast and like very, very AI-pilled. And so other folks were kind of like, can you just bring some of your like puppy dog energy to like CI and just like see what we can do? And that's what we've, that's what we've done. So we had this really aggressive goal to cut our CI into like a quarter of what it is. We're on the path to doing that. But so what I want to show you is a little bit about how we run projects in Notion and how I'm using AI to kind of streamline those projects. 
So what we're looking at here is like our like project hub called Afterburner. And so in here, I've got all this documentation. I've got databases. We'll look at some meetings. I have like an automation that looks for any sort of like little wins if we knock off seconds from different jobs or whatever. And we keep it, we keep track of them in here. But what I wanna show you is basically how we run our meetings. Our small group, we'd run a standup every single day. And doing standups where everyone just like is kind of like dead-eyed and going around being like, I did this. I shipped this change or you know, no updates for me, thanks. Is like painful. And in my opinion, like a huge waste of time. I wanna like get to the meat. So we have this kind of like automated meeting template that shows up every single day that we run our meetings, which is basically every day. And it starts blank. And then what I have set up behind the scenes is a custom agent. So we shipped this Notion AI custom agent stuff a little bit ago and this runs right after the meeting template gets generated and it looks through all of our like Slack conversation in the last 24 hours, any tasks in Notion that we closed, any pull requests that we've like merged, just like all sorts of context. Oh, and it looks at like yesterday's meeting transcript as well. And then it compiles it basically like a pre-read. And this is one from a week or so ago. And it, yeah, it shows, it pulls metrics. It can show us like what our latest CI time is. It shows some of the things we've decided, shows like progress on different like projects or like different things that we're trying to like make faster, bugs, feedback, open questions, like anything that's like of concern. And like, I can basically work up until like the minute of our meeting without having done a bunch of like prep. And then we all get on a video call and we look at the screen and we're like, okay, here's what we need to talk about. 
And we'll like hit each bullet. We have like meeting notes that we run down here, within Notion and like all the context is basically like captured. All of the like agenda is like set up for me. So we spend the entire time talking about like problems, decisions, wins, findings, like what are we going to work on next? And it's less the like, oh, I did this thing. Yeah, what I want to call out for folks that are maybe listening and not watching is this is a very detailed meeting, like kind of like pre-read slash status update, which if you're a TLM or an EM running good meetings, unfortunately, is part of the job. And there's a really big difference between a good standup and a bad standup. And I think the, the ones you described, these like rote like, I wrote this PR today. I'm going to start on this. And then basically like a notes document. That's a very high level because some human is putting it together. It stinks. And you start to like do the other thing that I found. And maybe I'm curious if this has impacted how you work or just made it easier, which is like, you start to have those meetings less because the updates aren't super rich. People don't feel like it's a good use of their time. And I think you lose something by reducing the frequency. But if you can have high bandwidth, high quality meetings with high frequency without the overhead, Because we haven't seen, actually haven't seen, we've had a couple of Notion folks on the podcast, but we haven't seen Notion AI in action. And I just want to see kind of your thought process on how you build something like this out. So I'm going to flip over to, this is our custom agent. Don't ask me why we got this like potato theme for like the entire project. I think it was kind of something about like, CI is just this like cobbled together like mess. And so we're gonna like make the potato like a rocket ship. I don't know. 
I don't even know if that makes any sense, but like, now we're having fun with it and we have reactions and like agents and it has spun off into its own thing, which is fun. But this is our hot potato agent. So you can see, I have this set up to run at 9 a.m. every single day. It's also set up for chat. It's set up if the agent's mentioned, but we never really use any of that. And probably the important part is the actual instructions. So in this instruction page, giving context on like what the purpose of this agent is, I am telling it to run, yeah, look back at 24 hours. So basically telling it your job is to run every single day and I only want you to look back for the last like 24 hours of activity. I'm explicitly telling it to use sub-agents, which is kind of a sleeper feature in Notion AI. Like this exists, but we don't really push it to use it very often yet because it's one, it's very expensive and two, it can be kind of finicky sometimes, but I helped build it, so I know how this works. And then I ask it to kind of like fan out and do like a map reduce where I'm saying, go use the Honeycomb MCP to figure out what the latest metric is. Look in our project channel and like find updates, feedback, questions. I tell it where the task database is and how to look for tasks within this project, and then how to find yesterday's meeting. And then I give it a template in the instructions where I'm like, this is your format. I care about CI speed, decisions, progress, changes, bugs, questions, risks, a little bit of guidance on writing. And then when it's done, I have it post to Slack and I like emphasize this like, I want it to be brief and fun. And sometimes it's really corny and then sometimes it's like really good. And it'll be very quirky and just post this like link in our Slack channel and it's like, hey, here's your pre-read, some little quibble about like whatever, you know, like, hey, you guys are not making enough progress. And then that's it. 
And then it's, we have our meeting note. It's updated. What I like the most about this one, let me show you some of our like internal settings. So this is like, I give it access to all of these things. I'm like, you can only view all of this stuff cause I don't want it going and like modifying our task database, our project database. Everybody at Notion uses those, but this meetings database in particular, I'm like, you know, you can edit content because this is the one you're gonna like write and update the page. It can read from our Slack channels, respond to our project one. And then this was new to me, actually, when I set this agent up is our MCP. So we've had MCP and we have this other thing called workers, which is like kind of like writing code. I haven't used them very much within custom agents, but in this one in particular, I'm like, I know exactly where this metric is. It's in Honeycomb. And so I like just configured the MCP in Notion. By the way, I like used the agent to like set itself up because I was like, here's the query. I literally give it a screenshot of the Honeycomb query. I was like, I don't know how this works. Can you just like update your instructions? I love that you screenshot it. You didn't even copy and paste it. You're like, please OCR this screenshot. Exactly, too lazy. I'm like, here it is. Just take it. Figure it out. And it kind of, it got it mostly like most of the way there. I had to fiddle with it a little bit, but yeah. What I appreciate about this, and again, for anybody trying to just brainstorm workflows where AI can actually have a huge impact on your productivity at work or life. I just feel like write down what you would do if you had time. If you had time every morning at nine, you would sit down and you with your eyeballs would go through Slack. You would go through Honeycomb. You would ask people what's going on. You would look at GitHub and then you would compile it and then you would try to be very fun in Slack. 
Like it's, it's just a description of what you would do. And it's, it doesn't have to be that complicated and you can iterate so quickly on it. What I appreciate about this, you know, versus old era of more deterministic like workflow style builders is the updates are so easy to make. Just change the natural language, redo the order, change the trigger, give it more access to data, and then it's, it's ready, ready to go. Yeah. And you know, I think the other thing I I've gotten hung up on when trying to think about like these automations and I I've seen others do is that like you get this like, you start gigabraining it and you're like, well, how am I going to save like five hours of work a day? And I, what you just said made me realize too that like the tedium that this removes for me is not like world changing, but it's like 20 minutes a day. And that's like 20 minutes. I can spend doing other stuff and that like, it's not even just about like saving that 20 minutes, but it's like protecting my brain from like having to context shift about all this stuff and like ingest it and instead, yeah, it's just, I know that the information will be there when I'm ready to like read it and I'm ready to like shift gears to this project rather than, yeah. I, I hate doing the like, read this update, copy all this information, like put it into like an update board. Like it's soul sucking. I mean, you and I have been doing this for a while. That's like, that was, I feel like 70% of my job, at some point, 70% of my job was just like what's going on and how do I massage it into a format appropriate for the audience at hand. And it's always the same information. It's just like, what's the executive version of it and what's the team version of it and what's the full team version of it? And like, ah, like my shoulders drop out of my ears when I realize we don't have, like we just don't have to do it anymore. 
And the other thing that I think people maybe underappreciate about AI and this like just in time delivery is your AI, your agent is never going to complain when you ask it to do this five minutes before the meeting starts. I know it's so, it's so great. It's so great. It's just like when you have it, drop it, get it done and out of your brain. I think is just, again, I go back to like burnout and enjoying your work and reducing toil. And it just feels like a more relaxed way to work. Yeah, it's so funny because it is more relaxing and it's more fun and I feel like I'm getting more done. It's weird to have like this like win, win, win. Yeah, you know, they, they do the triangle and they're like, pick two. And you're like, no, I want to pick all three. Yeah. And I'm like, give me the whole triangle. Give me the whole triangle. That was stamp it for the YouTube thumbnail. Give me the whole triangle. All right. This episode is brought to you by Orkes, the company behind open source Conductor, which powers complex workflows and process orchestration for modern enterprise apps and agentic workflows. Legacy business process automation tools are breaking down. Silos, low-code platforms, outdated process management systems, and disconnected API management tools weren't built for today's AI powered world. Orkes changes that with Orkes Conductor. You get a modern orchestration layer that scales with high reliability and brings humans, AI, and systems together in real time. It's not just about tasks. It's about orchestrating everything. APIs, microservices, data pipelines, human in the loop actions, and even autonomous agents. So build, test, and debug complex workflows with ease, all while maintaining enterprise-grade security, compliance, and observability. Orkes, orchestrate the future of work. Learn more and start building at orkes.io. Let's talk about, so we're talking about how meetings happen. Love this. You do write code, though. I do write code. 
Sometimes with your fingers and sometimes with your and my favorite harness of the moment. So let's go to how you get code in. Yeah, I want to show the little bit of a new workflow that we have going on in Notion. I honestly don't think this is necessarily a big feature that we're going to ship or some, we might ship some version of this feature. So this is all basically internal only at this point. But prior to this, the way one, I love Codex. I've been a Codex stan for like, I don't know, six, seven months now. And we started, we started building this like Codex integration into Notion. Prior to this, it was like, I mean, obviously you're using the CLI to like write your prompt and then they created the Cod the state of the art with coding agents. And someone had this really great idea to like let's not start with code. Like let's just start with specs. And what we've ended up building is we have this like in our checked into our code base. You see this? We have this, we're looking at, this is a Notion repo, but we have this agent specs subfolder. And within this subfolder, we have all of these markdown documents. This is one that I worked on. We have this thing in our AI called ask mode where we basically ban all the like mutating tools. So it can only just like read and answer questions. And so when I was building this, I didn't start with writing code. I didn't start with anything. I just started with an empty markdown document. And I mean, I actually just opened up like Whisper and just started yapping about how this feature should work. And at the end of it, I gave that, I gave the yap session to Codex and was like, here's our other like spec library, learn the format, take my information, write a spec. And then it spiked the first version. I did a couple revisions on it and it ended up with this markdown document. 
Now the markdown document is like, it's nice, but what we did with it next, in my opinion, is I kind of think that this is like the future of software engineering where I then opened up Codex again, pointed it at this spec file, and I said, build it. And it basically one-shotted this because the entire spec file is so comprehensive with code pointers, with down at the bottom, we have verification. It was like, here is how you're going to verify all of this stuff works. And we've even built our own CLI tools so that you can run Notion AI from the CLI and it could, once it's done seeing that all the tests pass, it could actually just spin up Notion AI itself, send it queries, send it questions, enable ask mode, disable ask mode, and then see the transcripts and like see what actually happens. And I think the first shot of this took a couple hours, but I came back to whatever, a couple thousand lines, did some code review, played with it myself, and I was like, it's right. It's like done. And you know, since then, we've made, I can, the other beauty of this is like, this is in version control. So I can go to the past changes of this spec file and I can see how the spec has evolved. And I could go look through all of the code changes, which also have their own like history. But this is now the sort of like source of truth for how this part of Notion AI works. And it's just in plain English that can then be verified and implemented by agents. Well, I think the other thing that people don't appreciate is, you know, taking this outside of the engineering flow is this plain English can be ingested by other parts of the business that need this information. So let's say you need to then release this feature via some sort of marketing. This is actually like a pretty good asset that explains how it works. That can be translated into another, another thing in a way that like code itself is still a little intractable. 
And so this idea of spec-driven development, but what I like about what you said, I don't want people to miss, is the way you make these updates is you update the spec and go, go, go look, make the, make the update, change the thing. Exactly. And so like the spec is the source of truth. The spec as the, the change log, I think is a really interesting model. And for people that aren't watching, it's very detailed. It's very technical. So it's not like there is not code in the spec. That's right. There's just not all the code in the spec. And I think that's a really kind of good hybrid model for experienced engineers to start to bridge into what would it look like to have an agent do more of your coding work while you still do architecture work, while you still do design work, while you still make sure that the thing is going to scale. Yeah, exactly. I, I view our job as like engineers evolving into like systems thinkers and architects. And not, and not even just necessarily writing like the spec and thinking about the behaviors, but most importantly is like the verification loop. Like, is it a, like, how should it verify correctness of this feature, um, or this change? And honestly, it's like, if it can't, or if like the verification's a little hazy, it's like, that's the first thing you actually should be going and doing is like, do you have a tool to let the agent, uh, run itself? That's like one of the first things we did with this project was like, we, we should actually build a CLI so that I can tell Codex like send this prompt and like see what happens. And now that we have that, then we can take these specs and actually just like go deeper and deeper and deeper. And so it's like, we're still doing engineering, but I'm not doing the like plumbing work of like wiring up this ask mode feature. Well, and one other thing is really funny is, I've been in software engineering for 20 plus years. Like we were writing these documents anyways. 
We were writing technical design documents, spec documents anyway. And we were sitting in meetings with other engineers debating the merits of one implementation versus another. And then we still had to go write the code. Yep. It is like really, it hasn't added work to, to go into this model. It's maybe like shifted the emphasis of where the human attention goes, but these were documents that at least I was in every org that I've ever been in writing before. And the other thing that I think has changed so much is those docs then waited for review and they waited for a meeting. And now no more waiting for the meeting, no more waiting for review. Ship it. Have a verification loop. Debate it on the merits of it being live and working versus the theoretical merits of it sitting in a document, waiting for everybody's calendar to open up for, you know, a live argument. Yep. Yep. Couldn't agree more. Let's, let's do it. This has been my, I, I just, you and I could talk all day about this. Just to recap for everybody, because I know I gotta get you out of here. Three use cases. One, never prep for a meeting again. Hook it up, not only to your Slack, but to your meeting notes, your GitHub, your telemetry, and build the best standup meeting. So no one has to stand there glassy-eyed being boring, giving updates. Second use case, background agents, at mention from wherever you work, Notion being a great place to do that, kicking off virtual machines, getting PRs done, just saying yes when your friend texts you, can you ship this feature? And then the last one, putting all your specs in your repo, using them as a source of truth for a more autonomous coding agent like Codex. Let it cook for a couple hours, review the code. And then when you update, update the specs. Don't update the code. Did I get it right? Nailed it. Okay, let's do a couple lightning round questions, and I'll get you out of here. You and I love Codex. Why? Why do you love Codex? 
I'll tell you why I love Codex, but you go first. Okay. Well, so I first, I first fell in love. Oh my God, did I just say that? You did. I first fell in love with Codex because when I was like evaluating both like Claude Code and Codex, I found anytime Claude Code like filled up its context window, it would just kind of like lose the plot really quickly. And Codex, I don't, I don't know exactly what it's doing, if it's the model, if it's the compaction, if it's like both, but it can, it can grind for like hours. And with, with the way that I work, both the systems and things that I work on and just like, I like to be able to fire off a bunch of them, like at the same time and then like go to a meeting or go do something else or like uh spend my time kind of like round robin managing like all of these agents. Like I, I, I don't necessarily wanna, I'm not the person that is like sitting with the browser open and like um some agent next to it and like iterate, look at the browser, iterate, look at the browser. I'm like the more, the closer I can get to one-shotting solutions, the better, because that frees me up to do other stuff. Um, so I found that like Codex was, was pretty good about that. I also just feel it's like pretty simple. Like it's, there's not a lot of bells and whistles. It's not like too fancy necessarily. Um I'm happy with the addition of like MCP and skills um and some like other stuff coming out. I also really love GPT-5 4. Um I think it's a great model. So I'm like, all of those things together, it's just it really like matches my working style and type of work a lot. Yeah, I'll, I'll tell you why I love Codex. And I sent this to somebody. I said, Worktrees everywhere. Ports 3000 through 3009 spoken for. Like we're just, we're just going across, across the board. I do think it's good at long running tasks. I like its concept of projects because I run a lot of different projects. Me too. And that's just like a very helpful mental model. 
And then it's great at code review, honestly. Oh yeah, it's pretty good. It's just like a really good