How to build your own AI developer tools with Claude Code

Overview

This episode of How I AI features CJ Hess (10x) walking through an AI-native software engineering workflow built around Claude Code—specifically, how he turns planning artifacts into executable work. The core theme is “planning as a first-class interface” between a human and an agent: CJ replaces brittle ASCII diagrams and text-only plans with structured JSON that renders as interactive flowcharts and low-fidelity UI mockups.

CJ also demonstrates a pragmatic quality strategy: let Claude build quickly, then use a second model (Codex) as a critical reviewer to catch mismatches, code smells, and refactoring opportunities before changes solidify.

Key Takeaways

CJ argues that Claude Code’s real differentiator isn’t only raw intelligence, but steerability and “intent understanding”—the feeling that it reliably goes deep when asked and stays aligned with what the developer means. That quality enables an “ecosystem” approach: instead of treating the agent as a single tool, he layers custom skills, docs, and small utilities around it to compound productivity.

A novel planning move is CJ’s shift from markdown and ASCII flowcharts to a custom tool (“Flowy”) that takes JSON specs and renders them into readable diagrams and UI mockups. The counterintuitive insight: the plan doesn’t have to be optimized for humans or models exclusively—CJ uses visuals for his own cognition while keeping a machine-readable substrate (JSON) that Claude can reliably consume and update.
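
As a rough illustration of the "machine-readable substrate" idea, here is a minimal sketch of a nodes-and-edges plan file plus a consistency check. The field names (`metadata`, `nodes`, `edges`, `from`, `to`, `shape`) are assumptions for illustration only; the episode says Flowy files have metadata, nodes, and edges but does not show the actual schema.

```python
import json

# Hypothetical plan spec. Field names are illustrative, not Flowy's real schema.
SPEC_JSON = """
{
  "metadata": {"title": "Tip spinner user flow"},
  "nodes": [
    {"id": "tap", "label": "User taps Spin", "shape": "rounded"},
    {"id": "spin", "label": "Wheel spins", "shape": "rect"},
    {"id": "reveal", "label": "Tip card appears", "shape": "rect"}
  ],
  "edges": [
    {"from": "tap", "to": "spin"},
    {"from": "spin", "to": "reveal"}
  ]
}
"""


def validate_spec(spec: dict) -> list[str]:
    """Return human-readable problems; an empty list means the spec is consistent."""
    problems = []
    ids = [node["id"] for node in spec.get("nodes", [])]
    if len(ids) != len(set(ids)):
        problems.append("duplicate node ids")
    known = set(ids)
    for edge in spec.get("edges", []):
        for end in ("from", "to"):
            if edge.get(end) not in known:
                problems.append(f"edge references unknown node: {edge.get(end)!r}")
    return problems


def outline(spec: dict) -> list[str]:
    """Render each edge as 'source label -> target label' for a quick text view."""
    labels = {node["id"]: node["label"] for node in spec["nodes"]}
    return [f'{labels[e["from"]]} -> {labels[e["to"]]}' for e in spec["edges"]]


spec = json.loads(SPEC_JSON)
print(validate_spec(spec))
for line in outline(spec):
    print(line)
```

The same file can feed a visual renderer for the human and be read directly by the agent, which is the point of keeping the plan in JSON rather than in hand-drawn ASCII.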

The episode also highlights “living documentation for agents.” CJ iteratively improves Claude skills based on failures (spacing, colors, layout collisions), treating skills as continuously refined operational docs rather than static instructions.
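
One way to picture the "update the skill after each failure" loop is a small helper that appends a rule to a skill's markdown file. This is only a sketch: the `## Rules` heading, the example rules, and the pixel value are hypothetical, and in the episode CJ asks Claude to edit the skill rather than scripting it.

```python
def add_rule(skill_text: str, rule: str, heading: str = "## Rules") -> str:
    """Add a rule bullet under `heading`, creating the section if it is missing.

    Returns the text unchanged when the exact rule is already present.
    """
    bullet = f"- {rule}"
    lines = skill_text.rstrip("\n").split("\n")
    if bullet in lines:
        return skill_text
    if heading not in lines:
        return "\n".join(lines + ["", heading, bullet]) + "\n"
    # Walk past the bullets that already follow the heading, then insert.
    end = lines.index(heading) + 1
    while end < len(lines) and lines[end].startswith("- "):
        end += 1
    lines.insert(end, bullet)
    return "\n".join(lines) + "\n"


# Hypothetical skill content; the spacing value is invented for the example.
skill = (
    "# Flowchart skill\n"
    "\n"
    "## Rules\n"
    "- Keep nodes at least 40px apart.\n"
    "\n"
    "## Examples\n"
)
updated = add_rule(skill, "Use dark text on pastel fills.")
print(updated)
```

Keeping rules as short, append-only bullets makes each failure leave a visible trace in the skill, which matches the "living documentation" framing.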

Finally, CJ showcases model-to-model comparison as a lightweight review step: Claude writes, Codex critiques. Codex is positioned as a “curmudgeonly staff engineer” that is especially useful for review, even if it is not always the preferred generator.
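
A reviewer prompt along these lines could be assembled as follows. This is a sketch under assumptions: the wording of the checks, the file paths, and the sample diff are hypothetical, and the code only builds a string; how it is sent to the reviewing model is left out because the episode does not specify it.

```python
# The three checks mirror the review goals in this summary; the phrasing is illustrative.
REVIEW_CHECKS = [
    "Does the implementation match the plan artifacts (flowcharts and mockups)?",
    "Are there code smells?",
    "What refactors would you suggest?",
]


def build_review_prompt(plan_files: list[str], diff_text: str) -> str:
    """Assemble a review prompt that points the second model at the plan files and the diff."""
    lines = [
        "You are reviewing another model's change as a critical staff engineer.",
        "",
        "Plan artifacts to read first:",
    ]
    lines += [f"- {path}" for path in plan_files]
    lines += ["", "Check, in order:"]
    lines += [f"{i}. {check}" for i, check in enumerate(REVIEW_CHECKS, start=1)]
    lines += ["", "Diff under review:", diff_text.strip()]
    return "\n".join(lines)


# Hypothetical paths and diff, for illustration only.
prompt = build_review_prompt(
    [".flowy/user-flow.json", ".flowy/animation-timing.json"],
    "+ const SPIN_DURATION_MS = 4000;\n",
)
print(prompt)
```

Pointing the reviewer at the plan files, rather than pasting a summary, keeps both models anchored to the same source of truth.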

Practical Steps

Create a dedicated planning loop: start with a simple markdown plan, but promote anything visual (flows, navigation, UI states, animation timing) into a structured artifact that can render as a diagram.
Use “plan as code” formats (e.g., JSON schemas) so your agent can both generate and read back the latest truth from files—not from chat history.
Build or adopt a small renderer/editor so you can tweak diagrams directly, save changes, then tell the agent: “Read the updated file and propagate this change everywhere.”
Maintain agent-facing “skills” as living docs:
- Include purpose, quickstart, schema/template, constraints, and examples.
- After each failure, update the skill with a new rule (e.g., spacing guidance, color contrast defaults).
For implementation, try a constrained “just build it” prompt only after you have diagrams/mockups that fully specify behavior and states.
Add a second-model review step:
- Prompt it to verify: (1) alignment with plan artifacts, (2) code smells, (3) refactor suggestions.
- Feed findings back into the primary coding agent (or let the reviewer model patch the issues if it’s fast enough).
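
The "read the updated file and propagate this change" step above depends on knowing what the human actually edited. A minimal sketch of that comparison, assuming a hypothetical spec shape with a `nodes` list and a `duration_ms` field (neither is confirmed as Flowy's schema):

```python
def changed_fields(before: dict, after: dict) -> list[str]:
    """List node-level differences between two versions of a plan spec."""
    old = {node["id"]: node for node in before["nodes"]}
    new = {node["id"]: node for node in after["nodes"]}
    changes = []
    for node_id in sorted(old.keys() | new.keys()):
        if node_id not in new:
            changes.append(f"removed node {node_id}")
        elif node_id not in old:
            changes.append(f"added node {node_id}")
        else:
            for key in sorted(old[node_id].keys() | new[node_id].keys()):
                a, b = old[node_id].get(key), new[node_id].get(key)
                if a != b:
                    changes.append(f"{node_id}.{key}: {a!r} -> {b!r}")
    return changes


# Hypothetical before/after versions of an animation timing diagram.
before = {"nodes": [{"id": "spin", "label": "Wheel spins", "duration_ms": 3000}]}
after = {
    "nodes": [
        {"id": "spin", "label": "Wheel spins", "duration_ms": 4000},
        {"id": "pause", "label": "Brief pause", "duration_ms": 200},
    ]
}
for change in changed_fields(before, after):
    print(change)
```

A short change list like this can be pasted into the agent prompt so the edit is applied everywhere it matters instead of being rediscovered from chat history.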

Notable Quotes

CJ Hess: “Working with Claude is just such a delight. It just feels so steerable… it really has intent understanding.”
CJ Hess: “This is almost like living documentation. And there’s docs for people and there’s docs for agents, and those just end up being skills.”
CJ Hess (on Flowy): “This is a dev tool that was almost 100% prompted.”

Full Transcript

Source: openai 53m runtime

Working with Claude is just such a delight. It just feels so steerable. And I think the one thing it really has is intent understanding. When I want it to dig deep, it just does it. And it's really enabled me to build a little ecosystem of my own tools around it.

I think environment setup and developer setup is such an underappreciated use case. One of the things that I know you really care about is effective planning. And you've come up with a way that you do your planning that I think is pretty unique.

So I've played around with this tool to basically give Claude these JSON files. And there's a whole set of skills I've built around this that Claude Code can use to write these out. And then these actually end up generating nice-looking UI mockups. I will say this is a dev tool that was almost 100% prompted.

Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive here on a mission to help you build better with these new tools. Today I have CJ Hess at 10x. And if you've seen him on X, he is building some of the most useful tools and flows for being a quote-unquote real AI engineer. We're going to get a sneak peek at his tool Flowy that he vibe coded for himself. And he's going to show us how he uses model-to-model comparison to make sure his code is great. Let's get to it.

This episode is brought to you by Orkes, the company behind open-source Conductor, the platform powering complex workflows and process orchestration for modern enterprise apps and agentic workflows. Legacy business process automation tools are breaking down. Siloed low-code platforms, outdated process management systems, and disconnected API management tools weren't built for today's event-driven, AI-powered, cloud-native world. Orkes changes that. With Orkes Conductor, you get a modern orchestration layer that scales with high reliability, supports both visual and code-first development, and brings human, AI, and systems together in real-time. It's not just about tasks. It's about orchestrating everything. APIs, microservices, data pipelines, human-in-the-loop actions, and even autonomous agents. So build, test, and debug complex workflows with ease. Add human approvals, automate back-end processes, and orchestrate agentic workflows at enterprise scale, all while maintaining enterprise-grade security, compliance, and observability. Whether you're modernizing legacy systems or scaling next-gen, AI-driven apps, Orkes helps you go from idea to production fast. Orkes. Orchestrate the future of work. Learn more and start building at orkes.io. That's O-R-K-E-S dot I-O.

CJ, welcome to How I AI.

Thanks, Claire. It's good to be here.

So I've seen a lot of Claude and AI engineering power users, and I still think you're like a super power user of some of these tools. And it's not just because you're creating real production code with what you're building, which is really nice to see, and I think a subset of what we're seeing out of folks using these tools. You also build tools for yourself to make the process of AI engineering better, and you share those tools with other people who then validate that they're actually helpful. So why are you so excited about, in particular, Claude Code? And what has it changed for you as, as we were talking before the show, a quote-unquote real software engineer?

Like a lot of people, the Opus 4.5 moment was a big one. But I've been on Claude Code since, I don't know, maybe last May. But for me, it was really about the harness they have. And like, I see a lot of arguments about Codex and Claude Code, and I'd honestly argue GPT-5.2 is a smarter model. But like working with Claude is just such a delight. Like in Claude Code, it just feels so like steerable. And I think the one thing it really has is like intent understanding. Maybe I'm not giving, you know, Opus in Cursor like the shot it deserves here. But there's something about Claude Code, like, when I want it to dig deep, it just does it.
It seems to, like, pick up on my intuition just in the prompts. And it's really enabled me to almost like build a little ecosystem of my own tools around it, around Claude Code, kind of particularly with skills now. That just like, keep making it better and better for me, because it's Claude Code plus like this system of skills and tools that I've built around it. So it's, it's like really hard for me to get out of it.

Yeah, what I love about this moment as a software engineer is, you know, back in the olden days, you sort of had like your choice of like, what's going to be my IDE? And am I going to use Vim? And like, what, you know, what are my, my set of approved tools as an engineer that I can use to make, you know, what linters are we using as a team, all this kind of like, there's stuff that you could do to customize your developer environment. But now you can really take it to the next level. And you could have a totally different AI engineering workflow than your colleagues sitting next to you. And it's totally fine, because it's making you individually a lot more efficient and effective. And you're building them yourselves for pretty cheap. So there's not that cost or that hurdle of evaluating new things in your stack.

Yeah, there's even like, it's almost like, one, you now have the brains of Claude to almost like do some dirty work on the dev tooling. Like, I think, you know, pre any of kind of the newer gen models that just really can handle the agentic loop, I'd be, you know, sitting with like a broken linter, and just accepting it and having like ignore comments everywhere. So that, you know, I just, I just would give up. And now it's like, I feel like I can almost trust it to be like, what's wrong with this config? My IDE isn't matching what's in the project. Okay, we have to resolve this and just kind of solving those like chore problems that I feel like previously just ended up being forever problems.
Yeah, and for the non-engineers watching or listening right now, I think environment setup and developer setup is such an underappreciated use case. Yesterday, I onboarded a designer who has literally kind of like sat out some of this AI stuff. She's literally not downloaded anything, used anything. And she's on Cursor, Claude Code, Node's running, Homebrew's installed. And I was like, just ask Claude Code to do it. Say like, help me understand this repo and get my computer set up to run. And I said, and then just tell it it can accept all tools, let it go, and come back to your laptop later. And it's pretty great. I mean, we're really, really spoiled right now. So let's dive into some of your actual workflows. And one of the things that I know you really care about is effective planning. And you've come up with a way that you do your planning that I think is pretty unique.

Yeah, so there's kind of the classic plan. So I'm going to swap over to Cursor here. I have in this just like your classic .plans folder, just throwing plans in here. And I really love this format. I think a lot of people are kind of converging on this of like iterating on markdown, having one file where you're just like working through the plan, reviewing the plan. And by the end of that, you can almost feel confident just letting it write the code. But the one like piece that I hated that I found really valuable was these ASCII flowcharts. So if you're just listening, it's all those like boxes and arrows that Claude draws. And, you know, there's always the ones where, this one actually looks pretty clean. Yeah, there's always this like misalignment of that edge character. I don't know why we haven't figured that out yet. But for things like UI mockups, things like, you know, flowcharts of how navigation is going to work, how a certain system is going to work, I really like this visual way to think about things. But I really hate staring at these ASCII like diagrams.
Even things kind of like Mermaid and everything just didn't feel exactly what I was going for. So I've played around with this tool to basically give Claude these JSON files. And there's a whole set of skills I've built around this that Claude Code can use to write these out. And then these actually end up generating nice-looking UI mockups, not in super high fidelity or detail, but, you know, I can kind of guide it the direction I need. And up here, this white text might be a little hard to see. But basically, this is a flowchart on this tool, Flowy, and how it works.

So for the listeners, what I love about this is Flowy is a tool that you built. This isn't, you're saying like, oh, I was playing with this tool. It's like, no, you built this tool for yourself.

This was my first experiment with a Ralph loop. I'm still not certain how confident I am in them, because I had to do a little bit of cleanup. But overall, I will say this is kind of a dev tool that was almost 100% prompted.

Yeah. And so what you said is, you know, I love plans. I love the idea. And I just have to take a minute. Again, I'm the oldest lady on the Internet. So way back in the day, two decades ago, when I was first doing product management and web design, we did so many flowcharts, so many user journey charts, and then so many wireframes and so many like low fidelity mocks, then high fidelity mocks. And what I love about what you're building is you're building the AI native version of that. That piece has not gone away for anybody. It hasn't gone away that you said, like, when you click this, it goes to this. And these are the steps and these are the branches and all that. And it hasn't gone away that you have to look at designs and say, yeah, this is kind of what I want. But now you can have AI create them. And at first you had AI create them in markdown. Very, very low fidelity.
And I have to take a side journey that, you know, a year ago, I was like extremely delighted that it was making ASCII mockups. And now it's just not good enough.

Yeah, that's the shifting expectations on these models.

Yeah, exactly. And so you've taken these markdown mockups that were useful. And you said now make them really useful by building this sub application that can run them for you. And it's a combination of it seems like workflow diagrams and step by step mockups.

Yeah, so there's kind of, basically what I wanted was a JSON file. It can render and it can have nodes and edges like any flowchart and then roughly be able to stack them, change the colors and get us, you know, something that looks like this where we have a couple different screens. And we have these somewhere between a wireframe and a true mockup that just can help me point the model in the right direction. The other big thing for me was iterating on this. I'm not going to go in that markdown file and try to like write new shapes and combine them. So for this, this is also an editor. And as you edit it, all these changes save to that JSON file. So you can then point Claude back at it and say, hey, I know you did this. But actually, let's say I want a step here and I'm going to bring this up and add some edges. And then you can be designing in here almost like you're in Figma or Excalidraw or something. And then Claude can just read the file. And that's like a more native way for it to understand what everything looks like.

And you mentioned Mermaid diagrams. And so I have this question, which is one of the benefits of Mermaid diagrams is that's a syntax that these LLMs know well and can parse and actually reason about. Do you feel like, have you created a skill where Claude Code can understand and read this JSON? Like, how did you train it to read your kind of proprietary dev tool and documentation?

Yeah, so right now there's two main skills I use.
There's a third one that's just an overview, basically, kind of the high-level view of what the commands are, what a Flowy file would look like. And then I have one that's very specific about flowcharts and one that's about UI mockups. And to make these, I basically sat in the repo of the tool itself, had a bunch of, like, explore subagents going, and then started to make the first UI mockups and the flowcharts and started to guide it on, okay, you put these too close. We need a rule about, like, spacing and how to think about spacing. And just incrementally, I've been building that up where if I'm working with this and something goes wrong, almost an example here would be this, like, white text on these, you know, pastel nodes, kind of hard to read. I would essentially hop into the place where I have these skills and say, here's what happened. Give me a suggestion on how to improve this skill so this doesn't happen again. And then iteratively just keep building that skill. And the first flowcharts this thing made were, you know, shapes stacked on top of each other. It didn't make any sense. But it's come a long way, without many changes to, like, the underlying app. It's really just been about, like, getting Claude to understand and know the skill. And I find that works better than something like Mermaid just because I really feel the power of building my own dev tools now and that I really don't want to hit the constraints of Mermaid, if that makes sense. I want to be able to say, okay, I want a new feature in Flowy. I'm going to build it. I'm going to update skills. And I can be confident that Claude can actually work with that and understand the new feature.

Yeah, one of the things that I've really observed in myself as an engineer is as I access more and more of my dev tools through, like, an MCP or config as code or any of these things, I start to realize it's very easy for me to extend what they've built and customize it to myself.
And so I do think, you know, of all the places, dev tools is an interesting one where, one, your users are super cheap and, two, they're capable of forking what you've built. And, three, there's so much open source that I really do think there's going to be this trend towards build. It used to be, when I, you know, ran these big product and engineering orgs, they used to ask me build versus buy. And I was like, oh, my God, please just buy it. Like, please just take my credit card and buy it and let's not waste our time. And now I've flipped to, of course, we should build this. Until we hit some constraint, we should build it. And certainly individual engineers, if something's useful, you should just build something yourself, at least for V1.

Yeah, it's almost not worth spending the extra money anymore. I mean, I feel like I'm seeing this pattern on Twitter, but it's everyone's posting some product, some ridiculous pricing tier and saying, someone please vibe code this. You know, I feel like that's happening all across SaaS.

Yeah, so can you show us how you'd either create one of these Flowys, use one in your Claude Code? Like, how does this actually work?

What I was thinking is I have this tips and tricks section in this little, like, demo Claude Code guide app. My whole background's in mobile development, so this was the easiest thing for me to spin up. But basically, I kind of don't like these cards. I almost want this to be a little more fun. Let's say you want, like, a spinner wheel. It lands on something, and then it shows you the tip. The development flow for me usually looks like, hopping in here, I have some funny aliases, but I'm a fully bypass permissions guy. So Kevin in my terminal actually routes to Claude with bypass permissions.

Okay, so you've named different permission scopes as aliases in your terminal. For our listeners, we have a very recent episode with John Lindquist, who actually shows how to set up those aliases for Claude Code.
So definitely check out that episode if you want to set this up. I just have a classic, like, CC, and then I'm going to make a CC scary. That will make me, that'll be my, like, dangerous version.

Yeah, I'm more and more in this Kevin mode today. I find that a lot of, like, projects where I'm, you know, solely working on it or working within the team I'm on, we have all the, like, rules set up in Git that if I do something horrible, it's okay. But there are definitely times, like, if I'm creating a PR, every now and then I still do it by hand, but I have a lot of skills that do a lot of those workflows, run the preflight checks, and make sure we're all good before pushing it up. But besides that, I'm kind of okay running dangerously bypass most of the time these days.

Great. So you go into Kevin, aka Claude Code, and what do you do?

So for this, my prompt would probably be something along the lines of look at our previous plans and then explore the code base. Just want to re-anchor it a little bit, especially on a fresh chat. On the tips and tricks section, I want to create a spinning wheel where a user presses a button, the wheel spins, and then that is one of the tips. After that, the tip should pop up in a card just below the spinner. Then kind of the next step, and what I've been doing more and more, which is not how I initially started using this tool, is actually having it make the flowchart of how the code's going to work, a system diagram, anything like that. In this example, I'd actually want both kind of the user flow and an animation timing sequence. I've found this to be super helpful with complex animations. So I would say then use the Flowy flowchart skill to create an animation timing sequence diagram and a user flow diagram for the tips and tricks page. So we'll send off Claude. It's going to do a little bit of exploration. Oftentimes, yep, there it is.
I actually really like these explore subagents, and oftentimes I'll kick off three, four, five in parallel just to look at different places, especially if I'm in a larger code base. But just gathering all the context around it, this is a small app, so I don't imagine this will take too long. Then Claude's going to load up this Flowy skill, write it out, and we should be able to look at that in the Flowy editor and then play around before we actually implement it.

While we're waiting for this to load, can we look at that Flowy skill just a little bit just to see how you've structured it?

For sure. So let's first, I'll just show you the supporting files. This one's just a SKILL.md. This shows you how almost hands-off I am with some of these skill files, particularly the ones that I build myself.

Yeah, we have a skill 101 episode, and it's a markdown file in a folder.

It's a markdown file, and sometimes, this might be a specific example, but with Flowy, it's very squishy, I would say. I go in there, I change something quick, I say update the skill, and really the process of refinement is me using it and seeing what failed. So here, I don't super care how this file is set up as long as when I make an update, afterward, it's performing better. I almost feel good letting the model manage what this looks like. So let's read through it. It has a bunch of examples in here. Let me scroll up to the top. I'm sure there's some overview. Great. So, again, classic overview. Hey, we're going to make flowcharts and architecture diagrams. They're going to render on this port. Here's where you're going to make them. It knows that the Flowy app looks for this .flowy folder, kind of gives it some high level on what does the metadata look like, what do you include, nodes and edges, and then starts digging into the specifics. So we have the different shapes, what a rough kind of schema looks like.
You've got your styles, you have icons that you can use, and then starting to list out the properties. So I wouldn't say this is anything super crazy or even too long and detailed, but this encapsulates all the pieces that Claude needs to know. And you can almost see here, as feature development happens, how this skill grows. So recently I'd set up this whole semantic color system just to have somewhat of consistent themes. Sometimes Claude liked to pick some crazy colors. I'd like to pick some crazy colors. And this section just popped into the skill, right? So as I'm doing development on Flowy, part of every plan for code in Flowy is updating documentation and updating the related skills.

Yep, and I find myself in this loop so frequently, very, very similar to you with skills, which is like, I'm happy, the skill works. And then when the skill doesn't work, I update the skill. And as long as the update got me what I want, I move on with my life, that the AI can read the markdown. So a couple of things I want to call out, though, for folks that are writing skills or reading skills that are important, if you scroll up real quick, is, yeah, so I think there's a couple of things. It's like, what's the purpose of the skill? What's its name? Quickstart, I think, is really nice. Like, you need these things in order to run this skill. Here's the schema or the template or the framework within which you're operating. Here's some customization of it. And then at the end, it's like, here are good examples of what works. And I think that's a pretty solid skill. The good thing is you don't have to know how to do that. You can just have a Claude skill to write skills or just no skill, but it's pretty good at it, to write skills. And then you end up with something like this, which I think is really great. And it can do this. I'm presuming you had to do this from building Flowy and then saying, OK, build me a skill to use this based on the code that exists in the repo.
Yeah, I have a meta skill that is all about making skills. One thing I will say, it looks like it violated, is I actually prefer a pre-flight section, sometimes after Quickstart, just to give it like, hey, you have to make sure we're meeting all these requirements first. Quickstart here is kind of doing that, but there are definitely some examples, mainly in Git workflows, where I really want those pre-flight checks. But absolutely, this is essentially managed by the agent and it's updated as we're doing development. So this is almost like living documentation. And there's docs for people and there's docs for agents, and those just end up being skills.

Yep, great. OK, so let's go back and let's see if this made you a Flowy.

Sweet. So looks like it made two. I usually like to zoom out and read the high level in the chat. This looks about what we want. If we hop back over to here, we can see we have these two new ones, animation timing and user flow. So these ones have been super helpful to me lately. Again, I'm not loving how this white is looking on this pastel node. But high level, we want the user to tap a wheel. The button's going to do a little scale animation, and there's going to be some haptic feedback. And then we're going to go through this spin animation, do a brief pause, and then reveal the tip that it lands on. This is great. This is exactly what I'd want. Maybe I want the animation to be a little longer. I can actually come into here and tell it to be longer. I've got color issues. You can tell dark mode is new. But I can flip it real quick. But if we hop down here, sometimes I even just put a note. That might be me being lazy and not adding certain features. But maybe I want this to actually be a four second animation instead of a three second. I want this to be 4,000 milliseconds and not 3,000 milliseconds. I'll just throw in that note. I'll hop back to Claude. I left a note on the animation timing.
Please take it into consideration and update that flowchart. While Claude is working on that, we can check out the user flow. But basically, the goal there is to have this diagram written here, which is a little small. But written here, say, for this animation, we don't want it to be 3,000 milliseconds. We want it to be 4,000. On the user flow, again, we captured the behavior that we want. Again, it's not perfect. There are rough edges and bugs here. But we're going to go into this tab. We're going to tap Tips and Tricks. This is going to open up to this screen. They're going to tap. We're going to check the different states of currently spinning. And finally, we're going to have this random target that we land on and the card animates in. This is great. This is exactly what I was looking for here. In a more complicated system, I often will start high level, then start making more granular ones. But for something like this, this seems to cover the needs we have. I will say, I have no idea how it's going to handle the UI mockup. But the next step would be to prompt it to do that. So after it finishes this, I'd say something along the lines of, great. Based on those diagrams, please create UI mockups using the Flowy UI mockup skill. Reference other UI mockup Flowy JSON files in this repo.

Meet Rovo, your AI teammate connecting knowledge, people, and workflows so teams can work smarter and move faster. It helps people find answers, make decisions, and automate work securely and with context through search, chat, agents, and studio. Rovo runs on the Teamwork Graph, Atlassian's intelligent layer that unifies data across your first and third party apps so no knowledge gets left behind. And you always get personalized AI insights from day one. And the best news, it's already built into Jira, Confluence, and Jira Service Management paid subscriptions. So the power of Rovo is already at your fingertips. Know the feeling when AI turns from tool to teammate?
If you Rovo, you know. Discover Rovo, AI that knows your business, powered by Atlassian. Get started at rovo.com. That's R-O-V as in victory, O.com.

You know, I think this is so cool. It's such a great example of build your own dev tool, interact with your agent, Claude Code, how you want, create a shared language between you and your AI agent. What I also really appreciate is, Claude one-shotted your flow pretty close. It was like, yeah, that's what I want. And it probably could have done that or would have done that really well in a plan in Markdown. What I find, though, is my human brain is increasingly blind to code in Markdown. Like, staring at it and just the cognitive overhead of reading, like, step-by-step, is this actually what I want, is hard when it's just text, even if it's accurate. And so even giving, hold on, side news, people, quick. Breaking news, Polly the Clawdbot just joined this podcast. This laptop is closed. This laptop is closed. She is not alive right now. I don't know where she is. I think Polly's gonna take over. So we're gonna boot Polly the Clawdbot. Thank you for joining, Polly. This actually freaks me out. We will do a follow-up on my sentient lobster. I guess it's the OpenClaw bot now, but we're gonna bounce her out of here. If you don't hear from us, Polly got us. It's all over. Okay, she might just be on the rest of the episode. I don't know how to help this. Well, I guess, I hope Polly likes flowcharts. She'll do show notes for us. But what I was saying is being able to read that markdown is one thing, being able to look at a flowchart and just say, yep, this is exactly what I want is super helpful. So that's just one thing that I think is really nice about a tool like this is even if the content is the same, the ability to change the form factor is really useful.

Yeah, it's almost like I want to see it visually and Claude wants to see it as markdown so we can kind of speak in our own way.
And I almost think there's like, this has yielded like a ton of random ideas for me, but I think this is like a whole new paradigm that I think dev tooling around AI has not super leaned into yet, but how you're going back and forth with an agent I think is gonna look so much different by the end of this year than what we're doing right now where it is a lot of markdown, a lot of prompting.

Yeah, I completely agree. And I think the question is gonna be, who's gonna build that UI? Who's gonna own it? Is it gonna be just like an open source thing that we all get on? Is it gonna be an extension? Is Claude Code gonna just generate these kinds of assets? It's really exciting. I think what's kind of fun is this like on-demand software idea, which is imagine Claude Code's like, we're not on the same page. I just added an app for you to visualize this real quick, go to this URL and look at it. Does this look right? And then we'll just delete that app. So I think there's just like some interesting ways this can manifest, I think, in the future. Okay, so has it created the UI yet?

Nope. Spinner mock-up. Okay, great. So it looks like Claude spun up a mock-up here. This is actually better than I thought. I was almost thinking one of those like circles with wedges as the spinner. And I know there are not shapes in Flowy that can support that, but it looks like Claude kind of worked around it and then built out this wheel. We have both a couple of mock-ups to show the different states and the full like flow between spinning it, waiting these four seconds for it to load, and then it actually loading in. Again, for this app, this looks great. I will say editing some of the UI stuff right now isn't the easiest thing, but if I were to come in here and say "Claude tips and tricks," I could then do a similar thing, hopping back to Claude and saying, I made a change to the title on one mock-up, make it everywhere else.
This kind of feels like when you prompt it and say, add two pixels of spacing there. And it's just a tiny diff, but definitely for like dragging around boxes, it's helpful. You know, our fingers get tired. I can't copy and paste everywhere.

No, what I was gonna say is so funny is you're apologizing like, oh, some of the UI is broken. And we're in this world where you're like, yeah, my Figma that I vibe coded where I can do mock-ups in a web browser, there's like some rough edges on it. I spent, you know, two hours on it, but I think-

Yeah, it was an afternoon. It's not perfect yet, but.

It's so much more than we were able to do before. Okay, so this is awesome. You're updating this. And then I'm presuming you would just point Claude to these assets and flows and say, let's make a plan and go.

Yeah, for something like this, I've basically been doing this thing more recently where I'm letting the agent do more and more to see where it surprises me. I think with any new change, even like the new Claude Code tasks system they released the other day, I just really like to push the agents and see what they can do. So here, I'm actually gonna skip the plan and say, based on the flowcharts and the mock-ups, build this feature. And I'm gonna keep it that simple. We've specified the behavior we want. We've specified how it should look. Claude here is even gonna enter plan mode and I'm actually gonna take it right out of it. And we're gonna see if the just-build-it prompt worked here. Perfect. Great, looks like Claude built this out. It even checked for any TypeScript issues, which is great. We're gonna hop over here. We have a nice little spinner. It's looking pretty close to this mock-up. I will say there is a limiting thing here where shapes that are made in the mock-up then dictate the shapes that are made in the UI when sometimes we want something else. But just for this example, I think this is gonna work out. We're gonna spin it. It's gonna spin.
Ooh, la la. Landed on one of them and we get the tip.

I love it. It's so good. It's just, again, for anybody who is internet elderly like me, it is just back to the original, like make your workflow diagram, do your wireframes, polish the copy, give your quote-unquote engineer some detailed step-by-step specs. Don't make them think. And then, you know, it used to be, get it in a sprint, wait for somebody to prioritize, like cry a little bit, wait for the code, blah, blah, blah. And now it's like, no, just build it. And it's here. So this is such an awesome flow. And then I wanna recap really, really quickly what we covered. So we covered, you know, markdown plans, the limitations of some of the visualizations in that. You created your own tool, Flowy, which does a combination of workflow diagrams and UI mockups using a JSON schema that then you access through skills that you have developed over time using Claude Code. In your development process, you go into Claude Code, ask it to create a Flowy diagram and UI. You can talk quote-unquote between the UI and Claude Code because it's all just code as the underlying substrate between you two in terms of communication. And then once they are ready to go, you bypass plan, like, you're living dangerously, and you build it and you get something that's really close. And we built this thing in, you know, just a few minutes. This is awesome.

Yeah, no, I mean, I think that flow, I will say a lot of times there's a markdown file involved, but for something like this, I feel like I can trust it at this point. You know, something like Opus 4.5 with this level of detail already has all it needs. This almost like serves as the plan.

Now I have to call you out though, because you say you can trust it. And yet today you posted or recently posted on X that you do occasionally use Codex to check Claude's work. You want to just talk us through that workflow? You don't even have to show it unless you want to.
For sure, I'll kick it off. I will say Codex takes its time. But over here, I have another funny alias, but my Codex setup is under Carl, if I kick off Carl. I often don't have any crazy like skills or prompts here. I almost want it to do a review more broadly and then describe the issues it's seeing. So I'm not running any specific skill or any specific prompt here because I'm more concerned on the, I guess, like things that aren't clear rather than something that's like a logical bug. At this point, I feel like I'm mostly a QA person. And if there's something that's logically wrong, I've definitely found that I'll find it or if I have something in the docs in here, it'll find it. Codex always finds those types of things. But I almost want to look for like the code smells. Like, you know, is there just a cleaner way?

So I usually just prompt it with, take a look at our current git diff and give me a report on the following. And there's kind of four buckets, I would say. One, for the plan or diagram artifacts we have, does the code accurately reflect them? Two, are there any general code smells? And three, if we were to do this again and take a different approach to refactor code around it to overall improve this code base, what approach would be best? I want it to find places where we could have done this better because I find that Claude is very eager sometimes and maybe jams things in there without thinking about the bigger picture. And Codex, I don't think is much better when it's writing code. But when it reviews, it almost always is like, you've implemented this pattern, but it fits nicely if you just rebuild this system a little bit. And that just keeps your code base like away from all the vibe coding sins of having 10 format date functions all over your code.

Yeah, so I love this.
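The review prompt described above can be sketched as a small Python helper. This is only an illustration, not CJ's actual setup: the function name `build_review_prompt` and the constant `REVIEW_BUCKETS` are hypothetical, and the list holds the three questions that are spelled out in the conversation. How the resulting text is handed to a reviewing model is left out on purpose.

```python
# Questions for the reviewing model, taken from the wording in the
# conversation. The names in this file are hypothetical.
REVIEW_BUCKETS = [
    "For the plan or diagram artifacts we have, does the code "
    "accurately reflect them?",
    "Are there any general code smells?",
    "If we were to do this again and take a different approach to "
    "refactor code around it to overall improve this code base, "
    "what approach would be best?",
]


def build_review_prompt(buckets=REVIEW_BUCKETS, extra_critical=False):
    """Assemble a numbered review prompt from a list of questions."""
    lines = [
        "Take a look at our current git diff and give me a report "
        "on the following:"
    ]
    lines += [f"{i}. {bucket}" for i, bucket in enumerate(buckets, start=1)]
    if extra_critical:
        # Optional nudge mentioned later in the episode.
        lines.append("Be extra critical.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_review_prompt())
```

Keeping the questions in a list makes it easy to add or reorder review buckets without rewriting the prompt by hand.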
I was gonna say like twin stars because one of the things that I do when I vibe code too close to the sun, which is I harness the power of Claude Code or whatever, and I just bite off like a feature, like a big, big old thing. And if you've ever done this with AI, either Claude Code or Cursor, whatever, and you sort of have a general idea of a feature, but then you're specifying the requirements as you go, as you see it, you sometimes end up with a monster diff. And what I've done a lot with that is I say, okay, this is basically what I want. Now go write me a plan to re-implement this in a sane way, and then let's completely rebuild it. And so you can do this, like review it and tell me how you'd do it better. You can also say like, this is a reference code base of like kind of what I want to achieve. Let's go actually build a plan to build it in a more extensible, scalable way. And I found that to be a really useful flow as well.

Oh, I like that. It's almost like, you're almost telling it like, hey, this isn't the real thing, hypothetically.

Yeah, it's kind of like code as spec where it's like, now that code is so cheap to generate, you can say, generate a bunch of code. This isn't production. I'm fine throwing it away. Now go build like a clean version of it. So that is a version of this I think is useful. I also agree though that Codex is like kind of a really good curmudgeonly staff engineer that will look at your code and tell you what's wrong with it. So I like the model for this use case as well.

Every now and then I'll throw in like a "be extra critical," and then bringing that prompt back to Opus, it gets a little sad. So I have to manage.

One of the things that I, with the Google models I always used to say is they were like very smart, but clinically depressed. Like they're always so sad, especially when you look at their reasoning. Sometimes I read it and it's like, oh man, it's okay, man. We can be a little bit more beautiful. Like I can't get this to pass.
It's not building. So I want to look at this just for, again, you said Codex can take its time, but it's going through and really checking if the feature aligns with the current code. It's identified some issues, useEffect just haunting us from every corner of our apps. So that's a good one. And looking at some of the animations, which are probably pretty hard, just again, like with our human eyes to parse and visualize and understand.

Great. Okay, Codex. I was actually surprised it took this long. So it's talking about the diagram. It's kind of going through and mentioning a mismatch. It's saying the wheel rotation adds some of the segment angles, but the dots are defined at different angles. This makes the pointer land between the dots rather than on the dot, which I believe is correct. So it noticed kind of essentially this discrepancy that we have a mock-up that has the arrow landing on a dot and over here in the app, the arrow lands between the dots. So kind of little things like that, particularly around checking the discrepancies, I really like when it finds. And then at the bottom, we have this like, if we refactored this again, let's pull some of these things out into components. Let's make some constants, kind of just like some classic, you know, one-shotted vibe codey tips. And oftentimes from here, I'll actually just have Codex write it, medium GPT-5.2 Codex, whatever the full model name is. I've found it's fine at editing files and writing them. Previously, like, you know, when GPT-5 first came out and they were working on Codex, that would have taken like 15 minutes. I'd hop back to Claude, but nowadays I would basically just say, great, please make those improvements. Maybe given more time, I would think up a more thoughtful prompt, make a plan about this, all those things. But here, I'll just kick it off.

Well, I mean, you did spell it correctly, so you did put some quality into this.
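The mismatch Codex flagged comes down to angle arithmetic. The sketch below is a hypothetical reconstruction, not the app's code: it assumes a wheel with 8 equal segments, a spin that stops on the center of a segment, and dots drawn on the segment boundaries. Under those assumptions the pointer always stops half a segment away from the nearest dot, which matches "between the dots rather than on the dot."

```python
SEGMENTS = 8  # hypothetical segment count; the episode does not state one


def segment_center(index, segments=SEGMENTS):
    """Angle in degrees of the middle of a segment (assumed landing spot)."""
    size = 360 / segments
    return (index + 0.5) * size


def nearest_dot_offset(angle, dot_angles):
    """Smallest angular distance in degrees from `angle` to any dot."""
    def distance(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    return min(distance(angle, dot) for dot in dot_angles)


# Dots placed on segment boundaries (assumed cause of the mismatch).
dots_on_edges = [i * 360 / SEGMENTS for i in range(SEGMENTS)]
# Dots placed on segment centers (one possible way to align them).
dots_on_centers = [segment_center(i) for i in range(SEGMENTS)]

landing = segment_center(3)
mismatch = nearest_dot_offset(landing, dots_on_edges)
aligned = nearest_dot_offset(landing, dots_on_centers)

if __name__ == "__main__":
    print(f"landing angle: {landing}")
    print(f"offset with dots on edges: {mismatch}")
    print(f"offset with dots on centers: {aligned}")
```

Deriving both the landing angle and the dot angles from one shared helper is one way to keep the two from drifting apart, in the spirit of the "make some constants" suggestion.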
Yeah, I was about to hit enter.

But okay, so I think this is a really, really great flow, and I would highly recommend it. You know, I think we're all trying to figure out like, where does code review happen? There's also code review agents. There's also your CI/CD pipeline, which you said has a lot of guardrails around it, so nothing hits prod that's really terrible and going to break the app. And I think this is just a great flow, especially I think for software engineers out there working on teams. Like, this is such a great flow to say, hey, designer, you gave me a spec. This is kind of what I'm going to build. Are we good? If so, I'm going to go. And then same with this loop on kind of model-to-model evaluation, which is if you're a more junior engineer, early career, and you're going to do your first couple PRs into a company, it's nice to get that pre-flight check from a smart model to just say, I thought about, oh, we could factor it this way, or I chose not to do this component that way. I think it's really useful. So this is a great, just solid software engineering flow. Love to see it. Okay, we're going to skip to lightning round questions. Thank you for showing us all the stuff that you're doing here. Let's talk about something fun. What are you most excited about right now in AI outside of all this coding stuff?

I'm very deep in the code world, but I really like, Google released Genie 3 access the other day. And you only get like 60 seconds to play around in a world, but it's really fun. And I can totally see, you know, five months from now, six months from now, if we can get a 10-minute version, I think they can go viral. I think a ton of people are going to have fun with them. I think that's like a big next step that isn't quite there, but is super close.

Yeah, I, for those that don't know, Genie is this sort of like generate an explorable world.
It sort of creates a video game style world that you can like walk through and look through for 60 seconds. I don't know if you're, are you showing it? I don't think you're showing it right now.

Oh, let me pop to this tab. I can pull it up too. We can pull it up. I have a claw to primed. Is it Polly? I think this is Polly. I didn't know Polly wears a leather jacket, but.

Okay, so you, I used Nano Banana to like create an image and then that image, you can create a world. It's kind of amazing.

Yeah. Really interesting. I did not expect it to take an image and then make it, but they have this whole flow on Project Genie. If you have the, yeah, I can't juggle all the account names, but one of the high accounts at Google, and it'll actually give you a prompt structure where you're describing the environment and then you describe your character. So I think for this, I just said an animated lobster in the Matrix. I did not specify a leather jacket to be clear. I guess in the Matrix, they're all wearing leather jackets. So yeah, maybe let's make him cooler, make him cooler, make the lobster be in a suit with sunglasses.

Oh, so it's an agent lobster.

Yeah, he can't, he can't be the good guy here. I will say their interface for this is really cool.

Yeah, it looks great. And I was playing with my husband earlier. And so for all the parents listening, one of the things we did, our kids are really into Greek mythology, really into the Odyssey. We're reading the Iliad right now. And my husband said like, create a, you know, a scene from the Trojan War, but no violence, no violence. So we can walk through what the camps look like, but not have like Achilles, you know, on the ground and Hector, you know, all this stuff.

That's really cool. Oh, yeah, this is. Yeah, he's backwards. He's backwards, but that's okay. Yeah, we'll just hop into create the world. Let's hope Genie identifies he's backwards and flips him around.
Because this is, this is like Harry Potter when, what was the character that had the villain on the back?

Oh, yeah. Yep. The guy he is, was it the one with the turban? Oh, man, we're running. I didn't, I didn't know he'd be running. He's running forward, but his sunglasses are on backwards towards his tail. So maybe he's not backwards. Maybe his clothes are backwards. I think he's got two. Oh, he has a, he has a mustache kind of.

This is where your GPUs and your brightest research minds are applying their effort so we can have a two-sided, slightly backwards Matrix India Genie lobster run through.

Yeah, it's definitely got, I will say when they first released this, they released the best batch of examples they had. But that doesn't mean it's not fun.

Okay, coming soon, CJ is going to become a game dev. And this is going to be a 3D game in which you race to stalk me and interrupt a live podcast by joining.

Yeah, the goal, the goal is to join the latest How I AI podcast.

Okay, we're gonna wrap up with my final question for you that I ask every guest. This is a great example. When AI is not listening, it's not doing what you want. It is putting your lobster tail on backwards. What is your prompting technique? Are you a yeller? What do you do?

I used to be a yeller. And I don't know when it was, maybe it was a Gemini thing where, you know, I'd yell and it would get sad. But I started to feel bad about it. So I've almost started thinking about it like it's, you know, a lot of the coding workflows, a junior developer, or whatever task it might be, you know, it's an assistant, something like that. And very often I'm like, good try. You did your best. Here's what you did. And I kind of explain that. And then I'll say, here's what I was going for. And probably particularly with Claude, occasionally, I'm like, my bad on the miscommunication. Like I gave you a bad prompt, this is on me. But here's what we're looking for.
And then I do find that that works pretty well when I'm trying to steer it. But I can't claim there are zero times where I'm like, what the hell, just fix it. And you hop in there. You know what a lobster looks like.

Yeah, right. I've seen so many Nano Banana lobsters on Twitter this week that I know it knows the face is not backwards. Perfect. Well, CJ, this was awesome. I think just super practical, really useful. I think a bunch of people are going to go out there. Can people use your Flowy? Like, is there a way to pull it into their own repo?

So I've been working on that. I think maybe by this weekend, we'll see how sidetracked I get trying to set up an OpenClaw bot.

Don't do it, man. I'm telling you.

Well, now I'm kind of scared. It's going to start taking over my computer. But I'm going to try and get it released this weekend. Basically, a set of skills around it and kind of like a first version that people can use and try. And, you know, I would love any feedback around that. This has been a play toy for me that kind of turned into something useful. So definitely want to make it available to all the AI engineers out there.

Great. Well, we'll link it in the show notes. Well, CJ, thank you for joining. Where can we find you? And then how can we be helpful to you?

Mainly Twitter, I do a combination of tech posts and also just random one-off thoughts. My Twitter handle is se Jay and then Hess. And then I think I have the same setup on LinkedIn. But that's pretty much everything I've got online. Feel free to hop in there, leave comments on my articles, yell at me, whatever.

Perfect. Well, thanks for joining How I AI. This is great.

Awesome. Thanks, Claire.

Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app.
Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.