Code with Claude: The 5 biggest updates explained

Overview

Claire Vo gives a quick field report from Anthropic's "Code with Claude" event and focuses on five releases tied to Claude Code and Claude managed agents. The episode is mostly about new building blocks for agent workflows: scheduled routines, outcome-based agents, multi-agent teams, memory tools, and higher usage limits.

Key Takeaways

The clearest product shift is that Claude is moving from "chat that helps with tasks" toward software that runs work on its own. The new routines feature in Claude Code is a simple example: you can schedule jobs on a cron, fire them from HTTP, or trigger them from GitHub webhooks, then run them either locally or in the cloud. Claire's newsletter example makes the point well: a task she used to start manually every Monday can now run on its own against a project folder.
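
As a rough illustration of the schedule Claire sets up (weekly, Mondays at 6am), the sketch below shows the equivalent standard five-field cron expression and a small helper that works out the next run time. The helper name `next_weekly_run` and the constant are hypothetical and are not part of Claude Code; the episode only shows the schedule being picked in the app.

```python
from datetime import datetime, timedelta

# Standard five-field cron: minute hour day-of-month month day-of-week.
# "0 6 * * 1" means 06:00 every Monday (cron counts Monday as 1).
WEEKLY_NEWSLETTER_CRON = "0 6 * * 1"


def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 6) -> datetime:
    """Return the next weekly run strictly after `now`.

    `weekday` uses Python's convention (Monday = 0), not cron's.
    Hypothetical helper for illustration only.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# Wednesday 1 January 2025 at noon -> the following Monday at 06:00.
print(next_weekly_run(datetime(2025, 1, 1, 12, 0)))  # -> 2025-01-06 06:00:00
```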

The managed agents API adds a stronger definition of "done" through outcomes. Claire compares it to OpenAI's goal feature in Codex. The idea is that you give the agent a rubric, a markdown file supplied through the files API or inline, and a grader checks whether the result meets that standard. She says the system can iterate up to 20 times. That matters because it turns an agent from a one-shot responder into something closer to a looped worker that can revise its own output until it passes.
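
The generate, grade, and revise loop described above can be sketched in plain Python. This is not the managed agents API: `run_until_done`, `Grade`, `Outcome`, and the toy generator and grader are all hypothetical stand-ins, and the only detail taken from the episode is the cap of 20 iterations.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITERATIONS = 20  # iteration cap mentioned in the episode


@dataclass
class Grade:
    passed: bool
    feedback: str  # what the grader says is still missing


@dataclass
class Outcome:
    draft: str
    attempts: int
    passed: bool


def run_until_done(
    task: str,
    generate: Callable[[str, str, str], str],
    grade: Callable[[str], Grade],
    max_iterations: int = MAX_ITERATIONS,
) -> Outcome:
    """Generate, grade, and revise until the rubric passes or the cap is hit."""
    draft, feedback = "", ""
    for attempt in range(1, max_iterations + 1):
        draft = generate(task, draft, feedback)
        result = grade(draft)
        if result.passed:
            return Outcome(draft, attempt, True)
        feedback = result.feedback
    return Outcome(draft, max_iterations, False)


# Toy stand-ins so the loop runs without a model (both hypothetical).
REQUIRED_SECTIONS = ["Problem", "Success metrics"]


def toy_grade(draft: str) -> Grade:
    missing = [s for s in REQUIRED_SECTIONS if f"## {s}" not in draft]
    return Grade(passed=not missing, feedback=", ".join(missing))


def toy_generate(task: str, previous: str, feedback: str) -> str:
    if not previous:
        return f"# {task}\n"
    # Revise by adding the first section the grader flagged as missing.
    first_missing = feedback.split(", ")[0]
    return previous + f"## {first_missing}\nTBD\n"


outcome = run_until_done("Ship-ready PRD", toy_generate, toy_grade)
print(outcome.attempts, outcome.passed)  # -> 3 True
```

Returning a `passed` flag alongside the draft keeps "ran out of attempts" distinguishable from "met the rubric", which matters once the loop is unattended.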

Another release points to where agent products are heading: explicit multi-agent teams. Claire says developers can define an orchestrator plus delegate agents, with up to 25 agents sharing the same container and file system. Each agent can get its own tools. Her example is a PRD workflow with one orchestrator, a strategy agent, a critic agent, and an engineering review agent. The useful part here is not the novelty of "many agents," but the ability to assign roles and tool access in a structured way.
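
One way to picture the team structure is as plain data: an orchestrator, a list of delegates, and a tool set per agent. The sketch below models Claire's PRD example that way. The class names, agent names, and tool names are hypothetical (only GitHub access for the engineering reviewer comes from the episode), and it assumes the limit of 25 counts the orchestrator, which the episode does not specify.

```python
from dataclasses import dataclass, field

MAX_AGENTS = 25  # limit cited in the episode; assumed here to include the orchestrator


@dataclass(frozen=True)
class AgentSpec:
    name: str
    role: str
    tools: tuple[str, ...] = ()  # each agent gets only the tools it needs


@dataclass
class Team:
    orchestrator: AgentSpec
    delegates: list[AgentSpec] = field(default_factory=list)

    def __post_init__(self) -> None:
        names = [self.orchestrator.name] + [d.name for d in self.delegates]
        if len(names) > MAX_AGENTS:
            raise ValueError(f"team has {len(names)} agents, limit is {MAX_AGENTS}")
        if len(set(names)) != len(names):
            raise ValueError("agent names must be unique")

    def tools_for(self, name: str) -> tuple[str, ...]:
        """Look up the tool set assigned to one agent by name."""
        for agent in [self.orchestrator, *self.delegates]:
            if agent.name == name:
                return agent.tools
        raise KeyError(name)


# Hypothetical version of the PRD team from the episode.
prd_team = Team(
    orchestrator=AgentSpec("prd-orchestrator", "define and drive the work", ("file_write",)),
    delegates=[
        AgentSpec("strategy", "reflect the CPO voice"),
        AgentSpec("critic", "poke holes in the PRD"),
        AgentSpec("eng-review", "check technical implementation", ("github",)),
    ],
)
print(prd_team.tools_for("eng-review"))  # -> ('github',)
```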

The "dreams" feature is Anthropic's take on agent memory. Claire strips away the hype and describes memory as markdown files written to the agent's file system so future sessions can use what was learned. What stands out is when the memory gets written. Instead of only saving memory at the end of a session or on a fixed hook, dreams can review a batch of past sessions and decide what should stick. Claire also raises the missing half of the problem: forgetting. Better memory without a way to discard stale or bad information can become baggage.
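
To make the "review a batch of sessions, then write what should stick" idea concrete, here is a minimal sketch. It is not the dreams feature: the function name is hypothetical, and the selection rule (keep a note only if it recurs in at least two sessions) is a simple stand-in for whatever review a model would actually do.

```python
from collections import Counter
from datetime import date
from pathlib import Path


def consolidate_sessions(
    sessions: list[list[str]],
    memory_dir: Path,
    today: date,
    min_sessions: int = 2,
) -> Path:
    """Review notes from many sessions and write the recurring ones to a dated markdown file."""
    seen_in = Counter()
    for notes in sessions:
        # Count each note once per session, ignoring blanks and stray whitespace.
        seen_in.update({note.strip() for note in notes if note.strip()})

    keep = sorted(note for note, count in seen_in.items() if count >= min_sessions)

    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{today.isoformat()}-memory.md"
    lines = [f"# Memory written {today.isoformat()}", ""]
    lines += [f"- {note}" for note in keep]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling `consolidate_sessions(sessions, Path("memory"), date.today())` after a batch of runs would leave one dated file that a later session can read back.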

The most practical announcement for many users may be the least glamorous one: higher usage limits. Claire says Claude Code's five-hour limits are now doubled across Pro, Max, Team, and seat-based Enterprise plans, peak-hour restrictions are going away for Pro and Max, and Opus API rate limits are increasing.

Practical Steps

Audit the repetitive work you still start by hand. If it happens on a schedule, turn it into a Claude Code routine.
Start with one contained use case, such as:
- weekly newsletter drafts from a changelog
- PRD rubric checks on recently edited docs
- GitHub-triggered reviews when a pull request opens
For managed agents, write the rubric before the prompt. Be explicit about what success looks like, what to exclude, and how the output will be judged.
If you're building agent products, split roles instead of stuffing everything into one agent. Give each sub-agent a narrow job and only the tools it needs.
Treat memory as a system you manage, not a magic feature. Decide when past sessions should be reviewed, what gets saved, and what should eventually be dropped.
Revisit usage assumptions. If Claude's limits were the reason you held back on longer-running workflows, test those automations again.
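
For the memory step above, the "what should eventually be dropped" half can start as something very simple. The sketch below assumes memory files carry an ISO date prefix in their filename (Claire notes that memory files often have a date on them) and deletes the ones older than a cutoff. The function name and the 90-day default are hypothetical, and age is only one possible rule for forgetting.

```python
from datetime import date, timedelta
from pathlib import Path


def forget_stale(memory_dir: Path, today: date, max_age_days: int = 90) -> list[str]:
    """Delete dated memory files older than the cutoff and return their names."""
    cutoff = today - timedelta(days=max_age_days)
    removed = []
    for path in sorted(memory_dir.glob("*.md")):
        try:
            # Expect names like 2025-01-06-memory.md; the first 10 characters are the date.
            written = date.fromisoformat(path.name[:10])
        except ValueError:
            continue  # undated files are left alone
        if written < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```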

Notable Quotes

"You define what done looks like for an agent."
"Memory is basically the idea of writing markdown files to the file system your agent uses."
"I think we think a lot about agent memory, but not a lot about agent forgetting."

"Define a rubric, give the agent the task, let it bang its head against that at least 20 times till it gets it right."

Full Transcript

Source: openai 11m runtime

Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, I attended Code with Claude, Anthropic's first developer event, and they announced some things in Claude Code and Claude managed agents I think you want to know about. I'm going to walk you through five things that launched today, how they work, what they are, and what I might build with them. We're going to keep it under 10 minutes, and this is going to be a quick preview of what you'll see in your new Claude Code and Claude API products experience.
Okay, the first thing that shipped at Code with Claude that I think you want to know about: there are some updates to the Claude Code app. And one of the updates that I know we've all been waiting for is routines, the ability to trigger events or actions on a schedule. We love it. You know I love OpenClaw, and what I love about OpenClaw are the crons. And so now Claude Code has that built in right here in the app. All you have to do is click new routine. You can either run it locally or remote. I'm going to run it locally, and I'm going to say weekly newsletter. This is something that I haven't been doing. I'm sorry, if you're a ChatPRD customer, I know I haven't done my newsletter. I'm going to say, look at our changelog and draft a newsletter for us to send weekly. And then I'm going to go in here and just say we have a changelog.md in the docs folder. Review it every Monday and write a customer-facing newsletter based on the best customer-facing features we shipped. Don't talk about behind-the-scenes things like tech debt or security unless they really impress customers. OK, so I'm going to do that. I'm going to run it weekly on Mondays at 6am. And I think that's all I need to do. Oh, I'm going to select my folder where my project is and then I'm going to click create.
And now Claude Code will run my newsletter draft cron every week, and then I can come back in here and grab the HTML. If I were being really fancy, I would hook this up to my newsletter platform. I would hook it up to my Slack and ping us, but again, this is very useful to me. This is something that I used to kick off manually in Claude Code every Monday, and now I can do that here in Claude Code on a schedule.
So how does that work? There are three trigger types. You can trigger them on a cron, which is a schedule, HTTP, or a GitHub webhook. So you can do sort of a normal webhook or GitHub webhook. You can trigger these three ways: scheduled like I just did, off of a GitHub action, or a general webhook. So you can hook it up to other systems to kick off a routine. All the stuff in connectors comes along. So I have Slack connected. I have GitHub connected. So you can use those things as part of your routine, and it can run in the cloud or it can run on your laptop like I showed. And this is an example of a use case where you could say, weekly, I want you to check every PRD modified this week and check if it matches our rubric and post a summary to the team channel. So that's item one.
The second one is in Claude managed agents in the API. If you haven't paid attention, OpenAI released something in Codex called goal. You can do slash goal in beta in Codex and it'll basically bang its head against the problem, do what's called a route loop against the problem until it actually hits the goal. Anthropic released something very similar in the Claude API called outcomes. You define what done looks like for an agent. It can self-grade and iterate until it gets there. There's a couple interesting things you need to know about how outcomes are defined. They all anchor on what's called a rubric.
So there's a markdown file that's uploaded either through the files API or inline, and it's going to tell your agent what success looks like. Then there is a grader, and it can do up to 20 iterations on the task to get to the outcome that you're going for. I want to walk through this one in a very specific example to make this just a little bit more concrete for folks. So imagine that I want you to ship a ship-ready PRD, and I don't know if you can relate to this. Often you go through feedback cycles. You have to check it against priorities. You have to check it against technical capabilities. Now using a Claude managed agent, you could, in theory, write a rubric, which is what does a good ship-ready PRD look like? And then the agent can just take your PRD or your idea and iterate over and over and over again until it's fixed. Of course, you could expose this to your customers in an app like I might do for ChatPRD, but again, this idea of outcome is define a rubric, give the agent the task, let it bang its head against that at least 20 times till it gets it right. I think this is a really interesting model for agentic products and something I suspect many of us will use.
The second thing I really love is a multi-agent framework supported in Claude managed agents. So now you can, through the API, explicitly define a multi-agent team that's going to work against the same container, the same file system, up to 25, which is kind of amazing. You can have an orchestrator and then delegates. And so there's explicit hierarchy and each agent can have its own tool set. I think this is really cool because now you're able to define not just individual agents, but teams of agents programmatically through the API. And so the example I would give for something like ChatPRD is you could have a PRD orchestrator. This is sort of like the master agent that is intended to define and drive the work across the team.
And then you can have three pieces or three sub-agents: a strategy agent that reflects the CPO voice, the critic agent that's sort of like supposed to poke at the holes in the PRD. I like being the critic agent. And then eng review that could maybe have access to something like GitHub to optimize the technical implementation of the PRD. And so you can define this as you see over here in the API. You define an agent in the API. You give it an orchestrator-level set of tools, and then you can define the sub-agents in the API with their own set of tools. And then you can expose that, as you can see here on the right, as three agents all working in parallel against the same problem owned by the coordinator or orchestrator level. Again, I think this is an interesting enhancement on the primitives of agents that people are going to be using quite a bit.
Okay. The next one I really like, it is dreams. So this is all about agent memories. Just to make it simple for folks, memory is basically the idea of writing markdown files to the file system your agent uses. That helps it do a better job the next time. It's not that fancy. Often those files have a date on them, but you don't really have to overthink it. But creating those memories is a little hard. And often a lot of the harnesses right now write memory on a hook. They write them on an event. And so what they do is like, when you close the session, it writes memory, or when something happens, write memory. Or like with OpenClaw, you can explicitly tell it to write memory. But what I like about dreams, which is a very funny brand for an agentic memory product, but we'll allow it, is it's a primitive to call against a list of agent sessions. So let's say you've done 50 things with your agent. It's an explicit call to take those 50 sessions, review them, and then come up with important memories to write to disk. And as I'm saying this, I guess this is what we do when we dream.
We go through our day, we review it silently, and then we decide what to commit to memory. I don't know. I don't know if this is the perfect metaphor, but it's the one we got and it looks great on a branded website. This one's in research preview, so I don't think everybody has access to this through the API. I certainly don't have access to it. So I'm looking forward to touching it. But why I think that this one's important to know is it just gives you a frame of reference for how Anthropic and these labs are thinking about the primitives, again, of agents and agent memory. And you can predict that some framework like this is going to be integrated into agentic platforms or agentic products where, on some action or some regular cadence, you're going to review past sessions and you're going to explicitly write the right things to disk so they can be referred to moving forward. Side note, I think we think a lot about agent memory, but not a lot about agent forgetting. So I'm looking forward to like the purge version of this, which is dreams that tell you what to forget. I don't know if that's like trauma erasure or whatever, but I think there's something interesting here.
Okay. And then number five, the only announcement people really care about, which is usage limits are up. So starting today, Claude Code's five-hour limits are now doubled across Pro, Max, Team, and seat-based Enterprise platforms. Peak hours are going away for Pro and Max plans, and the rate limits for Opus models in the API are going up. So we can all use these products more. Again, what did we see today at Code with Claude? Lots of other stuff. They might put data centers in space. There