Let’s Talk Agentic Development: Spotify x Anthropic Live

Overview

This episode explores how rapidly improving AI coding agents are changing software development at Anthropic and Spotify. The conversation centers on Claude’s step-change in capability, the rise of MCP (Model Context Protocol) as a standard for connecting models to tools, and how organizations are reworking developer workflows, context management, and internal platforms to adapt.

A major theme is that AI is no longer just assisting with small coding tasks; it is increasingly capable of handling long-running, multi-step work with limited supervision. That shift is creating new opportunities—faster prototyping, broader access to software creation, and more experimentation—while also exposing new bottlenecks in CI, code review, governance, and production safety.

Key Takeaways

One of the clearest insights is that model progress feels gradual from the inside until workflows suddenly change. Anthropic’s speakers describe a moment when developers moved from using AI alongside IDEs to running multiple terminal-based agents in parallel, trusting them to complete substantial tasks and reviewing mainly the final output. Spotify confirms the same pattern, calling a Claude release a visible inflection point in internal usage.

On MCP, the discussion highlights that its rapid adoption came from timing as much as design. Models had just become good enough at tool use, developers urgently needed a standardized way to connect systems, and MCP borrowed wisely from proven protocol ideas like the Language Server Protocol. At the same time, the speakers are candid that fast adoption creates real challenges around authentication, identity, enterprise controls, and hype-driven expectations.
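
To make the LSP lineage concrete: like the Language Server Protocol, MCP exchanges JSON-RPC 2.0 messages. The sketch below builds the kind of request a client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters follow the published MCP specification; the helper `make_tool_call` and the tool name `search_tracks` are invented for illustration and do not come from the episode.

```python
import json
from itertools import count

# JSON-RPC request ids only need to be unique per connection.
_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "search_tracks" is a hypothetical tool name used only for illustration.
message = make_tool_call("search_tracks", {"query": "lo-fi", "limit": 5})
print(message)
```

Because the envelope is plain JSON-RPC, a gateway or proxy can inspect, authorize, or log each call without knowing anything about the tool behind it, which is the property the speakers point to for enterprise use.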

Another important takeaway is that “context engineering” is becoming less about elaborate handcrafted instructions and more about good defaults, strong tooling, and reproducible environments. Both companies have found value in CLAUDE.md files, skills, and structured repository context, but they also note that over-constraining models can become counterproductive as capabilities improve. The more durable advantage appears to come from ergonomic tools, validation loops, semantic search, and standardized setups.
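
One way to picture the hierarchical repository context mentioned in the conversation is a lookup that gathers context files from the repository root down to the directory being worked in, most general first. This is only a sketch under stated assumptions: the function `collect_context_files`, the directory layout, and the file contents are hypothetical, and it is not a description of how Claude Code itself loads context.

```python
from pathlib import Path
import tempfile


def collect_context_files(repo_root: Path, working_dir: Path,
                          name: str = "CLAUDE.md") -> list[Path]:
    """Return context files from repo_root down to working_dir, general first."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside repo_root.
    relative = working_dir.relative_to(repo_root)

    found = []
    current = repo_root
    # Check the root itself, then each directory on the way down.
    for part in (None, *relative.parts):
        if part is not None:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    return found


# Hypothetical monorepo layout, built in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    service = root / "services" / "playlists"
    service.mkdir(parents=True)
    (root / "CLAUDE.md").write_text("Repo-wide conventions\n")
    (service / "CLAUDE.md").write_text("Playlist service notes\n")

    files = collect_context_files(root, service)
    layers = [f.relative_to(root.resolve()).as_posix() for f in files]
    print(layers)
```

Keeping the lookup this simple matches the advice in the episode: a small, reproducible setup that every engineer shares, rather than an elaborate per-person pipeline.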

The episode also surfaces a counterintuitive organizational lesson: as code generation becomes cheaper, prototyping becomes the default form of argument. Rather than debating ideas abstractly, teams can quickly build and test them. This has implications beyond engineering—non-developers such as PMs and designers can increasingly trigger useful software work without needing full local development environments.

Practical Steps

For teams adopting AI agents more seriously, a few actions stand out:

Standardize the basics first: create strong default context files, shared skills, and consistent tool access across repositories.
Prefer tool ergonomics over excessive prompt micromanagement. Invest in reliable CLIs, MCP servers, search, testing, and validation loops.
Separate use cases by environment:
- Use CLI tools for local developer workflows where users can handle interruptions or auth prompts.
- Use protocols like MCP for enterprise or remote integrations where centralized auth, guardrails, and gateways matter.
Strengthen automated feedback:
- Add more static analysis, linting, and deterministic checks.
- Expand CI verification so agents get fast, machine-readable signals.
Adjust governance to risk:
- Keep humans tightly involved for critical systems.
- Use feature flags and staged rollouts for lower-risk product work.
Design for non-developers by reducing friction. In some cases, the right output is not a pull request but a runnable prototype, report, or app build.
Watch for new bottlenecks. If AI increases code throughput, be ready to invest in CI capacity, review workflows, and clearer ownership.
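
The "fast, machine-readable signals" step can be sketched as a small runner that executes each check and emits one JSON report an agent or CI job can parse. Everything here is an assumption made for illustration: the function `run_checks`, the report fields, and the two stand-in commands are hypothetical, and a real project would list its own linter, type checker, and test commands instead.

```python
import json
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict:
    """Run each named command and return a JSON-serializable summary."""
    results = []
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        results.append({
            "check": name,
            "passed": proc.returncode == 0,
            "exit_code": proc.returncode,
            # Keep only the tail of the output so the report stays small.
            "output_tail": (proc.stdout + proc.stderr)[-500:],
        })
    return {"passed": all(r["passed"] for r in results), "results": results}


# Stand-in commands; replace with real lint, type-check, and test commands.
checks = {
    "always_ok": [sys.executable, "-c", "print('ok')"],
    "always_fails": [sys.executable, "-c", "import sys; sys.exit(3)"],
}
report = run_checks(checks)
print(json.dumps(report, indent=2))
```

A single top-level `passed` flag gives an agent an unambiguous stop condition, while the per-check entries tell it what to fix next.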

Notable Quotes

“Code wins arguments, but now that code becomes very cheap, it becomes very easy to make that argument.” — Speaker 3

“I feel I’m strapped onto a rocket and just into the ride at this point.” — Speaker 3

“Let Claude cook.” — Speaker 2

Full Transcript

Source: manual 0m runtime

Speaker 1

Welcome. Thank you. I'm gonna let you introduce yourself. Do you want to go first? Sure, I'll go for it.

Speaker 2

Hey everyone. My name is Christian. I lead up our portal engineers on the Applied AI team at Anthropic. Joined a year ago with the team of LSUP10 in London. I'm super excited to be here.

Speaker 3

Yeah, I'm David. I'm the co-creator of MCP. I've been at Anthropic for two years; before that, I spent ten years at Facebook working mostly on internal developer tools.

Speaker 1

Cool. So we're gonna have a bit of a discussion on what we're doing in this space, hopefully a bit back and forth between what we're doing at Anthropic and what we're doing at Spotify. So let's kick things off. Back in November, I think November 24th, you released Opus 4.5. For us that was a real moment, to be honest. If you look at Spotify's use of AI among our developers, you can actually see that date literally on the charts that we have. So tell me, how was that from the inside? What did that look like? Did you have an idea what it was going to look like before it happened? And how was that, I imagine, crazy ride through that release?

Speaker 2

Yeah, this is a fun one. I feel like we're probably gonna have different answers here. I'll give a quick start from my end. The fun part about the Applied AI team, as I just explained, is that we get to both work with some of our customers on what they build, but also a lot of it is dogfooding our internal tools and what we do. I remember around the end of November last year, a big part of what we were doing was supporting Claude Code. And in Claude Code, you know, there's a bunch of tools that Claude has access to. One of the things we realized when we were playing around with early snapshots of what became Opus 4.5 is that some of the tooling basically was a little bit too stubborn for Claude to actually enjoy. So one big thing I realized is that Claude loves bash and was using a bunch of bash commands instead of necessarily using the tools that we have in the Claude Code UI. And this is a sort of simple prompting exercise, right? You can try to get Claude out of that by prompting it differently, giving different scaffolding, et cetera. And this was quite a stubborn exercise, because we went through many iterations with different snapshots. Eventually we were able to get Opus 4.5 to play nicely with the tooling from Claude Code. So my experience was like, ah, this model is a bit stubborn, it's a bit stiff. Is it really that much of an uplift? And then I remember chatting to some of my colleagues on the Applied AI team in the US. They were also doing some bashing, but completely differently. They shared a video with me where it was basically the claude.ai interface with an artifact. And I was looking at this and I was like, this isn't really that impressive. It's an artifact. We had that before.
It was nicely done, of course. And then I remember in the Slack thread, people were commenting that they were so impressed by this. And eventually I got to the point where I understood that Claude had one-shotted the whole claude.ai interface, including the artifacts functionality, from scratch, without any extra prompting or hand-holding. The whole demo itself was generated by Claude, including the artifact functionality. So what I thought was trivial was actually everything from scratch with Opus 4.5. And it was really quite a step change when I realized that, you know, with this little simple part where I was fighting with the edges, you get blind to how good these models get until you take it out of your day-to-day domain and really look at it from a more holistic perspective. And that's when I realized, okay, this thing actually has a lot of punch; it can really run for a long time. Because building this whole claude.ai clone took several hours for Claude to iterate through: the artifact functionality, the whole login, et cetera. And that's when I realized, okay, this is something to really take super seriously.

Speaker 3

I think for me, yeah, it's hard to observe the one specific moment, because you're testing different models all the time. But I think it's more of a vibe. You go into the office one week and people sit in front of VS Code or the Zed editor, and on the side they have Claude Code. And then suddenly something shifts because the model comes out. You have access to this model. We at the time got access to the fast version of it as well. And two weeks later you find people sitting in front of a terminal, seven panes open, and just Claude Code running. And what really happened, if you really think about it, is that before that you needed to hand-hold the model and interject at the right moment to steer it back on track when it came to longer, more complicated tasks. But it was clear that there was a step change: suddenly you can run six, seven, eight, ten Claudes in parallel on different worktrees and trust the output for the most part, and really go back to only reviewing the final result and not any of the intermediate steps. And I think that has really been the shift for me: going into the office one week, seeing people in front of an IDE, coming back three weeks later and seeing everyone in terminals only.

Speaker 1

I have to tell a story on that exact theme. So in, I think, September I was in your office in San Francisco and met with Boris, tech lead for Claude Code. I know that's his actual title. And in that meeting he said, "By the end of the year, I don't think anyone on our team is gonna use an IDE anymore." And I didn't say this out loud, but in my head I was thinking, you're just insane. That just seemed completely... I could imagine that on maybe a two-year time frame. And then exactly that: in December I found myself thinking, I haven't used an IDE now in weeks. So a very distinct moment in time, in a very surprising way. All right. David, as you mentioned, you've been very involved in MCP.

Speaker 4

Yeah.

Speaker 1

MCP being this protocol for connecting AIs to various tools. Is that a good one-line summary? Yeah, I think so, yeah. And that has gotten a ridiculous amount of adoption in a very short period of time. What were the things that you got right with MCP that made it get that adoption? And what are the things that keep you up at night today about MCP?

Speaker 3

I think when you make a protocol, a lot of things need to fall into place to make it really widely adopted. One aspect is, of course, that it comes from one of the big labs, right? I think that's important. But I think it also came at the right moment in time, when people were starting to understand the power of models. At the time, Sonnet 3.5 was a really, really popular model that got increasingly good at using tools, which we now know is really the fundamental basis of what makes an agent. But at the time you'd look at these applications and you had no real way to connect anything to them. And so there's really the realization that, on one hand, you are at the right spot in the industry to give people the tools to build for themselves. I think that's the really critical moment. And then you go and look for similar inspirations, and you steal from the best. In this case we stole a lot from LSP, the Language Server Protocol, and you build with it. And I think these two things combined, being in the right spot, doing the right thing for developers, and trusting the ecosystem to build for themselves, were essential. And then of course there's the ability to execute as quickly as possible. Model accelerated, of course. I think those are the things that really make it work. There are challenges when you have that rate of adoption. The big internet protocols had a lot of time to think about enterprise adoption, to define authentication, to define identity. They sometimes had decades. We don't have that luxury, and I think it's difficult to catch up while doing it right in the first place.
The second part is that the larger AI industry, I think not necessarily the labs, but the discourse on Twitter and elsewhere, is very hype-cycle based. And it's very difficult to stay true to the technicalities. We've had this discussion go from "MCP is the savior of everything" to "MCP is dead," all within six months. And so it's an interesting journey in the blue work. But the truth is of course somewhere in the middle: it's a very useful tool in the right moment, and of course in another moment there are better tools. That has always been the case. And I think everybody who's been in the industry forever knows there's always nuance, and I think that's the tricky part at the moment, to navigate that.

Speaker 1

So I find myself, as a developer, having a mix of MCP tools, command line tools, and whatnot. How should I think about that? Like, where is the right place for the right type of

Speaker 3

Yeah, I think at the end of the day it boils down to this: if you're a developer with a local coding agent, a CLI tool, a command line tool, is a really good thing for you to use, because agents are really good at using these types of things. You're a developer, you can interact with it, you understand how it's being used, and if something goes the wrong way, you can go through a login prompt or whatever it might be, right? If your GitHub CLI is not logged in, it will ask you for it and you will just do the thing. So in that regard, I think using CLIs is probably actually the best choice for the average Claude Code user. But if you think about remote integrations for customers, particularly big enterprises where you want to give knowledge workers access to your systems of record, you need centralized authentication, you need proxies in between, gateways, some sort of guardrails. A protocol is just way better suited in that regard.

Speaker 1

Right. Another area where we're seeing use of MCP is within our partner integrations. So where we work with someone else, the other company might have the model, or the other way around. That also seems like a very useful way to... Absolutely.

Speaker 3

I think in general, protocols like that are a good way to scale having two different things develop independently and separately from each other, and move without having to coordinate closely.

Speaker 1

Cool. Another factor that we've seen being very important at Spotify is the context that we give Claude when we try to have Claude solve a problem. So we have spent a fair amount of time getting our CLAUDE.md files right or getting our skills right. We're trying to figure out how to get higher-level context, like our strategies, into Claude's world somehow. How do you do this at Anthropic? How should we think about this problem?

Speaker 2

This is a really good question. I think, A, it's one that's a bit of an art versus a science, in the sense that there are different tools and mechanisms in place. If you take local, for example, there's everything from hierarchical CLAUDE.md files; you can have a whole monorepo that can be quite structured in a way where people can traverse through it and then inject and eject context as need be. With that said, one thing I'm noticing the more I play with Claude Code, or any agentic tooling, is that the more I try to confine Claude with "this is how you're supposed to do XYZ, this is where you find things," the more that gets, in many ways, made redundant quite quickly as these models leapfrog each other. So what I'm defaulting back to is very good tool ergonomics. It can be everything from MCP servers to neat CLIs, and really just trying to give Claude both test-time compute sessions that can run for hours on end, and also validation loops to see that it can do things in accordance with what I want. So it can be certain testing standards that I want Claude to meet, or even just, overarchingly, if it should do a big change that might affect different repos, spawning a whole fleet of subagents that can do things in parallel. And in many ways I can tell that the more I try to confine it by adding what could be context that I myself think is correct, the more it becomes a bar that the models just keep leaping across, where I get myself out of being the bottleneck in some sense and really remove myself from the equation. And it becomes these sort of defaults.
So, like, you have good knowledge-base tooling where people can use either keyword or semantic search to pull information. That's gonna be fundamentally a lot more powerful than me trying to hand-hold Claude in doing things for me. A bit of a diplomatic answer, but I would say trying to give good tool ergonomics, taking a step back, and letting Claude cook in many ways is an easy way of trying to get more juice out of code.

Speaker 3

Yeah, that's an interesting one, because I think if you look back a year and a half ago, there was a lot of work put into the context engineering of the harness. People built complicated RAG pipelines and these types of things. And then I think with Claude Code, Boris had a very good realization: he extrapolated the model capabilities and just realized, okay, if I have a normal-sized repository and I just give it grep, it actually, over a few iterations and maybe with the use of subagents, gets to a point where it can reason quite well about things. Of course, later we can talk about scaling limits at large corporations, right? There are limits where you want additional systems for it, and you require them. But when it comes to context management and context engineering, I think it's about having a good, actually fairly simplistic setup that is reproducible across engineers, with a good set of CLAUDE.md setups and a good set of skills that really capture the essence of the role you're trying to do or the domain you're trying to operate in. I think that's really it, and not overthinking it. I think that's actually, eventually, the most important part, because subagents are really, really powerful at compressing and providing the model the right things. And in addition, we're seeing of course that context windows are getting increasingly bigger. But I'm actually quite curious about your perspective within Spotify, at the scale that you're operating on. Are there challenges around context management? And if so, how does your internal development platform help you in that regard?

Speaker 1

Yeah, so the way that I think about it, we have roughly three layers of context; that's my mental model for it. One is what we've been talking about now, which is the stuff that goes into CLAUDE.md and into skills and whatnot. And like I said, we've been iterating like crazy on that, and I feel that we're maybe in a good spot. Just like you said, my learning is a bit the same, in that the models get better and better and the tools get better and better. So the importance of the quality of our context there has reduced a bit over time. Then we as a company have a high-level context, which is: here are the priorities we have as a company, and they come with a set of strategies and whatnot. Very high level, like these are the business goals we want to achieve, and those types of things. And then there's this middle layer that we don't have in a good way today, where we want to connect those two things. So we want to be able to have work flow from our strategies into building that out into PRDs or whatever they would be, and then have that feed into what we then build with Claude. We're trying to connect those three layers, and we still have some gaps in there that we're trying to cover. So that's my mental model of it, at least. And in that lower layer, I feel that we're doing fairly well. We have a few big monorepos where we have done a lot of iterations on our context, and then we have ways to distribute context for people that work in our polyrepos.

Speaker 3

Do you look for a more standardized approach across the company? Or how do you manage the need for flexibility in one part versus a more standardized approach across everyone?

Speaker 1

Yeah, we have definitely made a bet on a fairly high degree of standardization. This was true also prior to AI; just for human developers, we made the same bet. So we started investing, five years ago now I think, in what we called golden technologies. That was our code word for standards, because that word was a bit scary to us at the time. And together with that we implemented what we call fleet management, which was our way to automate changes across our fleet of software. We have tens of thousands of components that make up our production software, so we started building out a way to manage that through automation. Which is a good segue. One thing that we then ran into when trying to do these automated code changes across our fleet was that it's very, very hard to do that in a classical deterministic way. So as models came around, we started pretty immediately experimenting with doing that with LLMs. And many, many, many iterations later, out of that came our internal tool that we call Honk, which we've been working super closely together on. So I would love to hear your perspective on that. I mean, Christian, you've been working with the team on Honk quite closely. What does that look like from an Anthropic point of view? I can tell the story from our side as well.

Speaker 2

Totally. I mean, I'd love to. Obviously, having been involved in this for quite a while, it's actually kind of fun to see it all come out into the public light. The whole blog post series is also a great one. I recommend a lot of the customers I chat to read up on it and see a real technical, tactical implementation in the wild. I will say from our end, what's been really cool is to see the whole journey. I joined a year ago; a lot of the early conversations had already kicked off in June and continued growing. I think it's been quite remarkable to see how, as you mentioned, there was already this early precursor, a non-LLM-centric way of doing things, which I thought was a really nice baseline to understand how, in a similar vein as these LLMs are growing on this exponential, you can actually get more and more done with Claude or any LLM at the time. And what was impressive was to see the ambition of what the actual implementation was gonna be. In many ways it was centered on a coding assistant to begin with: spin up a PR through a Slack chat and continue iterating, either in that chat or by picking it up somewhere else. And across the year you could see more of that showing its utility internally, at least from what I could see, but also how the ambition grew quite quickly. I think it went from a quite confined coding assistant to the ambition of being a lot more, of being empowering to any professional at Spotify, not only devs but across the board. And that was quite impressive to see, even internally at the time. I mean, if you look at Claude Code itself, it was very much a coding-specific tool. Now that became the backbone of Cowork, et cetera.
And I think we're in this exponential phase where products are already coming at a rapid pace. But I could see that you guys were making that jump quite early on, where it was going beyond just a PR to maybe something that could help with data analysis, or go through some logs, some SRE thing. It became a lot more than just your dev-friendly tool. Maybe to flip it back to you: if you extrapolate another six months from now with Honk, what are the kinds of tasks you expect people to use Honk for? Is it gonna be as general purpose? Is it gonna be more niche? I'm really curious to pick your brain on that.

Speaker 1

Yeah, it's a good question. Let me tell a little bit more of the backstory to get into that. So as I mentioned, Honk started as a way for us to, the way we imagined it, automate code changes across our fleet. So migrate from this version of an API to that version of an API, or whatever it could be. And again, we had done a lot of that with deterministic scripts, like codemods, before that, but you very quickly run into a glass ceiling of how complex the changes you can make that way can be. Honk was the way that we imagined breaking through that glass ceiling. But very quickly we realized there were many other ways that we could use it, just like you said. So we've started to increase the use cases for it. Like you mentioned, you can ping Honk on Slack. So the very typical user interaction these days is some people discussing some problem they want to solve on Slack and then just @-mentioning Honk: "go solve this." So that's just a very convenient way to kick off an agent to do some work for you. And like you said, it also enables people that aren't developers, or haven't set up all the infrastructure that you need to get stuff to build locally; you can just kick off Honk to do this. But Honk is still somewhat limited for us internally at the moment. It is mostly one-shot, so you can't go back and forth with Honk very much. You can do a few cycles, but once it creates the change, you're sort of stuck with it. And it largely expects that the output will be a PR. Right. And just like you said, there are many other use cases where you might want to use something like Honk. There might not be a PR at all. You might be investigating an incident or whatever it is. So we're right now working on, I don't know what it is, the sixth generation of Honk or something like that, where we're gonna remove a lot of these constraints. So it's gonna be fully multi-turn.
You can make multiple PRs or no PR, or whatever the outcome is. So that's where we are trying to take it. We also, I think, announced last week that we're making Honk available externally as part of our Backstage offering. So it's going to be a way to do the same thing that we started with, automating code changes across our customers' fleets, in the Backstage space. So there's a lot of interesting work going on in the Honk space within Spotify at the moment.

Speaker 3

I'm curious, actually. When you reduce the friction to make changes like that, there's a downstream effect for developers, for your development systems, right? Yep. What have you seen? What are the challenges that then pop up? You're removing a bottleneck on one hand and you're getting something else in return, right? I'm curious about what changed for you.

Speaker 1

Yeah, many things. One thing that's been interesting with having non-developers poke around with doing development is just that their expectations are very different from developers'. As developers we've been trained for years and years and years that, yeah, you have to go through setting up all the crap that you need to do development, and in the end you're gonna get this weird thing that is called a PR. None of that makes sense if you're not a developer. So now we're trying to adapt to the non-developer user in this as well. To take an example: we haven't actually rolled this out internally yet, but it's gonna be rolled out soon. It's a way to do prototyping much more easily within our mobile apps, for example, where the output is not a PR; you actually get a full app that you can install on your device and run with your prototype code in it. So that's an example where you don't have to build that app, you don't have to deal with the PR, you just get the app to run.

Speaker 3

So that's interesting, because now you don't have, like, a thousand developers; you have, like, five times that, because every PM, every designer (that's exactly right) can now go and do it.

Speaker 1

So that's exactly what we hope to achieve. Yeah. And even as a developer, it's not necessarily easy. If, let's say, you're a backend developer at Spotify and you want to make a change in our iOS app, it's not very easy to get that set up. So even within the developer cohort, it just enables people to do things they weren't able to do before.

Speaker 2

And mix. Super, I'm guessing you're gonna see a shift from people writing PRDs to just having end-to-end demos where they can just say, like, "Hey, this is the thing.

Speaker 1

What do you think of it?" instead of saying, "This conceptually is something we should implement," and rather just jump to the... And I think you've been talking about that as well, that you're very prototype-driven internally, right? Is that right?

Speaker 3

Basically, I think when people wonder how Anthropic does development, a lot of it boils down to prototyping. "Code wins arguments," I think, is what people always say, but now that code becomes very cheap, it becomes very easy to make that argument. And so there has always been a culture of prototyping, but the interesting aspect is that as the company is growing, it has retained that. And I think that's because it's easy and simple to do, because of things like Claude Code.

Speaker 2

That's cool. Just to add to that as well, I think one thing that's fascinating for me, obviously being internal, is that a lot of stuff is almost like an internal marketplace, in the sense of: do we find PMF internally? It's like, do we have daily active users among Anthropic folk? And, you know, what's the retention? What's the stickiness? I've never really seen that before. Here you can iterate so fast that your own people become that sort of gateway to, okay, this is something we should actually ship externally, because we're seeing everyone using Claude Code daily, or MCP, or anything that really blew up and started with people having this insatiable desire internally. Which is very cool: if you have this internal signal that you can actually tap into, I think that's a good signal to say this will probably also work externally, versus a big hypothesis-testing phase of "this is something we think we should deploy, then wait six months." Rather, you can see the internal stickiness of any product that we build.

Speaker 3

Yeah, but I think the other side is that people only see the surviving products. There have probably been 10x, maybe 100x, as many products that were scrapped. They were good, but they just never found that product-market fit internally. I think the best example of how this all works is the recent release of Cowork within Anthropic. It is something where I think people latently knew there was something there. Clearly Claude Code has been very useful. Clearly people in marketing are starting to use Claude Code. And it's frankly not the best interface for a non-technical person. So then you just needed to build it, right? There was no PRD. I'm sure there was no PRD about what we should do. It was just: do it. And it took off internally, of course, and then similarly externally the moment it was released.

Speaker 1

Can you tell me more about that? You touched a little bit on prototyping and getting to product-market fit, but how have you set up Anthropic as an organization to drive that type of ideation and innovation, the kind that creates 10x the number of prototypes? What are the fundamentals that enable that?

Speaker 3

I think there are technical aspects to that and there are cultural aspects to it. The cultural aspect is that if you build a company around AI, everyone will of course be a user right away and make it central to their workflow, and that allows you to experiment and play with things very quickly. That's one aspect. The second aspect is making sure that the culture of how we get products out is through prototyping, which tells new people that they need to build prototypes, which is inherently the cycle that keeps continuing. The other part is that on the engineering side, it's just very easy, with something like Claude Code at your hands, to make changes very quickly. And from the very beginning you're taught to make changes quite quickly. Then of course we're trying to keep up with some of the downstream effects of this. CI and other aspects, as you can imagine, become more challenging, and that's just something where you need to invest, because you understand that as the bottleneck moves from the initial engineering time down to builds and CI, those become the constraint. But overall it's a very fluid thing, which I think is quite surprising. We do of course have fairly broad standardization across the organization, simply by virtue of using Claude Code and, for the most part, monorepos.

Speaker 1

Can you talk more about that standardization? How do you do that practically? Do you agree on things, or do you implement those standards in your code?

Speaker 3

A lot of it is a combination of good defaults. If you go into the monorepo and start Claude Code, the right CLAUDE.md files are there and the right skills are there from the beginning, and those are curated and carefully considered. Similarly, if you start something like Cowork, you are connected to the internal systems of record that are useful for the average Anthropic person, like knowledge bases and so on. I think that's really one of the key aspects. But as the company grows, which it has quite a bit in the last two years, we're seeing an increased need for an internal developer platform that unifies the skills available to people and the MCP servers available to people. We have put a lot of effort into that kind of work: how easy it is to stand up an MCP server internally, where the registries are, and the things that come with that, which, as you already know, you have to learn as an organization: the guardrails, the safety aspects, the things you will eventually require. Those are the things we're building.

Speaker 1

Was that something that you built from scratch as a company, or has that evolved as you've been scaling up?

Speaker 3

It's evolving, right? And that goes back to this being a young company that has leaned into AI from the very beginning. Then of course you're building for yourself, and people are very... because it's so easy for us to do now, right?

Speaker 1

I wanted to connect back to something you said earlier about new bottlenecks. This is something that we're seeing at Spotify as well. Now that we're able to produce much more code, we're seeing more pressure downstream. We've been fighting an overloaded CI system over the last few weeks, and we're increasingly seeing pressure on code reviews, because even though we as developers now have a bunch of agents working for us, we still have humans in the loop to do the code review part. So what are your practices to begin with, and how do you deal with that type of new bottleneck, if that's the case for you?

Speaker 2

Well, we do have Claude code review, just saying. But honestly, this is a pretty hard one in general. As David mentioned earlier, we have this downstream effect where creating PRs is now easier than ever before, and we're seeing that volume come through the funnel. What do we do with it downstream? In all honesty, I think Claude code review is actually great. It helps find known vulnerabilities, and it can do a lot of thorough testing. I don't know if the real conclusion is just to have more sub-agents that run verification loops in your CI. I think that's sort of a band-aid. It does do the trick a lot of the time, but you still need someone who's accountable, and I think that accountability is still key. One thing I want to add to what David mentioned earlier: a lot of the people who drive products at Anthropic are very clear, high-agency DRIs. I think that's a really powerful culture to have, because that way it doesn't really matter who generated what, or whether an agent or a human was behind it. It's very much outcome-based, and you want someone who's accountable for the outcome. If something goes down, someone needs to be responsible for bringing it back to life, or for fixing the incident. To that point, you still want someone to give a stamp of approval and say, this is something I'm accountable for now, and I approve. We have a bit of a culture where if you write something or approve it, you're also the person people can come back to in the future. I've done that myself: I've spun up a PR, asked Claude to find someone who's relevant for this part of the code, and then gotten them to help me through it. But getting back to your question, I think at its core it's a bit of an unsolved problem.
We have these tools that try to mitigate it, like Claude code review, or strong CI systems that can lean into verifier loops that run for hours on your behalf, and better testing. But it is one where you still need someone to really be an owner of it. And a key thing I'm seeing in many issues is that you can have a lot of high-agency people building stuff, but when you want to piece it together, their work can conflict. I think that's an organizational problem as well, with humans being in the mix. So a lot of emphasis on strong CI definitely helps, but it doesn't solve the whole thing.

Speaker 3

Yeah, in my mind, beyond what Kristen said, there's an additional concept here that I think people have in their minds but don't really speak about, which is an implicit risk assessment of the type of change you are making. If you are working on Anthropic's inference stack, you will probably have people in the loop who review the change, because it's on the critical path. You could probably say the same at Spotify for how you serve the streams and so on. That is critical infrastructure. There you need people to take care. It needs to go through the right layers of assurance: in CI, maybe in shadow deployments, these types of things. This is slightly different in areas where things can be feature-flagged away. Typical front-end product areas are good examples of this. So one thing that allows Claude Code and Cowork and some of the other product surfaces to move at what is, in my mind, an unprecedented pace is this ability to put things out there and then have the company test them, which we call antfooding, or people call dogfooding, or whatever it might be. You get the latest and greatest, and sometimes the most broken. You have everyone test it and see what sticks, what works and what doesn't. Then you slowly roll it out to users. And even there you have the ability to flag it to 10%, 50%, 100%, because at some point some of these systems and interactions become so complicated that they're very difficult to fully test anyway. So it's just a matter of what your risk assessment is, and there's a different risk assessment between deploying something like Claude Code and, I don't know, writing a banking transaction system. These are fundamentally different problems. And I'm curious how you deal with this.
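
The staged rollout described here (internal users first, then 10%, 50%, 100% of users) can be illustrated with a minimal sketch. This is not how Anthropic or Spotify implement feature flags; the function names `rollout_bucket` and `is_enabled`, the flag name, and the hash-based bucketing are all assumptions made for illustration.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing keeps the assignment deterministic, so a user who sees the
    feature at 10% still sees it when the rollout widens to 50%.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               is_internal: bool = False) -> bool:
    """Decide whether a hypothetical feature flag is on for this user.

    Internal users always get the feature (the dogfooding stage);
    everyone else is admitted once their bucket falls under the
    current rollout percentage.
    """
    if is_internal:
        return True
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    for percent in (10, 50, 100):
        enabled = sum(is_enabled("new-ui", f"user-{i}", percent) for i in range(1000))
        print(f"{percent}% rollout -> {enabled} of 1000 hypothetical users enabled")
```

Because buckets are derived from a hash rather than chosen at random per request, widening the percentage only ever adds users, which keeps the experience stable for people already in the rollout.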

Speaker 1

Yeah, I would say it's very similar to what you said. We have a notion of what we call reliability tiers, which essentially categorize systems by how critical they are. And we think about it in much the same way you described: we believe that we will apply more human judgment to systems that are more critical for our use cases. And then, just like you said, we're also heavy users of feature flagging. Everything we roll out to our users is behind a feature flag, and as part of doing A/B testing and validating those features, we get some of that QA as well.
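
As a rough illustration of tying review requirements to reliability tiers, here is a minimal sketch. The tier names, the approval counts, and the `missing_checks` helper are hypothetical; the conversation only establishes that more critical systems get more human judgment and that changes ship behind feature flags.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical reliability tiers; lower number means more critical."""
    CRITICAL = 1
    STANDARD = 2
    LOW = 3


@dataclass(frozen=True)
class ReviewPolicy:
    human_approvals: int   # accountable human reviewers required
    shadow_deploy: bool    # must pass a shadow deployment first


# Illustrative values only: more critical tiers require more assurance.
POLICIES = {
    Tier.CRITICAL: ReviewPolicy(human_approvals=2, shadow_deploy=True),
    Tier.STANDARD: ReviewPolicy(human_approvals=1, shadow_deploy=True),
    Tier.LOW: ReviewPolicy(human_approvals=1, shadow_deploy=False),
}


def missing_checks(tier: Tier, approvals: int, shadow_passed: bool,
                   behind_flag: bool) -> list[str]:
    """Return the unmet requirements for merging a change, if any."""
    policy = POLICIES[tier]
    problems = []
    if approvals < policy.human_approvals:
        problems.append(
            f"needs {policy.human_approvals} human approvals, has {approvals}"
        )
    if policy.shadow_deploy and not shadow_passed:
        problems.append("shadow deployment has not passed")
    if not behind_flag:
        # Assumed here: every user-facing change ships behind a flag.
        problems.append("change is not behind a feature flag")
    return problems


if __name__ == "__main__":
    print(missing_checks(Tier.CRITICAL, approvals=1, shadow_passed=False, behind_flag=True))
    print(missing_checks(Tier.LOW, approvals=1, shadow_passed=False, behind_flag=True))
```

Returning the list of unmet requirements, rather than a bare yes or no, gives a human or an agent something concrete to act on.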

Speaker 3

Yeah, and this is just my personal opinion, but I do think that some of the more interesting testing approaches, like static analysis and exploratory testing, are getting increasingly important in this type of world, because the verification aspect is so critical for agents.

Speaker 1

Yeah. And that's part of what we've been doing on the standardization side that we were talking about before. One bet we're making is that we're putting much, much more static analysis and lint checks into our code compared to what we had before, because we're finding that's a very efficient feedback loop for our agents as well. They understand those guardrails really well, much better than the loose guidelines that we've historically given to humans.
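
To make the idea of lint checks as an agent feedback loop concrete, here is a minimal sketch using Python's standard `ast` module. The two rules, the `Diagnostic` type, and the `feedback_for_agent` helper are invented for illustration and are not Spotify's actual checks; the point is only that a precise, machine-readable diagnostic is easier for an agent to act on than a prose guideline.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    line: int
    rule: str
    message: str


def lint(source: str) -> list[Diagnostic]:
    """Run two illustrative static checks over Python source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Rule 1: a bare `except:` hides which failure is being handled.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append(Diagnostic(node.lineno, "no-bare-except",
                                    "catch a specific exception type"))
        # Rule 2: every function should carry a docstring.
        elif (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
              and ast.get_docstring(node) is None):
            found.append(Diagnostic(node.lineno, "missing-docstring",
                                    f"function '{node.name}' needs a docstring"))
    return sorted(found, key=lambda d: d.line)


def feedback_for_agent(source: str) -> str:
    """Format diagnostics as text that can be returned to a coding agent."""
    lines = [f"line {d.line}: [{d.rule}] {d.message}" for d in lint(source)]
    return "\n".join(lines) or "OK"


if __name__ == "__main__":
    sample = "def f():\n    try:\n        pass\n    except:\n        pass\n"
    print(feedback_for_agent(sample))
```

In a loop, an agent would edit the code, receive this output, and repeat until the result is `OK`, with a human still reviewing the final change.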

Speaker 3

Okay.

Speaker 1

Cool. So let's spend a few more minutes thinking about where this is going. So far we've been talking about where we are today and how we've gotten here. As we've discussed, we're seeing agents do more and more work for us. One thing that we're seeing at Spotify is that this opens up time for our developers to spend on new features. We're a company that has an endless number of ideas that we want to try out, so that's great for us. It creates more capability for us. What does this look like at Anthropic? How do you think about that side of things? How do you see capabilities or capacity shifting around as your developers use more and more of Claude?

Speaker 3

I think this is the interesting aspect. If you look at model capabilities, the length of a task a model can do is constantly increasing, at a rapid pace. So I think it's clear that we will be able to do more and more with the same number of people, and ship more products and more improvements. Of course it also means that research will get accelerated. That's a classical domain where exploration in the undefined or unknown space is important to feed back what you actually need to build, and the more you can scale exploration with an agent, the easier it becomes to see the paths you need to take. So I think we will just see rapid acceleration in that space. That acceleration has been clear for developers for a while now, but I think this year is the first time you really see it with general knowledge workers in a more generalized way. And I think a lot of the lessons around standardization, having an internal developer portal, these types of things, will suddenly apply to the more general knowledge-worker approach as well. I'm curious what you think that means for you, for Backstage and other things. So I think this is going to accelerate. It then opens new questions about what the role of a developer will become and how you're going to manage your fleet of agents. And I think these are limits that we are already seeing and that people are feeling on a day-to-day basis, because it turns out that if you have 20 Claude Code panels open, all giving you notifications, there's only so much attention you can pay to every one of them. So people are of course already asking what the next step is, what the next thing is you can do.
The interesting part is that this will open up a lot of unknown unknowns, and I just don't know what's going to happen yet. But I do know that having these tools available to us will make it much easier to experiment with the right solutions for getting around them. So, I don't know. I feel like I'm strapped onto a rocket and I'm just in for the ride at this point.

Speaker 2

So true. So true. I mean, I'm seeing it a little bit in the applied AI sense. One thing that's both something I really enjoy and also a bit of a pain is that as these models keep leapfrogging each other, what's on the frontier keeps getting erased. A good example of this is skills, at least towards the end of last year. This was an amazing way of giving just-in-time instructions for Claude to do a specific thing, say, build a docx file or a PowerPoint presentation. You can inject those kinds of instructions. What we're seeing now is that with each new generation we release, it's just getting bundled into the model. So there are elements of the bitter lesson from Rich Sutton: you build these things and you think, oh, I spent so much time on this beautiful orchestration, and then, oops, two months down the line it's actually just an API call away and Claude can do everything. In Claude Code, with agent teams, Claude Code can create the sub-agents it needs on demand. So the whole orchestration you did yourself in your workflow back in the day is in many ways not really needed anymore. And I think you have to be willing to kill your darlings, especially when you build on a technology that's always changing, and always try to take a step back and look holistically: what is the problem you're trying to solve? Is there actually value in solving that problem? In many ways you can get distracted by a shiny new thing, and so do we, with some of the tooling and the features we build. But then you have to look at it pragmatically: does this solve the problem at hand? Now, if you try to extrapolate where things are going, I think the big thing here is the capabilities of what the models can do.
I think that's going to be super interesting, having seen agentic coding really transform 2025. I think browser use is also there-ish. The models aren't fast enough yet, but you can definitely see the future. And the big question is, what's next? By being on the frontier with Applied AI, you do see those edges, like, this is something that can really be the next Claude Code or the next Cowork or the next big product. It's exciting times, but also frustrating when you have to keep letting go of the stuff you've built.

Speaker 3

I do think that the last year has focused a lot on the creation of code. And we see a lot of developers at Anthropic who are not even in front of their computers anymore. They code from their mobile phones. With all that creation, I think there's an aspect that so far has been slightly underserved, and which I think we will see more of this year: the software development lifecycle. Software that's been written needs to be maintained. Software that's been written needs to be deleted at some point. I think this year we will see the models take on a larger portion of the full lifecycle, the maintenance parts, the deleting parts, and those types of things. Otherwise the amount of code written by models is just not sustainable. But I'm quite curious, from your perspective, what has changed for your organization, and also for something like an internal developer platform? What is the shift you need to make to get ready for this?

Speaker 1

Yeah, so we've talked a little bit about how our development flow has changed, and it's changed fundamentally over the last few months. Just like you said, developers now sit in front of a bunch of Claudes doing most of their work. We've been doing all of the work that I mentioned before on improving our context, and now we're trying to connect that into our full product development lifecycle. So there's lots and lots of exciting stuff going on there. In terms of Backstage, I think one interesting thing is that it's now becoming this thing that agents talk to, instead of humans necessarily spending time within Backstage. We talked about MCP before. I have a bunch of MCP tools connected to Backstage that do a bunch of the things I used to do manually. Say I need to debug a deployment that's gone wrong. That's something I used to do in Backstage. I would go in, find the backend service in there, look at the deployments for it, and whatnot. And now all of that happens through Claude. So yes, it's been a very big change in how we work, and also in how we work with our tools and infrastructure, I would say. The main tool becomes your agent rather than the other tools that we had before.

Speaker 3

But do you think there is a difference between humans interacting with the system and agents doing so? Are there things you need to be more worried about, things that are suddenly more important that were not as important before?

Speaker 1

There are certain notions of having agents poke around in your production environment that still make me not sleep super well at night. So we have lots of things to solve there. Currently we're fairly restrictive about what we do in that space. But of course we want the same leverage with those types of tasks that we get with coding today. So we want to figure out that balance and allow agents to play many more roles and do more tasks. I want to pull on something you said before. You mentioned other knowledge workers, non-developers. With developers we sort of have it easy. We have our monorepos, where everyone can improve the CLAUDE.md files, and we have all of these neat tools that we use. With knowledge workers, we don't really have any of that. So what patterns are you seeing appear now? How do people share their context in Cowork, and how are people doing this in practice?

Speaker 3

So the pattern we're observing, at least within Anthropic, is that a lot of the domain knowledge, the ability to use the tool effectively in a more standardized way, comes by virtue of people writing skills and plugins and connecting to certain MCP servers. Probably by far the most popular MCP server at Anthropic is the Slack MCP server, because everyone just wants to get an overview of what, in this fire hose of information that is Slack, they actually need to care about every day. So what we're seeing is that knowledge workers now start using these tools, connecting them with Google Drive or Slack, and doing more of their tasks that way. In order to share how they best do this, we have our internal marketplace, and we're using that quite a bit. And then if there's a solution that's very obviously the winning piece, it will just be enabled by default. Our knowledge base, for example, is just enabled by default. Beyond that, at this interesting intersection where knowledge workers sit next to developers, some of the developers then prototype specific tools for these knowledge workers that help them do their daily tasks and so on. So what we're seeing really is a more organic thing, and I think out of that organic part we will slowly observe the patterns that work. But the interesting piece I'm seeing now is that between model capabilities, connectivity like MCP, and skills, we are at the point where these models do meaningful things for knowledge workers in their day-to-day. And I think that's the interesting step change that we have seen since probably 4.5, actually.

Speaker 1

Right.

Speaker 3

Yeah, cool.

Speaker 1

I think we'll end on that. We're way over time. Thanks a ton for coming. As Sarah said, we'll be around for the next hour as well.

Speaker 4

So if you have any questions for us, you're more than welcome to track us down and have a chat. Thank you. Thanks so much.