#115: This product leader built an AI brain that runs on every computer at his company | Kyler Ross (Head of Product @ Cloaked)

Overview

Kyler explains how he moved his team from chat-based AI toward a shared operating layer for agents at Cloaked. The shift started from a simple pain point: too much time spent copying outputs between tools, re-finding prompts, and re-explaining company context every time a new session started.

What followed was a company-wide system built around structured markdown files, reusable skills, tool access, and automated guardrails. At a company of a bit over 100 people, he says the setup is installed on every managed machine and used by most employees (roughly 80 to 90 percent, by his estimate), with Slack as the main access point for non-technical teams.

Key Takeaways

The strongest idea in the conversation is that AI works better when you optimize for the agent, not the human typing into a chat box. Kyler argues that most teams still think in terms of "how do I prompt this thing," when the better question is "how do I make context easy for the agent to find and act on without wasting tokens or getting lost."

His answer is a shared directory of company knowledge, mostly in markdown, plus scripts and workflows that let the agent read from core systems and write back into them. He describes this as repeatedly onboarding the agent to the company, because every fresh session starts cold.
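To make "find the thing without spending a bunch of context" concrete, here is a minimal Python sketch of one way an agent could get a cheap table of contents for such a directory. It is not Cloaked's actual tooling. It scans every markdown file and emits one line per file with its path and top-level heading. The function names and the convention that each file opens with a `# Title` line are assumptions for illustration.

```python
from pathlib import Path


def build_context_index(root: str) -> list[dict[str, str]]:
    """Return one short entry per markdown file under `root`.

    Each entry holds the file's path relative to `root` and its title.
    Assumed convention: the first non-blank line is a '# Title' heading;
    files without one fall back to their file name (without extension).
    """
    root_path = Path(root)
    entries = []
    for md_file in sorted(root_path.rglob("*.md")):
        title = md_file.stem
        with md_file.open(encoding="utf-8") as fh:
            for line in fh:
                stripped = line.strip()
                if not stripped:
                    continue  # skip leading blank lines
                if stripped.startswith("# "):
                    title = stripped[2:].strip()
                break  # only the first non-blank line is checked
        entries.append({"path": md_file.relative_to(root_path).as_posix(), "title": title})
    return entries


def render_index(entries: list[dict[str, str]]) -> str:
    """Format the index as a compact bullet list an agent can read cheaply."""
    return "\n".join(f"- {e['path']}: {e['title']}" for e in entries)
```

The idea is that an agent reads this short index first and opens only the files it needs, rather than loading the whole directory into its context window.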

Another point: access matters more than elegance. Cloaked found that asking everyone to use terminal-based coding agents was a losing battle. For broad adoption, Slack agents worked better. Technical users can go deeper in tools like Claude Code, but support, marketing, and others get value by tagging an agent in Slack and letting it run against connected systems in the background.
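The Slack entry point can be sketched as a small routing step. This is a hypothetical Python example, assuming the agent receives Slack `app_mention`-style events (which carry `text`, `ts`, and optionally `thread_ts`) and that each persona, such as the general-purpose PMAI bot and an engineering bot with extra guardrails, has its own bot user ID. The IDs and the `AgentRequest` shape are made up, and the hand-off to the remotely running agent is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches Slack user mentions such as <@U01PMAI> or <@U01PMAI|pmai>
MENTION = re.compile(r"<@([A-Z0-9]+)(?:\|[^>]*)?>")

# Hypothetical bot user IDs, one per persona
PERSONAS = {
    "U01PMAI": "pmai",  # general-purpose assistant for most of the company
    "U01ENGX": "eng",   # engineering persona with extra guardrails
}


@dataclass
class AgentRequest:
    persona: str
    prompt: str
    thread_ts: str  # where the agent should post its reply


def route_mention(event: dict) -> Optional[AgentRequest]:
    """Turn an app_mention-style event into a request for one persona, or None."""
    text = event.get("text", "")
    persona = next((PERSONAS[uid] for uid in MENTION.findall(text) if uid in PERSONAS), None)
    if persona is None:
        return None  # no known bot was tagged
    prompt = " ".join(MENTION.sub(" ", text).split())  # drop mentions, tidy whitespace
    # Reply inside the existing thread, or start a thread on the user's message
    thread_ts = event.get("thread_ts") or event["ts"]
    return AgentRequest(persona=persona, prompt=prompt, thread_ts=thread_ts)
```

Keeping routing this thin means a non-technical user only has to tag the bot; which persona, guardrails, and connected tools apply is decided server-side.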

Kyler also treats this internal AI layer as a testing ground for a more automated way of shipping software. He says code generation is getting faster, while review, QA, and validation are becoming the bottleneck. His team still keeps human review in place for production work, but he is building toward a future where simpler changes can move through automated review and testing without a person checking every step.
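One way to express "simpler changes can move through automated review" is as an explicit merge policy. The Python sketch below is an assumption, not a description of Cloaked's pipeline. The reviewer names, the 200-line threshold, and the protected paths are hypothetical knobs. It auto-merges only when tests pass, every required reviewer agent approves, and the change is small and outside sensitive areas. Everything else is either rejected or routed to a person.

```python
from dataclasses import dataclass, field

# Hypothetical policy knobs
MAX_AUTO_LINES = 200
PROTECTED_PREFIXES = ("infra/", "auth/", ".github/")
REQUIRED_REVIEWERS = {"correctness", "security", "style"}


@dataclass
class PullRequest:
    changed_files: list[str]
    lines_changed: int
    tests_passed: bool
    # reviewer agent name -> "approve" | "reject" | "comment"
    reviewer_verdicts: dict[str, str] = field(default_factory=dict)


def decide(pr: PullRequest) -> str:
    """Return 'auto-merge', 'reject', or 'needs-human'."""
    if not pr.tests_passed:
        return "reject"
    if any(v == "reject" for v in pr.reviewer_verdicts.values()):
        return "reject"
    approved = {name for name, v in pr.reviewer_verdicts.items() if v == "approve"}
    all_approved = REQUIRED_REVIEWERS <= approved
    simple = pr.lines_changed <= MAX_AUTO_LINES and not any(
        f.startswith(PROTECTED_PREFIXES) for f in pr.changed_files
    )
    if all_approved and simple:
        return "auto-merge"
    # Anything large, sensitive, or not unanimously approved keeps a human in the loop
    return "needs-human"
```

Writing the policy down as code makes the "sufficiently simple" boundary reviewable and easy to tighten or loosen as trust in the reviewer agents grows.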

The other big lesson is that agents need deterministic guardrails. He does not trust prompts alone. Instead, he uses hooks to force good behavior, like creating git worktrees automatically so parallel agents do not stomp on the same files, and cleaning them up once work is merged. That keeps the system usable at scale.
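A sketch of the worktree guardrail in Python, assuming a hook can run a script before an agent session starts and again after its branch merges (how that hook is registered is tool-specific and not shown). The branch and directory naming scheme is hypothetical. `git worktree add -b`, `git worktree remove`, and `git branch -d` are standard git commands; the `run` parameter exists so the commands can be inspected without touching a real repository.

```python
import re
import subprocess
from pathlib import Path


def worktree_plan(repo: str, task: str) -> tuple[str, Path, list[str]]:
    """Derive a branch name, a sibling directory, and the `git worktree add` command for a task."""
    # Hypothetical naming scheme: lowercase slug of the task description
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower())[:40].strip("-") or "task"
    branch = f"agent/{slug}"
    repo_path = Path(repo)
    path = repo_path.parent / f"{repo_path.name}-wt-{slug}"
    cmd = ["git", "-C", repo, "worktree", "add", "-b", branch, str(path)]
    return branch, path, cmd


def create_worktree(repo: str, task: str, run=subprocess.run) -> Path:
    """Give an agent its own checkout so parallel agents never edit the same files."""
    _, path, cmd = worktree_plan(repo, task)
    run(cmd, check=True)
    return path


def cleanup_worktree(repo: str, task: str, run=subprocess.run) -> None:
    """Remove the checkout and its branch once the work has been merged."""
    branch, path, _ = worktree_plan(repo, task)
    run(["git", "-C", repo, "worktree", "remove", str(path)], check=True)
    # `-d` (not `-D`) makes git refuse to delete a branch it does not consider merged
    run(["git", "-C", repo, "branch", "-d", branch], check=True)
```

Because the hook, not the agent, performs these steps, isolation and cleanup happen every time, whether or not the prompt reminded the agent.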

Practical Steps

Build a shared company context layer in a clean folder structure. Use markdown files for product, team, process, and company knowledge so agents can retrieve context quickly.
Turn repeatable workflows into reusable skills. If a session works once, save it as a skill instead of rebuilding it from scratch next time.
Connect AI to the tools where work already happens. Read access is useful, but write access is where the payoff shows up: creating tickets, docs, messages, and code changes.
Use Slack for broad adoption. If your goal is company-wide usage, meet non-technical teams in the interface they already use.
Add hard guardrails with hooks. Force actions like worktree creation, PR checks, and cleanup automatically rather than hoping the agent remembers.
Create review agents for internal systems first. Use lower-risk internal tools to test automated review, QA, and CI patterns before applying them to production code.
Add a cleanup agent. Kyler’s "librarian" checks for stale knowledge, drift, and accidental sensitive information, then alerts the right person.
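A librarian-style audit could start as simply as the Python sketch below. It assumes a hypothetical convention in which each context file carries `owner:` and `last_reviewed:` lines. It flags files that are missing a review date or have not been reviewed within 90 days, scans for a few illustrative credential patterns, and groups findings by owner so each person gets one alert. Drift detection and the actual notification step are left out, and the secret patterns are far from a complete scanner.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Hypothetical convention: each context file contains lines like
#   owner: @name
#   last_reviewed: YYYY-MM-DD
OWNER = re.compile(r"^owner:\s*(\S+)", re.MULTILINE)
REVIEWED = re.compile(r"^last_reviewed:\s*(\d{4}-\d{2}-\d{2})", re.MULTILINE)

# A few illustrative patterns only; dedicated secret scanners use much larger rule sets
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "slack_token": re.compile(r"\bxox[baprs]-[0-9A-Za-z-]{10,}"),
}


def audit_file(text: str, today: date, max_age_days: int = 90) -> list[str]:
    """Return human-readable findings for one file's contents."""
    findings = []
    m = REVIEWED.search(text)
    if m is None:
        findings.append("missing last_reviewed")
    elif today - date.fromisoformat(m.group(1)) > timedelta(days=max_age_days):
        findings.append(f"stale: last reviewed {m.group(1)}")
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(text):
            findings.append(f"possible secret: {name}")
    return findings


def audit_tree(root: str, today: date) -> dict[str, list[tuple[str, str]]]:
    """Group findings by owner so each person receives a single alert."""
    alerts: dict[str, list[tuple[str, str]]] = {}
    for md in sorted(Path(root).rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        findings = audit_file(text, today)
        if not findings:
            continue
        owner_match = OWNER.search(text)
        owner = owner_match.group(1) if owner_match else "unowned"
        for finding in findings:
            alerts.setdefault(owner, []).append((md.relative_to(root).as_posix(), finding))
    return alerts
```

Running a check like this on a schedule keeps the shared context trustworthy without asking anyone to remember to prune it.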

Notable Quotes

"You need to make it easy for the agent to be able to find the thing they're looking for and pull it in and access it easily without spending a bunch of context." - Kyler
"You're basically onboarding the agent onto your company." - Kyler
"I try to think about how can I set the agent up so that it's impossible for it to fail." - Kyler


Full Transcript

Source: openai 1h 04m runtime

Kyler, welcome, so excited. Thank you for having us. Yeah, it's good to be here. Yeah, I've been looking forward to this since probably like eight months ago when you pinged me like, Hey, like I have this really cool, like AI workflow that I'm building at my company. I want to share it. I think you like sent me like a long message explaining what it did. And I was like, cool, like you need to, you need to share this with the community. You did a session about six months ago and it was a total hit. Like a lot of people messaged me after being like, Hey, like that was awesome. Like I feel like I learned so much and it inspired me to get my team out of like the chat interface into more coding agents, like Cursor, Claude Code, Codex. I feel like you were kind of early on that trend. So maybe like, let's start there. Like I feel like I'm still like now the stuff that you're doing is pretty cutting edge. Like I feel like you're, you're good at like saying a lot ahead and seeing opportunities maybe before others do. So maybe what was the moment that you realized, Hey, like actually maybe chat-based AI is not really cutting it for me or my team and that I really needed to build maybe like an operating layer or like a harness around it for my team. Yeah. I remember it very distinctly actually, because it was Thanksgiving last year. And in many ways, I think I was a little late to the trend to be, to be perfectly frank with you. I, at the time I felt like I was late to the trend, you know, but I, I was really struggling because I was spending what felt like an inordinate amount of time day to day, copying and pasting information from one window to another window. And I had the same process, you know, we had like a shared Google Drive folder with prompts in it. 
And, you know, we had tried experimenting with Claude Projects and ChatGPT Projects and things, but, you know, even just taking the output and copying that over into, you know, our BI tool to run a SQL query or taking the output and using that for creating a ticket and realizing, Oh man, I need to go back and find the prompt that I used two weeks ago and trying to find past sessions. It just was consuming a lot of my time. And you know, beyond that, I felt like as a person who obsesses about making sure that the experience of the PMs on my team is, you know, as good as possible, I was very cognizant of the fact that they were wasting a lot of their time. And it was this friction for them being able to do work as well as they wanted to. And so I started thinking about like, you know, the lowest common denominator here is like the things that you just use all the time, like common, common prompts, common bits of context. And so, you know, I started assembling that. And I also started doing research into kind of like what the standard, conventional approaches are for building out a folder, a directory structure that contains company context and like what that is supposed to look like and how to make it as agent-friendly as possible, right from the get-go. I think that thinking about agent experience when building this kind of stuff is really important. And, you know, I think a lot of times people are stuck in the thinking of like, well, I need to make it easy for myself to type. You need to make it easy for the agent to be able to find the thing they're looking for and pull it in and access it easily without spending a bunch of context. And so, you know, what I landed on feels I think kind of commonplace now, but, you know, it's really that you need a well-structured directory that has a lot of markdown files in it. 
And then, you know, I also started layering in things like scripts to do common, like, you know, certain tasks, give the agent basically the hands that it needs to be able to both pull in information on one side from the tools that we use every day, getting, you know, all the auth set up. And then also the hands to then be able to go and do the actions that are necessary. And if, you know, you can get it to read the context on one side and then actually take the action and write on the other side, then you have a pipeline that enables you to really start to create more leverage for yourself in what you're doing day to day. And I think that over the course, you know, from Thanksgiving through to February, when I gave the presentation to the Supra community, I think it was all about trying to fill in those gaps in context and build a really strong harness that basically represents like, you know, a directory structure that represents what Cloaked is, what our product does, what our team looks like, you know, all the things that we kind of feel are implicit that, you know, you just, you onboard someone and they're like, oh yeah, they need to get to know everyone, whatever. Like all of that is that you're basically onboarding the agent onto your company. And the weird thing is that happens like every time you start a new chat session with a coding agent because it basically needs to re-onboard, it's coming in, you know, completely fresh. And so building out that process of getting the agent up to speed on everything was really my focus during that time period. And since then, it's really been all about trying to scale that up, make it accessible in more places. Really, it should be, it should be everywhere, right? It's like, it's an effective system that, you know, many different people on the team are contributing to. And so we should be able to leverage that in as many places as possible. And then also making it as foolproof as possible. 
Like I try to think about how can I set the agent up so that it's impossible for it to fail. And I'm excited to kind of go into all of that. I think there's a lot there, but that's really been the journey that I've been on. And it's been, it's been really interesting. It's been a great learning experience. So just, just kind of continuing, just to set up the context for the conversation, a couple quick questions. Can you just like give the, give us in the audience a feel for like how big Cloaked is right now? Like how many people kind of like currently, or in the near future, engage with the system that you're talking about? Yeah. So Cloaked has grown and our usage of the system has grown rapidly. So we are a very recently post-Series B company. We're really entering into that like hyperscale mode where we're growing really quickly and we're scaling the team up really quickly. At this point we're a bit over a hundred people. And you know, I think in the time since we talked, it's definitely grown to get to that point. And at this point the harness itself is actually installed on every single, you know, company-managed computer. It is distributed by default to everyone. And in terms of actual active users, I think it leans more on... I'm guessing everyone probably, that's right. Like 80 to 90% of the people probably engage with it. Yeah. Who owns the harness? Is there a person in the company that's like, I know you kind of like started that movement, but like, does that mean now you own that for the entire company? Most of the, most of the code and contributions are me. I would say probably like 80% just because I am kind of obsessive about this. 
Like I think actually there's a bit of a, I want to take a step back because I think there's some important context around all of this, which is I need to separate my hobby from my professional life because I think that, you know, sometimes I'm talking to people like, you're spending so much of your time like coding, like head of product. You do like, what do you have time for this? I'm like, I don't. The reality is like, I've just kind of gotten obsessed with this. It's totally sucked me in. I've gone way down the rabbit hole. And I think there's a cool thing about working with coding agents, which is it has a dark side too, but it is a far more dopamine-rich work environment than I think we're used to otherwise. And so it's very genuinely fun. It's really, really enjoyable once you're, you know, you're working with them and you're, you're iterating, you're building systems, you're automating things. If you have that itch in your brain to automate something when you see it, uh, then, you know, once you get kind of the foundation set up, it becomes very easy to do that active audit. Oh, there's something here. I'm going to fix, I'm going to automate it. And then now I won't have to do that manually in the future. And so after hours, at night, on weekends, I find myself getting stuck. I can't help myself. I'm like, I want, I want to try this. I want to do that. What if I added a thing that does this other thing? And so I think that, you know, in my day job, uh, I, I'm not, you know, I, I guess I really own it in, in like on the nights and weekends. Um, but there's lots of other people who contribute. I think that's been the thing that has made this a recursive loop, uh, of like self-improving system is that everyone who uses the harness is able to easily invoke skills that will sort of trigger a like self-reflective loop of, Oh, what, what went wrong in this session? Let me try, as an agent, let me try and fix that. 
And then let me submit a PR to then go and fix that problem. Uh, and that has been really powerful in getting the system to the point of stability that it's at now, uh, where we are, are you approving, by the way, are you approving that PR or like since you own the system, like, are you the one reviewing that PR or do you have now an engineering team that's like responsible for like approving the changes to the system? Uh, so I really believe in eating our own dog food. And I also believe that we are heading. The process of writing code has become much, much faster. And so therefore the process, the bottleneck has moved to the right towards the, the testing and code review and QA side of things. I think that as an industry, many different companies are at varying stages of adoption and comfort with fully automated code reviews and fully just shipping something to production without having human eyes on it. I think that especially in the more AI-native side of things, right? Like Anthropic talks about, Oh, you know, we just, everything's a fully automated pipeline. Um, I think a lot of companies are not there yet. Cloaked is one of them. Every single thing that we put out, uh, gets tested and, you know, gets human eyes on it. We put a lot of care and craft into the work that we do. Um, but I do personally believe that we are moving towards a world where we spend a lot more of our time building the infrastructure and pipeline to be able to enable us to ship code really quickly and autonomously without needing that human review step for the simpler changes. And then I think for the more complex changes, well, you, you just need to test them and then you can approve it. So what we built is, and as I talk, I'm going to try to avoid you. By the way, I think Marc's question was specifically about the harness, not about how you guys ship in general. 
Like what was, okay, yeah, just making sure that as we are, uh, I guess my point was more that as we are developing the harness, I am trying to lay the groundwork for what I imagine the future of development could look like, where we have a fully automated code review process. Uh, we have a fully automated test suite and we have a suite of reviewer agents that go through and review every PR and then they can approve or reject or leave comments. And if they approve it, it goes out automatically without any human eyes on it. So I've really been able to automate all of that process for improving the code of the harness. And in the process, I think we're kind of starting to lay the groundwork for what, uh, you know, a future state for some of our CI pipelines could look like, uh, where it's just this easy path to get things out the door, uh, if they are sufficiently simple. Um, and so, no, I don't need to review every PR, which is really great. What I'm hearing and what I love about that is that, cause I think a lot of teams have needed to like kind of figure out what that initial project or the initial team, um, or the initial thing is going to be at their company that kind of allows them to almost like figure out how to work in a more AI-native way. And what I'm hearing, let me know if I'm mishearing you, but what I'm hearing is like, this wasn't the intention originally, but now, given that the harness is installed on every single company device and every employee has access to it, you've kind of seen an opportunity here to use the harness as a, um, a testing ground of sorts for how to, um, empower, um, a more, uh, AI-first, AI-native way of working in the team in a way that maybe doesn't require human review of every single change, um, that gets deployed to the code base, thus empowering more of the team to like improve the product in a, in a self, um, in a self-reinforcing way while setting the right guardrails for human review where it matters. 
Am I hearing that correctly? You hit the nail on the head. It's all part of, uh, I think trying to get out ahead of where I see things going and try it out myself. I've really been, you know, I think people talk a lot about this and, you know, different very bold statements about, Oh, like, Oh, you know, it's so easy to, to just ship to production and whatever. And it's like, the reality is like, I'm like, I'm going to, I'm going to try it and see if it works. And, you know, in the process, find out it's usually a lot more complicated and a lot more sort of riddled with, with, uh, traps and, and easy mistakes, uh, than it sounds like in a LinkedIn post. And so this was again, really just my way of being able to personally experiment with where I imagine things are going to be in six months. Uh, and also, you know, I can tap into the wisdom of the very talented team that we have at Cloaked to, uh, be able to advise them, Oh, this, you know, this is how to, how to add something into the CI flow and, uh, all that stuff. Yeah. It's so cool. Marc, can I ask just one follow-up and then if you have anything you want to add, I'll let you take it. Um, but that makes total sense and I think it's really, really smart. The second follow-up I want to ask is, um, both for my own edification, but also maybe for the audience. If we're thinking about doing some screen-sharing later, but just like if we were to stay in the realm of just like conversation right now, if I was a new person on the team, um, just imagine I'm not technical at all. Maybe I'm like, uh, I don't know, like a, like a marketing customer support or someone or yeah, like customer support. And uh, I get handed my company, you know, MacBook Pro or whatever. And they're like, there's a checklist and like one of the things I get access to along with my Gmail accounts and stuff is this like harness. So my question for you is, do you actually call it a harness? Like is that, is that like the internal name? 
Like if you say harness, everyone knows what you're referring to, or is there like a, have you guys chosen to give it some kind of, if you don't want to tell us the name, but I was curious if you've branded it in some way, and secondly, how would the HR person or how would the onboarding docs like describe to that non-technical person what this harness is? Like how are you kind of like talking about it internally? I'll, I'll be completely honest. I, I think there's very diminishing returns in trying to get everyone to use Claude Code. It's just, it's not a battle worth fighting. Um, in the process of rolling this out more broadly, and this ties into what I was saying before about, we want this to be available every possible place that a person could be doing work. Um, Slack agents are hands down the best way to democratize access to agents and agentic systems. And so for a customer support person or a marketer who is not on the technical side, for whom, you know, seeing a terminal is going to be a very, uh, very alarming experience, I think that being able to just tag it in Slack and ask questions and, and have it run remotely, uh, which is one of the, I think, upgrades that we made along the way, uh, you know, is a really big, you know, boon for, for their workflows. And in terms of how I brand it, it's a little silly and it comes from my own, uh, background, but to me the, it really is, it's doing PM work, uh, like, and I think more project management than product management, but I think it is a thing that facilitates the work the rest of the team does. Uh, and so I just call it PMAI and it has kind of taken on that identity. And we also have other incarnations of it that serve specific purposes, more like engineering side of things, because I think that to be able to safely deploy, right, we want to be able, if it's actually writing code, uh, there's extra layers of guardrails and specialization and prompting. 
So there's a couple of different like personas for it in Slack, but, uh, yeah, generally speaking, most people interact with it. They just interact with PMAI. Yeah. So, so just like, kind of like to, to repeat it back, it's not like you open your MacBook Pro and you have like an app that's called PMAI that you just interact with, which is like kind of like maybe more similar to what people have at Square or maybe like Ramp where we have like this app, it sounds like maybe like the first instruction, maybe it might be like instructions was like, Hey, open Slack and kind of like in your chat and open a chat. And then in that chat, like, you know, look for PMAI and ask a question about maybe like a customer that you have to reply to their request basically. And so it sounds like that's maybe like the first kind of interaction that someone in like maybe like a non-PM managing team may have, is that, is that more or less? And then, then if that's the case, the next question that I have is like, how do you handle like, because I think a point you made earlier is like super important to get all the systems connected. So it has context, right? Like maybe your email, maybe like if you use Intercom or like whatever tools you're using, like that, that context and being able to read like in real time what's going on. So like, is that happening in the background also? Like when someone is onboarded, or does someone have to walk them through Slack or something? So that's where I think there is, you mentioned like the experience of opening up the MacBook, right. And having access to something that they can use, there, there is also an app that we have built that it's basically one of the many web, like web language wrapper apps that, you know, pretty lightweight, easy to iterate on, easy to build, that is hooked in with our MDM and our SAML and SSO for authentication. 
And that gets distributed to every Mac and that with that, what ships with that is the harness itself, meaning the actual folder that contains all the prompts and all the skills and workflows and capabilities and, you know, tons and tons of different, different pieces of that. So for a person who's a little bit more technical that we call the app Start Cloaked, pretty self-explanatory name, it is basically a way to get set up within the company. That is the way that we onboard. By the way, do you want to get into screen sharing now? If you feel comfortable, just like, because I'm like, I'm just like craving to see, OK, like, what is this? What does this look like? All right, let me. So what I'm going to do is I didn't, I don't want to actually show the real one because there's a whole bunch of confidential information in there. But what I did, because it's a thin wrapper, you know, around basically what is just a web app, I can put it in my browser and I have one with some synthetic data that will be easy to use. Do you use PMAI to build it? Of course, yes. So meta. All right, let's see, I'm going to share. Own yourself without any real data so I can screen share you. OK, can you see you're digging it too far, bro? Yeah. OK, cool. Yeah. So from within here, we have the ability for a user to go and they can actually trigger a new chat session. And that's easy to do from this home screen. And they also have access to all of these different resources, like, you know, setting up different pieces of the process of using or getting onboarded to Cloaked, including, you know, getting the necessary dependencies and getting keys and things like that. And, you know, we've also built a lot of different things into this. So it's kind of become a hub for, you know, different things that we are different sort of like entities and objects like teams are here and team members are here. And, you know, HTML artifacts are here. 
And that has like there's like a to-do list thing that someone has built. And so all this stuff kind of gets shipped in this internal app that we built that can also start to accumulate some of the different creative ideas that people have for how they experience work. Right. I think that it varies a lot depending on the individual and the domain that they're in. And so being able to have everything in one place and make it really easy for people to contribute, because the thing I found about shipping internal tools is that the cost is relatively low relative to actually shipping something to production. Right. Like we have an automated review process. We have the ability to distribute it to our users, our users being employees. And then we have a really rapid feedback loop to be able to understand, are they using it? Are they getting value from it? And so, you know, people will have ideas for things and ship something, you know, on an evening or on a weekend or something, and be able to see, you know, oh, like people internally are really vibing with this or they're not and, you know, iterate quickly on that. But, you know, at its core, I think that just having a place to go that has the guides for how to get set up on everything is really important for being able to kind of centralize the experience of using these tools. And, you know, I think that ties back into that larger theme of wanting to be able to kind of pull everyone's experiences together because it gets very fragmented when you have a bunch of people who are all running their own harness and they're all using their own different custom-built tools. It's like there's an inertia to it. Like it's impossible to stop people who are excited about building stuff from building stuff for themselves. 
And so giving them a venue in which they can build those interesting things, I think has been something that we found to be really, really helpful. And, you know, then it all cross-pollinates and I think it becomes kind of self-reinforcing and there's a bit of like a network effect there where, you know, one person builds something and another person adds onto it, another person. So it really just, it spreads. So just to make sure I'm clear on what we're looking at and also for those who may be just listening to audio. So we're looking at a, is this a Mac app? This, yeah, it's a Mac app. But I'm running it in my browser just to make it easy to kind of show off the version. So someone joining the company gets access to a Mac app called Start Cloaked. And within that Mac app, they log in with their company email. So there's some kind of like authentication or whatever, let's just assume that's true. And then there's like a bunch of content that maybe they could look at the dashboard. There's like some artifacts. They already, on day one or day zero, I guess, they have access to a bunch of stuff, but then there's like the Setup tab that I'm looking at. And in the Setup tab, this is where I see PMAI Toolkit as one of the things that they can set up. So the thing that we've been talking about to this point, where you said you call it like PMAI, this is what you call this like harness that they can talk to in Slack and stuff, is that one of the things they need to set up? It's like one of the first things they set up in this tool? Yeah, and it's primarily the thing that we found is making sure that they get auth for the various tools that they need to use. And there's a couple of dependencies, nothing too crazy, but just getting the kind of ecosystem set up because it's like what we talked about earlier. The leverage comes from being able to make sure that the system is able to read the same things that you're reading, you know, like it can go into Jira or Linear, right? 
It can go into Notion or Confluence or whatever tool you use. It can actually read those documents when you need to read them. And then it has the ability on the other side of things, after the decision-making process and the work that you're doing, to be able to actually write to those same systems. So whether it's sending a Slack message or it's creating a document or creating a ticket or actually pushing code to GitHub, that thing on both ends really relies on them setting up their own auth. And I think that's something that we made sure is really tight and buttoned up, but also does require a bit of work on their side to make sure that their environment is properly configured. That's also just the most frictionful part of all of this. And what stops someone who is kind of coming in and just wants to kind of like get up and go, that's the thing that they need to do to be able to be successful. I think that from there, then they can really run. Okay, so I'm trying to kind of like track in my head the employee journey from like setting up the laptop to actually starting to engage with this thing to give myself a mental model for the interaction pattern. So they get access to this tool. They have to set up some authentication or some logins or set up their accounts with the various tools that they'll need to do their job, like we've always done historically. Maybe use something like WorkOS or something that centralizes just like all the SSO things. And once they've set up those accounts and they have access to all these tools and et cetera, the user has been created for their new employee account. Then they go in and they, what's like the process of, I guess, like setting up the PMAI harness specifically or starting to use the PMAI harness in Slack or the agent in Slack. 
Are there any additional steps, or once they just create all those accounts, they just go into Slack and like Marc said, they just go and start chatting with PMAI and that's like their first, it's like going to Google when you have a question. Like the first place they go is just go chat with the PMAI bot in Slack and ask it your questions. That brings up, I think, a really important point and something that we've come to. There have definitely been points in time where we've been sitting in a room trying to figure like, okay, how does all this link together? And do we need just one, like literally one system or do we have multiple systems? Do we have multiple, like we say like, oh, everyone has to use Claude Code or something. But what we found is that different people, different roles are gonna need different tools for their day to day. And the reality is like some people don't need to use Claude Code. Some people might just wanna use the Claude web app. Some people wanna be able to just talk to it in Slack, whatever. The distinction between having a local coding model that you are using and using to kind of get your work done, local coding agent versus having something that runs in the cloud that you are able to use to talk to in Slack and do things, I think is a really important one because there's definitely a set of things you can do when you have an agent running locally on your computer that you can't do when you have it. I mean, you probably could, I'm not gonna say can't. You could with enough configuration, but for us, what we found is kind of easy to set up and doable and worth the time investment. We found that it's easier to kind of create those, like there's certain workflows and things that people do in Slack that require the agent to be running on a remote instance versus actually having it on your local computer and running like within your file system. 
So those are two different entities running off the same core harness and the same core code, but they're different instances of it. So it sounds like the Slack piece already works because it runs in the background, and it's just a matter of getting connected to it. So, to Ben's question, it's probably already in Slack and you just need to talk to it, right? You don't have to set up files on your local machine to get PMAI working. And I like that you have an opinion about what each department should install. I see you have engineering, product, design, security, legal, et cetera, so I'm sure that changes the subset of things each person needs to do, which I think is a smart move. But now I'm curious how you use it day to day and how you interact with it, because, knowing you, I'm guessing you're not using Slack for most things since that probably feels limiting. I benefit from the agents that ingest bug reports and file tickets automatically, the ones that send me updates, and the ones that pass through voice-of-the-customer material. But in general, I'm not talking to them day to day. It's a little bit nerdy, but what works really well for me is iTerm2, organized around tabs and panes. Let me share my screen. Again, I prepared all dummy data, so this is representative of how I might set up my day; behind this window is another window with my real work, and this roughly mirrors it. Let me get that screen share going. And by the way, so you use iTerm as your terminal, and then... Claude Code running inside there. Claude Code.
And then Claude Code is interacting with the context system, the instructions, and the agent files, and those are controlled and set up by the harness you built, right? So Claude Code is the intelligence layer, and the harness is the collection of context files, skills, and instructions, all distributed centrally to anyone who wants to use any part of the system. Exactly. Yes. And I'll caveat this by saying, and Marc, I hope you don't mind, I'd love to use your platform to make a plea to all the engineers out there who have their own systems and a better solution, or someone working at Anthropic or on one of the more polished IDEs, to build something that mirrors this a bit more, because I have very specific reasons why I've set it up the way I have, and I have yet to find a good alternative. So... Cool, yeah. Screen share that puppy. If you're enjoying this conversation, please check out the links in the show notes to support the podcast. Marc and I do this out of love, but to keep it going, we also need your support. Thanks, and now back to the episode. All right, sweet. So this is called iTerm2? It's called iTerm2, yes. Very much a developer tool. And I'll also say some people have really crazy setups. There's a lot you can do with tmux to customize it and get exactly what you need. Our CTO has a whole window-pane system that is light years beyond anything I do. But for my day-to-day, what I'm really optimizing for is maintaining context across multiple concurrent workstreams. When I have a new task, a new project, a new scope of work, my go-to move is this:
I press Command-D on my keyboard, and then I type a shortcut that launches Claude Code in the correct place on my computer to run the agent. You just typed PM, right? That was all you typed? I just did PM, that's it, yeah. So, having a really quick and easy way to do that. That's the kind of tutorial someone gets when they go through Start Cloaked: at the end it says, all right, you want to get started? We've auto-installed these keyboard shortcuts; you just type this and you're good to go. It makes it really easy for someone to get set up, because there's a lot of room for user error in these kinds of things. It's the same philosophy we were talking about with agents: create a safety net so it's hard to fail. So the equivalent would be, say I have five side projects, each with its own folder on my desktop. When I go into Claude Code, I need to cd into the desktop, then cd into the folder, and only then can I start working inside that folder. You've created a single shortcut that drops you right into a Claude Code session in the intended directory. Yeah, got it. Is that the main reason you use iTerm instead of, say, Warp or the Claude desktop app? Here's another reason, which you'll see. I know some people find this totally overwhelming, so it depends a lot on your brain and the way you work day to day. But I'm very much an out-of-sight, out-of-mind person. If I can't see it, it's harder to retrieve that information and that context.
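The `pm` shortcut Kyler types amounts to "start Claude Code with the harness as the working directory." A minimal sketch, assuming a hypothetical harness location of `~/pmai` and that the Claude Code CLI is installed as `claude` on the PATH:

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical location of the shared harness checkout; the real path isn't given.
HARNESS_DIR = Path.home() / "pmai"

def build_launch(harness_dir: Path = HARNESS_DIR,
                 extra_args: list[str] | None = None) -> tuple[list[str], Path]:
    """Return the command and working directory for a new agent session."""
    return ["claude", *(extra_args or [])], harness_dir

def launch(harness_dir: Path = HARNESS_DIR) -> int:
    """Start Claude Code inside the harness so it loads that project's config."""
    cmd, cwd = build_launch(harness_dir)
    if not cwd.is_dir():
        raise FileNotFoundError(f"harness directory not found: {cwd}")
    # Running with cwd set means project files in that directory (for example
    # CLAUDE.md and .claude/settings.json) apply to the session.
    return subprocess.run(cmd, cwd=cwd).returncode
```

An onboarding script could then bind a shell alias such as `pm` to a tiny script that calls `launch()`; the alias name matches what Kyler types, while the script and its path are assumptions.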
So I really like how in iTerm, and again I'm speaking as a user, hoping some product manager out there hears this and thinks, oh, maybe I can build that for him... First, there's the ability to focus: I can move the mouse around. But I find that tools like Warp or Cursor assume the chat you're working on is the only thing you want to see. I organize my work into tabs, roughly by the theme of the work. So I might have management, where I'm prepping for a one-on-one. I might have the Start Cloaked app, with some engineering sessions. I might have exec, where I'm working on roadmapping and larger planning, or thinking through something related to our org structure. And then I have PM, which is literally just PM work, like making a ticket. So I try to route tasks or topics of work to the right tab. Then I make new panes like this all day long, and I close them or keep them going, because I want to see everything that's running. My goal is to maximize the number of agents I have running and doing productive work at any given time, and the easiest way for me to see what an agent is doing and check in on it is to literally watch the feed. It scrolls up the screen as the inference runs and new output is generated, so I can quickly see: this one's doing this, this one's doing that. And I call them all... Sorry, quick question. I'm surprised I haven't figured out a way to do this.
But Marc, do you have your own version of something like this for juggling multiple concurrently running chat sessions? How do you visualize them together? Because I'll be honest, I don't think I've solved that problem for myself. Codex actually does a much better job here. In Codex I work in different folders, and the folders, basically repositories, define the line of work. So I separate work by repository and keep all the sessions for a repository inside that folder, and if I see the blue dot, it means there's something new for me to do. For Claude Code, I used to use Warp. Now I'm trying to move from Warp to their desktop app, because it's gotten quite good. I don't like how Claude Code organizes folders in the desktop app; I think you can organize it like Codex, but I don't really like how they do it. So I pin the top threads I'm working on and label them, for example "IL:" for Insider Loops, so those all sit together. Similar concept to what Kyler is doing: streams of work. I try to keep them next to each other, but I can't see all the work at the same time. I have something like ten different desktop windows, so I can only look at one agent at a time, and I use the notifications and badges to know when I need to pay attention. I would love someone to... I've found a couple of tools that start to do this, but they're very clunky. I feel like an infinite canvas would be ideal, where I could have windows and group them together.
I could see them all working, scroll around, check on this fleet of agents, then that fleet, and have it laid out in front of me, like looking at my desk and seeing all my papers. I want to see all my agents, zoom in, work on one thing, jump to another, and have that multi-threaded, concurrent work happening. If someone could build that, I'd be very happy. Yeah, it's almost like a security guard watching 40 cameras, but instead of one camera per screen, you have a dynamic canvas and can decide to zoom in on six cameras because those are the most vulnerable points right now. Yeah, I agree. I think that would be a good use of time. At the same time, man, I get exhausted when I have six different windows with different chats. I get so distracted. In a way, I like it when these apps force me to focus, because otherwise my brain just breaks. I get tired way faster with multiple windows, because the context switching and the babysitting get really exhausting. I really believe this will be one of the key differentiators. People talk a lot about how, as agents do more and more, either work gets easier or people aren't needed as much, and about which skills will matter. I think the number of agents you can keep going in your head at once, and your ability to context switch, will be a key skill differentiator in a couple of years, because there is an upper limit on it. I've talked to people who say, I can't do more than one at a time. Literally, I need to focus.
I can't do it. And then there are other people running 20 or 30 at a time and jumping between all of them. And people talk about the fatigue that comes with it, right? It's a very real tax. I think that's going to become an attribute that differentiates people in terms of where they end up in their jobs and how successful they are in them. Yeah, I agree with that. I think I'm somewhere between the two of you on this. I get absorbed in a given chat session as long as there's a game of ping-pong being played. But as soon as it takes more than a certain amount of time to get something done, I find myself trying to fill that time with something else. And that subconscious decision about what to do with the time sometimes devolves into checking LinkedIn, X, email, Substack, or whatever. All these little fast-food snacks instead of biting off something meaty, because I know I'm not going to bite off something meaty in the two minutes while Codex is cooking. For me, that's what feels most tiring, or rather, there's a loss of momentum when that happens. I don't know if you guys... You just gotta start doing pushups, dude. That's the secret. Bro, are you trying to troll me right now? You know I'm injured. You don't think I want to be doing pushups? I probably can do pushups, by the way. Thinking I can't is probably a limiting belief I have right now. I could probably do... While you're talking, drop down and give us 10. Codex should tell me. I wonder if that's a setting. I could tell Codex what to tell me to do while it's cooking.
Be like, bro, I'm about to be cooking for three minutes. Give me 10 pushups right now. Come on, let's go. Yeah, that's low-hanging fruit. Setting it up so it can run for longer is another really key thing. There are times you're going back and forth, but I'm always thinking about how to set the agent up to run for as long as possible without me, so I'm not bottlenecking it. It's basically the same thing that applies to human employees, right? When you're managing someone, you want to set them up to be autonomous and make decisions without you. Ideally that comes from creating alignment with them, so they understand how you make decisions and can apply that same decision-making framework themselves. I think where I feel the need for the most rapid back-and-forth is the planning stage. Yes. Getting the scoping right or getting all the details into a plan. But once I have my plan figured out and I just want it to go cook, that's when I'm like, yeah, if you can work five hours without checking in with me and take care of all this shit, amazing. And it's getting better and better at that. I've had Codex cook for 15 or 20 minutes without talking to me, and then it does the thing and I'm like, woo, amazing, that was real leverage. But... The longer you can get it to cook, the more concurrent sessions you can have going, right? If it needs you every couple of minutes, you can't maintain more than a couple of agents at once. But if you can get it going for 30, 40, 50 minutes, maybe even longer, right?
If you're doing a long-horizon task, maybe you get it to deploy a swarm to tackle a bunch of things in parallel, and you might be looking at a couple of hours or overnight. In that time, you can do a bunch of other things. Most people, given five minutes between meetings, can knock out an email or two, right? It's the same thing: what can I pull from my to-do list and knock out quickly? And you have different sessions running for different lengths of time because they're different sizes. Yeah. I got burned not that long ago trying to parallelize some work across agent sessions, and apparently they ended up touching the same files. So the work one agent did was invisible, or not legible, to the second agent. I've over-indexed on that; I'm almost too conservative now. I never have multiple agents running in parallel if they might touch the same files. I'm so glad you mentioned that, because it touches on two key pieces. One, worktrees are awesome. A worktree is basically the idea that, in the background, git checks out a separate working copy of the repository, the agent works in its own copy, and at the end you can have it delete that copy once you merge the PR. So what that... It's just branching. Yeah, the mental model is literally: it makes a copy, works in that copy, and deletes the copy when you bring the work back into the main branch. That lets you have agents working on things that would normally collide. And yes, I got burned so many times. It gets chaotic, you end up with a complete mess, and untangling it is genuinely a nightmare.
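The worktree pattern can be sketched with plain git commands. Strictly speaking, `git worktree add` creates an additional working directory attached to the same repository rather than a full clone, which is why it is cheap to create and remove. The sketch below only builds the command lists; the sibling-directory layout and the `agent/` branch prefix are illustrative assumptions:

```python
from __future__ import annotations

import re
from pathlib import Path

def slugify(task: str) -> str:
    """Turn a task description into a branch- and directory-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return slug[:40].rstrip("-") or "task"

def worktree_path(repo: Path, task: str) -> Path:
    # Sibling directory next to the main checkout (naming scheme is an assumption).
    return repo.parent / f"{repo.name}-wt-{slugify(task)}"

def worktree_add_cmd(repo: Path, task: str, base: str = "main") -> list[str]:
    """`git worktree add -b <branch> <path> <base>`: new branch, own directory."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{slugify(task)}", str(worktree_path(repo, task)), base]

def worktree_remove_cmd(repo: Path, task: str) -> list[str]:
    """Remove the worktree directory after its PR has merged."""
    return ["git", "-C", str(repo), "worktree", "remove", str(worktree_path(repo, task))]
```

Each list can be run with `subprocess.run(cmd, check=True)`. Note that `git worktree remove` deletes the directory but leaves the branch; a merged branch can then be deleted with `git branch -d`.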
And this is where my other plug comes in: build all of that into hooks. That's why I use Claude Code more than Codex, and way more than Cursor, which I barely open anymore: the ability to easily define deterministic code that runs at different points in the agent and chat lifecycle. You can have hooks that trigger when you launch a new session, before or after it makes tool calls, and before or after it responds to you. The way I have it set up is... Yeah, can you give a few examples? I'm having a hard time wrapping my head around what a good use case for a hook is. Okay, some examples. The easiest one to talk about is the worktree one. As soon as the agent tries to make a tool call that runs the bash command to make a new branch, a hook kicks in. The agent is nondeterministic by nature, so it's very likely to forget something. It happens all the time; it's like 50% of the time it does something right and 50% of the time it does it wrong. What you can do to prevent that is make it so that when certain conditions occur, a bit of code runs as part of the hook that goes and makes the worktree and forces the agent to use it. There is no way for an agent running in this harness to write any code to any repo, including its own, without making a new worktree. It's impossible. And that creates a really clear audit trail of all the work that happens. Yeah. Exactly. And at the end, there's another hook that looks for the key signal that you've committed everything and the PR has been pushed, and once it's merged, it tears down the worktree. So you keep everything nice and clean and you end up with a closed loop where you wouldn't...
I forget it's even happening at this point, because it's all behind the scenes. I think this is something people get wrong a lot: they assume that running an agent means the agent can do everything. But we're talking about a non-deterministic system that makes mistakes something like 20% of the time. So you need to add hard, deterministic guardrails: not just an agent checking another agent's work, but code that runs and controls how the agent behaves, so it's impossible for the agent to fail. Again, it's that safety net. And then you don't end up with agents colliding with each other and causing a whole nightmare. There's a lot more you can do with hooks, which is really cool, whether it's automatically git pulling the latest from a repo, or, say, you're sick and tired of em dashes. You want em dashes never to appear in any of your outputs, and you've told the agent so many times. You can make it automatically detect any output with em dashes and do a find and replace. We can't expect these non-deterministic systems to get it right 100% of the time, because they aren't designed to. But you can add deterministic rails that make it impossible for them to fail. I'm itching to see how you work, Kyler. Again, for people who are listening, you have this iTerm window with what you call the different workstreams. You have PM work, you have exec work, you have Start Cloaked, which is basically your nights-and-weekends project, you have management, which is people management, and then you have working on PMAI, or maybe just using PMAI to help you do your job.
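The two deterministic guardrails Kyler mentions, forcing branch work into a worktree and scrubbing em dashes, can be sketched as a Claude Code `PreToolUse` hook plus a plain text filter. This is a variant that blocks and instructs rather than creating the worktree itself. It assumes the hook-input fields `tool_name`, `tool_input.command`, and `cwd`, and the convention that exit code 2 blocks the call and feeds stderr back to the agent, as described in Claude Code's hooks documentation; check the current docs before relying on them. The branch-detection regex is illustrative, not exhaustive:

```python
from __future__ import annotations

import json
import re
import subprocess
import sys

# Illustrative, not exhaustive: commands that start a new branch.
BRANCH_CREATE = re.compile(r"\bgit\s+(?:checkout\s+-b|switch\s+-c|branch\s+(?!-)\S+)")

def check_bash(command: str, in_linked_worktree: bool) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 tells Claude Code to block the call."""
    if BRANCH_CREATE.search(command) and not in_linked_worktree:
        return 2, ("Blocked: start new work with `git worktree add -b <branch> <path>` "
                   "and run commands inside that worktree, not the main checkout.")
    return 0, ""

def scrub_em_dashes(text: str) -> str:
    """Deterministic find-and-replace: em dash (U+2014) becomes ', '."""
    return re.sub(r"\s*\u2014\s*", ", ", text)

def is_linked_worktree(cwd: str) -> bool:
    """In a linked worktree, git-dir differs from git-common-dir (needs git 2.31+)."""
    out = subprocess.run(
        ["git", "-C", cwd, "rev-parse", "--path-format=absolute",
         "--git-dir", "--git-common-dir"],
        capture_output=True, text=True,
    )
    lines = out.stdout.splitlines()
    return out.returncode == 0 and len(lines) == 2 and lines[0] != lines[1]

def main() -> int:
    """PreToolUse entry point: Claude Code sends the hook input as JSON on stdin."""
    payload = json.load(sys.stdin)
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    code, message = check_bash(command, is_linked_worktree(payload.get("cwd", ".")))
    if message:
        print(message, file=sys.stderr)  # with exit code 2, stderr goes back to the agent
    return code
```

To use it as a hook, a script would end with `sys.exit(main())` and be registered under `PreToolUse` with a `Bash` matcher in `.claude/settings.json`. `scrub_em_dashes` would be wired to a separate hook (for example after file edits); which event fits best is a judgment call.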
So maybe there's a use case that comes to mind where you're like, this is awesome, I couldn't live without it, and we could go end to end on how you use it. Also, a quick note for those paying attention: I see you have bypass permissions on, which is pretty savage for exec work. We can get to that at some point. What use case would you start with? I'm just trying to think. This might be a time when I stop screen sharing, because it's hard to do this without showing a whole lot of internal stuff. But I'd love to talk through some very specific ones that I think are particularly powerful. What do you think, Ben? Yeah. Okay. Yes. All right. I want to stay away from coding, because these are coding agents and everyone talks about that. A case that's maybe more interesting is preparing for a one-on-one. A lot of times you keep a document with your coworker where you both add items ahead of the one-on-one, so when you come to the meeting you have it ready to go. But especially for people I don't meet weekly, where it's biweekly or monthly because they're a little farther outside my orbit, it can be hard to, one, always remember to put those things in there, and two, remember all the different things that might come up that I want to talk to them about. So I have a skill that's literally just /one-on-one. And that's another thing: definitely use the heck out of skills. Once you find a workflow that works once, at the end of your chat, say, oh my gosh, this was great.
I need you to make a skill that does this repeatedly, because that's how you get the same result multiple times. What it does is look at every Slack interaction I've had with that coworker. It looks at my Granola notes from every meeting that coworker was in. If they're in engineering, it looks at the tickets they've worked on or the commits they've made; if not, maybe it looks for the Confluence docs or Google Docs they've made. Basically, it pulls all that context into one place. Once you have all of that, personalized to you and the other person, you can have it pull out the things that are particularly relevant: a snapshot of what they're thinking about right now and what was relevant last week, to get you ready for that conversation. You get far richer and more actionable one-on-ones, because it does the hard work of pulling in all the context: not just the things you would see, know about, and think to write a note about ahead of time, but also the things you don't see. That's where these systems help most, especially at a certain scale, where it's genuinely hard to see every part of the business at any given time. There aren't enough hours in the day to look in every channel and pay attention to everything. So having it pull the relevant context out of the dark nooks and crannies of whatever systems you use, into one place, is really helpful and makes for a much richer conversation with a lot more depth. Oh, sorry, I was going to ask: do you share that document with that person, or is it just for you to prep? Is it a shared resource or just for you?
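The context-gathering step of the one-on-one skill described above (pull everything involving one coworker from several systems, newest first) can be sketched as a small aggregation function. The `Item` shape and the source callables are hypothetical stand-ins for real Slack, Granola, ticket, and docs connectors, which aren't shown; the agent would do the actual summarizing:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable

@dataclass(frozen=True)
class Item:
    source: str              # e.g. "slack", "granola", "tickets", "docs"
    when: date
    people: frozenset[str]   # who was involved
    text: str

# A source is any zero-argument callable returning items. Real connectors
# (Slack search, Granola notes, a ticket tracker, a docs API) are assumed here.
Source = Callable[[], Iterable[Item]]

def gather_context(coworker: str, sources: dict[str, Source], since: date) -> list[Item]:
    """Collect every item involving `coworker` since `since`, newest first."""
    items = [
        item
        for fetch in sources.values()
        for item in fetch()
        if item.when >= since and coworker in item.people
    ]
    return sorted(items, key=lambda item: item.when, reverse=True)

def brief(items: list[Item], limit: int = 10) -> str:
    """Render the raw material an agent would then summarize into prep notes."""
    lines = [f"- {i.when.isoformat()} [{i.source}] {i.text}" for i in items[:limit]]
    return "\n".join(lines) or "- nothing found in this window"
```

Keeping retrieval deterministic and leaving only the summary to the model follows the same idea as the hooks: the part that must not be skipped is code, not a prompt.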
It really depends. In cases where I need to have a conversation and think an artifact might help structure it, I'll take the output and say, make a Google Doc for me, then share that Google Doc with the person and we talk about it. But in cases where I'm just trying to get the download, I might pull it up and have it open while we meet so I can look through it, or jot a couple of notes into our shared Google Doc ahead of time. So it depends, but probably 75% of the time I keep it in the background. It doesn't end up surfacing; it's just my own way to jog my memory. One other thing... Oh, sorry, one more question. One of the things that scares the shit out of me about this new world is leaking context I don't want leaked at the company level. For example, say you were fundraising and have notes about an investor meeting that didn't go well, and you don't want your entire PM team to see that. Or maybe there's a performance-related conversation. I know about .gitignore, I know a bunch of things, but it still freaks the living heck out of me. What if I write something that isn't nice about someone, or a private thought about someone, and that leaks to the entire company? What guardrails have you set up? Because that feels pretty scary to me. Yeah. Like anything with... Again, I'll go back to this. When you're dealing with a probabilistic system, every output has some percentage chance of being right, and you're always weighing that chance against your own comfort level with it. But you can never eliminate the risk completely.
So I think about it the way we would think about a security model, where you want to layer things. Like you said, .gitignore: I have a local work folder that the skills prompt agents to work in whenever they touch something confidential. You mentioned investor relations; that's a perfect example. Any time I use that skill, it does all of its work in the local gitignored folder. On top of that, there are two other layers. The first lives in the same system that auto-reviews my PRs, which lets me build and ship without needing a human review. I have a review mechanism that looks for anything confidential, hard-flags it, and instructs the agent that made the PR: you need to cut this immediately and scrub it from history; we don't want that in there. There are also hooks and other checks in the system that look for keys or literally confidential information. But here I'm talking about things that would be really bad to have there, as opposed to things that are a security risk. For the former, it's watching out for me, making sure I don't accidentally deploy something sensitive or put it into our shared knowledge base, which is on everyone's computer. The last layer is a nightly agent called the librarian. It does a bunch of cleanup work to prevent semantic drift and the decay of knowledge files, which happens when you build a system that agents look at instead of you. It's very easy for it to start decaying or drifting away from its original intent, so the librarian does that cleanup. As a third layer, it also checks for anything you don't want out there.
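The second layer Kyler describes, an automated PR review that hard-flags confidential material, can be sketched as a deterministic pre-merge check. The folder name, the marker phrases, and the line-count limit are all hypothetical policy values; a real reviewer would likely combine checks like these with an agent's judgment and a separate secrets scanner:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical policy values; the actual rules aren't described in detail.
CONFIDENTIAL_DIRS = ("local-work/",)  # gitignored scratch folder for sensitive skills
CONFIDENTIAL_MARKERS = re.compile(r"\b(confidential|investor update|performance review)\b", re.I)
MAX_ADDED_LINES = 5_000

@dataclass
class Review:
    blocking: list[str] = field(default_factory=list)

    @property
    def approved(self) -> bool:
        return not self.blocking

def review_pr(changed_files: dict[str, str]) -> Review:
    """changed_files maps each path to the text it adds. Returns blocking findings."""
    review = Review()
    total = sum(len(text.splitlines()) for text in changed_files.values())
    if total > MAX_ADDED_LINES:
        review.blocking.append(f"PR adds {total} lines; check nothing outside the harness was committed")
    for path, text in changed_files.items():
        if path.startswith(CONFIDENTIAL_DIRS):
            review.blocking.append(f"{path}: lives in the confidential folder; remove it and scrub history")
        elif CONFIDENTIAL_MARKERS.search(text):
            review.blocking.append(f"{path}: possible confidential content; cut it before merging")
    return review
```

A keyword list will miss things and raise false alarms, which is the point of layering it with the gitignored folder and a nightly sweep rather than trusting any one check.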
Then it pings the person in Slack: hey, I noticed you committed something you maybe don't want to. It just uses git blame. There was a funny example early on, after I introduced a new PM to the system. I noticed they'd made a 50,000-line PR and that it had been rejected, and I wondered what on earth they could have been doing to make a 50,000-line PR. Big PRs are common here, because it's basically documents, not code, but 50,000 lines is a lot. I realized they'd committed a large section of the home directory on their computer, which included chat transcripts from talking to the AI. Like one-on-one Granola notes or something? Exactly. Yes, exactly. That's a good example where it got hard rejected: as soon as the PR went up, it said, don't do this. And then the librarian caught it as well. Fortunately, the next morning I said, hey, we need to huddle really quick, there's something you need to know. It was a good learning opportunity. In that scenario there wasn't anything really damaging for that person, but it's best practice; you don't want to accidentally do that. Having multiple layers is really important, because you can't trust any one layer completely. Yeah, I love that librarian agent. That's a great use case. I have one for myself called the chief memory officer. I love that. It does the same thing. You're reminding me of the Pixar movie Inside Out: every night, they sort all the memories into what goes into long-term storage versus what becomes core memories. It's exactly that. And I don't have one of those. Man, I feel like I'm a bit behind you guys. But anyway, I'm working my way there. I'm very one step at a time with all this stuff.
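Two pieces of the librarian's nightly pass can be sketched deterministically: attributing flagged lines to an author with git blame, so the right person gets pinged, and finding knowledge files that haven't been touched in a while as a crude proxy for drift. This parses the text output of `git blame --line-porcelain <file>`; the flag pattern, the 90-day cutoff, and the idea that staleness stands in for drift are assumptions, and the Slack ping itself is not shown:

```python
from __future__ import annotations

import re
from collections import Counter

# Header line for each blamed line: "<40-hex sha> <orig line> <final line> [<group size>]".
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+")

def flagged_authors(porcelain: str, pattern: re.Pattern[str]) -> Counter[str]:
    """Count lines matching `pattern` per author in `git blame --line-porcelain` output."""
    author_of: dict[str, str] = {}
    current = ""
    counts: Counter[str] = Counter()
    for line in porcelain.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
        elif line.startswith("author "):
            author_of[current] = line[len("author "):]
        elif line.startswith("\t") and pattern.search(line[1:]):
            # Content lines are prefixed with a tab; attribute them to the commit's author.
            counts[author_of.get(current, "unknown")] += 1
    return counts

def stale_files(last_touched: dict[str, float], now: float, max_age_days: float = 90) -> list[str]:
    """Knowledge files whose last commit time (Unix seconds) is older than the cutoff."""
    cutoff = now - max_age_days * 86_400
    return sorted(path for path, ts in last_touched.items() if ts < cutoff)
```

Because the parser remembers each commit's author, it also copes with plain `--porcelain` output, where author details appear only the first time a commit is seen.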
So I appreciate getting to see people who are several steps ahead of me. We could keep jamming on this for a long time, but I know we're coming up on the end of the hour, so should we start wrapping up? Let's do it. Cool. I'm sure there's a bunch of other stuff we could have gone into, Kyler, so if there are things you're excited to chat about, we can talk in Slack and maybe find time for a part two. But if people want to learn more about what you're thinking about and follow along, where would you send them? And second, is there any way the audience, which is mostly senior product managers and product leaders, can be helpful to you? Yeah, absolutely. I'll take the second one to plug: if this way of working seems interesting to you, we're growing very quickly and hiring across a number of roles, including product, engineering, and design, as well as many other disciplines. So please check out our careers page and ping me on LinkedIn; that's where I'd love to interact with folks. My name is somewhat unique, so you should be able to find me easily. I'm always down to hop on a quick call or chat in DMs. Hearing how folks are working, answering questions, or learning how we might work together in the future is always really exciting for me. Sweet. We'll make sure to link the careers page in the episode notes so people can find it easily. Really cool opportunity. You're definitely on the cutting edge of how you work, and I love the internal experimentation culture you have. A lot of people have bought into it, so kudos to you for being that kind of agent of change and getting everyone excited to work this way. And now, our exciting gratitude corner. Kyler, you've had a really awesome journey in product at Cloaked.
You've seen the company grow quite a bit over the last couple of years. So I'm curious, are there one or two people you want to say thank you to for the role they've played in that journey? I definitely do. I have two people: one I talk to very frequently because he's a good friend of mine, and the other I actually haven't talked to in a little while, but she played an instrumental role. And I figured now's as good a time as any to give them a shout-out. The first is my good friend, Michael Carafa. He is a fellow product leader and a very dear friend. He was the person who really helped me get into product quite a while ago. We've known each other for a really long time, and I remember him literally helping me through the interview process. Very, very, very helpful in that way. And throughout all of this, I've learned so much from him, being able to discuss and experiment with different ideas and latch on to his knowledge of all these things. He's very knowledgeable about everything. So it's really helpful to have someone in your life you can learn from like that. The other person is Ashima Kapoor, who was my first-ever manager when I was a junior product manager at a company called Context Travel. She really has been the role model for what I imagine a good manager should be. Ever since I started managing people, I think about her very often and try to be like Ashima, because she was an exceptional person to learn from and a really, really wonderful human being. What an awesome way to finish. And yeah, shout-out to Michael and Ashima. I love finishing the episodes that way. So good. Yeah, shout-out to both of those people. I hope this was helpful. We will link to their LinkedIn profiles in the notes if people want to check them out and see what they're up to. And yeah, thank you for coming on. It was really fun.
I really enjoyed learning about your way of working. The message I personally take away from these conversations, and I hope others do the same, is that there's not one right way to do any of this right now. A lot of different people are exploring, tinkering, and trying what works for them. I love gathering these data points about how different teams are working. Some of it's going to be relevant to me, some of it won't. But I really appreciate you taking the time to walk us through how you've gone about it. It's such a fun period of time. I love this. Being able to build stuff at a time when there are so many options on the table, and so many chances to experiment and make something better, is a blast. So yeah, always happy to talk. And I'm really excited for folks to hear this. I hope it helps some folks. Amazing. That's a wrap. We appreciate you, man. Bye-bye. Thank you. Bye.