Overview
This episode explores how Zero Gravity—a UK platform helping disadvantaged students access elite education and career opportunities—is building an AI “career co-pilot” to close the gap between knowing what to do and actually doing it. Teresa Torres speaks with Elliot (PM) and Dan (engineer) about designing AI that orchestrates existing human-led support (mentoring, community, learning pathways, opportunities) rather than replacing it, while maintaining rigorous safeguarding for 16+ users.
The conversation moves from product strategy (where AI helps) to implementation details (context management, tool calls, moderation, and early evaluation practices).
Key Takeaways
Zero Gravity’s core challenge isn’t lack of student ambition—it’s lack of visibility into what good looks like and the confidence/network to act on it. Even when students receive excellent advice from mentors, social pressure, imposter syndrome, and unfamiliarity with professional norms can stall follow-through. The co-pilot is designed as an “orchestrator” that nudges students toward the most impactful next step across the platform, not as a standalone chatbot.
A counterintuitive learning: an early, subtle AI feature embedded in a job card (structured suitability summary) was “nice” but didn’t create a compelling “co-pilot moment.” Hiding the “LLM magic” reduced perceived value; students needed more interactivity to understand personalization and feel empowered to dig deeper.
On the technical side, the team intentionally avoided overengineering (e.g., vector databases) early on. Instead, they leaned on data they already had, classic search-based retrieval, and strong agent design: tool calls, structured outputs for grounding, and careful context-window management. A major insight is that building useful agentic systems is often less about exotic architectures and more about disciplined context hygiene—summarizing history, removing irrelevant tool outputs, and only fetching what changed.
Safeguarding is treated as foundational, not a bolt-on: moderation of inputs/outputs plus a second external safety layer and human oversight with full observability.
Practical Steps
Teams building AI copilots can apply several concrete practices from Zero Gravity’s approach:
- Start with a real workflow “door,” not an empty chat box. Embed AI entry points where users already act (e.g., on an opportunity card) and provide starter prompts to guide safe, relevant conversations.
- Build orchestration before automation. Use AI to route users to existing high-value tools (mentors, learning, community) instead of trying to fully replace humans on day one.
- Treat context management as a first-class feature. Summarize older messages, remove irrelevant tool calls, and prefer “diffs” (what changed in a user profile) over re-sending entire documents.
- Control tool availability via application logic. If data is stale or unchanged, don’t fetch it; if a conversation runs long, re-enable a tool call rather than bloating the prompt.
- Design for trust and latency. Stream responses, show “thinking/tool use” states, and preload quick-win content while heavier reasoning runs.
- Operationalize safety. Moderate every turn, add a secondary safety provider for deeper categorization, and ensure staff can review interactions with clear escalation paths.
- Begin evals with a failure taxonomy. Tag real conversations for issues like hallucinations, stale recommendations, vagueness, and tone—then evolve toward datasets, code-based checks, or LLM-judge scoring.
Notable Quotes
- Elliot: “We’re trying to build an orchestrator, not an automation tool.”
- Dan: “We had the power in the data that we already had… [it was] about giving the LLM the tools that it needed to keep coherent conversations going.”
- Elliot: “We probably hid the LLM magic a little bit too much.”
Full Transcript
Welcome to Just Now Possible with Teresa Torres. Hi, my name's Dan. I'm a software engineer at Zero Gravity. I've been in software for about just over 10 years now. I started in the sports tech space and transitioned over to the educational space. I've been at Zero Gravity for almost four years now, and most recently have been building out our new career co-pilot feature for the past three to four months. Hi, I'm Elliot. I'm a product manager here at Zero Gravity. I've also been here about four years. I think I joined a few weeks after Dan actually. My background is primarily in operations and CX, a variety of different UK-based startups and scale-ups. And yeah, most recently I've just been taking the reins on this AI focus from Zero Gravity side of things. So tell me a little bit about what does Zero Gravity do? And then we'll get into your AI product. So Zero Gravity essentially helps disadvantaged UK students access elites, opportunities, and careers. And it came from a dream from our founder, Joe Seddon, essentially, because he lived that experience of growing up in West Yorkshire from a single-parent household. And he felt firsthand how hard it was for students like him to break through those barriers. So when I say break through barriers, that's what it all boils down to. We're trying to break down those barriers to access and to network. We're partnering with lots of different state schools and universities and creating what I think now is a, I think we'll call it a B2B2C product, which essentially allows schools to refer students, and then corporate partners will buy into our pipeline in order to tap into that talent pool that we're creating. So that, in a nutshell, is what we're doing. I love this space. I will share that I am a product of a single-parent home. My family skipped a generation with college, so my grandparents went to college, but nobody in my mom's generation went to college, and then I was the first of the next generation to go to college. And I got very lucky in that I got accepted to Stanford, and it was probably the biggest impact on just my trajectory and what became available to me. So this is definitely a problem space that resonates with me, and I'm excited to get into it. Nice. Likewise. Okay. Tell me a little bit about your AI product. What we're doing is we're building what now we're calling an AI career copilot. The name is something that I was apprehensive to use originally, but copilot in the actual, what it is doing for our students, is very apps, I think. So ultimately it acts as an orchestrator, which we are saying bridges the gap between knowing and doing on the platform. And now what I mean is it analyzes the student context, understands what they do on the platform and what their ambitions are, where they currently study, for example, and it helps them ultimately execute the most impactful next step within that platform space. So what we're doing at Xeroft at the moment is that we've got the kind of career tools already in action. So we've got a community space. We originally started off as a mentoring platform, one-to-one mentoring, all proprietary tech, in terms of what we're doing. And that's grown out into what I said, community, live masterclasses, on-demand learning pathways as well, and an opportunity space. So they can be matched with different opportunities that our partners have, such as grad schemes and things like that. And whilst we have all those tools, what we essentially found out is that students don't quite know what good looks like. And that is something that might sound quite obvious, but it's something we had to understand because a lot of students can be a different scale. They can be very switched on and attuned into what they want to do, but that doesn't necessarily always mean that they know what the best next step of that is. So let's say they want to tweak their CV, and they want to upskill in certain areas or certain skills. For them, it's always they're being told what that could potentially be by a teacher, by a job spec or whatever. And they're the lucky ones, right, because the others don't have a clue. And that's what we need to help them. This is something that I'm not sure all my listeners are going to be aware of. And so I want to just expose a little bit of this. And tell me, I'm based in the US. There may be some differences with the UK. So let's explore that a little bit. But I know for me, I didn't know what I didn't know. So I didn't know until I got to Stanford and was exposed to other people from different socioeconomic classes. I didn't know about the types of jobs their parents had. I didn't even know those things existed. I didn't know, oh, people take classes to get good at the SAT. I know that's a US-based thing, but I think it will translate. I didn't know that was available and that people did that. I didn't know that your summer internships are really critical to set you up for your job after college. There was nobody in my family that had this knowledge to share this with me. And I remember even one moment in college, somebody was reading the stock market section of the Wall Street Journal. And I didn't even know what the stock market was. And I remember being like, wow, I'm just at a very different starting point. I didn't get exposed to conversation about the global news in my home. I had huge gaps in just my knowledge about the world and how the world worked and how college worked and how jobs worked. And it sounds like that's a lot of your comment about the knowing-doing gap. First of all, you have to know what to do. And then second of all, you might know I'm supposed to go take an SAT class, but do I have the means to do it? Do I know how to find it? Do I know where to go to do that? Is this kind of along the lines of what you guys are helping with? It's very similar. Everything is resonating. And we're seeing a lot of similarity in the U.S. and that kind of disparity. We focus a lot on that network disadvantage for our students. And there are stats around you're more likely, you're X times more likely to know somebody that went to Oxbridge or is working as a lawyer or XYZ if you went to private school compared to state school. And that is something that in the U.K. has been a consistent kind of topic. And you always hear about it growing up and when you experience it as well. When you hear stories like Joe's, for example, and you get that severe imposter syndrome when you do finally make it, it's doubling down, isn't it? I put all this hard work in and looking around me, all these people have known somebody. Maybe they knew they had a family member who was able to recommend them. Maybe they had the finances to fund them to, as you say, upskill quicker. And it's something we are currently working on, especially in the tech space as well because there's a whole other conversation that we could be having around the tech and AI divide as well, which we can get to a bit later on. This is actually why I got into educating on AI because I see that divide already and it's annoying me. Okay, so let's get into this. Tell me, before we get into your co-pilot specifically, give me a sense of what does Zero Gravity do to help folks with these backgrounds? What's sort of the overall service that the company is providing? And then we can use that as a foundation to get into how is it helping. What we're doing is our very core focus was mentoring and it's the power of mentoring. So students would log in. They'd either be referred by their school or they'd log in or sign up organically. In order to be able to access Zero Gravity, we basically have an algorithm which will determine the lowest areas of opportunity within the UK and it will grant you eligibility to the platform, essentially. So by doing that, you're then in that kind of catchment group of, oh, look, you've got the highest opportunity here, you've got the highest potential and we're here to help you with that. And with the mentoring process, it was matching them with students who were in their kind of ambition areas. So let's say that you wanted to do mathematics at Oxford and you're a state school student and you sign up for Zero Gravity and you're already low on confidence. You don't know anybody there. You're probably the first in your family to apply to this university and you enter, you're matched for an algorithm with a mentor and we're doing the same thing as you. And you have these sessions and it's just something that we've built on from there, really, that sense of mentorship and upskilling of having somebody that's been in your position and kind of the ranks to broken down the barriers to get to university and then having that conversation with them. And I think I want to make a good point about this, is that human element has been really interesting for us because it's, oh, look, I can see myself in this person. And with that, we've seen some really good success of actually mentoring people, then getting to university and then becoming a university mentor themselves as well. Yeah, this is great. So you're starting with high school kids. And what I love about this is that, like, I remember being a high school kid. I didn't know how to pick a college and my parents didn't know how to help me pick a college. And I remember, like, applying for financial aid and I just did it on my own. And it's funny as an adult, like, I see my friends, like, parents are really involved with their kids and choosing school, but I didn't have that. I didn't even know that was a thing. And it's, like I said, I just feel like I got lucky. For whatever reason, I figured it out, but how many kids don't? So I love this. So you're starting with high school. You're pairing them with a mentor. Somebody who's been through it before can show them the ropes because maybe they don't have access to that at home. Now give me a sense of where does AI play a role? How does the co-pilot work? I feel like just one small thing to add there is the safeguarding aspect of our platform as well. I think one of the key pillars of Zero Gravity is that we provide a safe space for those things to take place, which can't always happen on unregulated platforms and stuff like that. So we're dealing with kids who are of the age of 16 upwards. That is, like, a key pillar to what we're doing and what we've tried to bring into our career co-pilot feature as well. So in essence, the career co-pilot feature is something that knows you. So it knows data about what you've done on platform. It understands what your career goals are, whether they've changed recently or not. It understands what kind of interactions you've had on platform, whether you are in mentoring, whether you're more leaning towards masterclasses, or whether you need, like, that networking help, in which case it would prompt you to go on to community and help you engage with other like-minded people in the community. I think with career co-pilot, the main differentiation here is, as Elliot mentioned, that we're trying to build an orchestrator, not an automation tool. So in that sense, if you imagine, we've got all the tools on platform, so we know that if people engage or our students engage with mentoring, they're more likely to get their first choice university compared to the UCAS standards. So we knew that we understood what the outcomes were. It was just about trying to get AI to orchestrate that and put them into the right directions. And I think the key thing throughout building it was, like, how do we provide a service, an interaction layer, in which the user can generate and keep momentum on platform? What I love about this is you had a model that was already working, so you know that your mentors are helping. When you first started building your co-pilot, were you trying to augment what your mentors were doing? Were you trying to scale what your mentors were doing? Like, what was your initial seed of, if we build a co-pilot, it'll do what? Well, we had ideas of completely automating mentoring, having an LLM with a live kit, synthetic person, avatar, whatever you want to call it. But, yeah, I think we had quite some wild ideas, didn't we, Elliot? And I think one of the keys to when we began was trying to bring that back down to earth and trying to understand what the users actually need as opposed to what's cool. Yeah, as we've mentioned throughout the chat, it's more discovery giving users the tools and ability to develop that momentum themselves. I can imagine with a co-pilot, especially if the goal of the co-pilot is to simulate what a mentor is doing, that's a big footprint. So tell me about the earliest days of the co-pilot. Like, how did you decide where to start? How did you get feedback? Was there an early prototype? Just what was the beginning of this? I think it was a bit of an analysis paralysis, especially from my side, in terms of where the market was at the very beginning of this, because we've had our finger on the pulse of this since you could possibly do it. I think we've had a bit of a beta around Okanaka, the very early AI coach, when chat2BT APIs first came out. And we're very quick to work on that and work through that. And I think we intentionally did that in a restrained way because we've always been very nervous, as Dan's mentioned, around the safeguarding element. Overly nervous, absolutely overly nervous. And I'd like to keep it that way, to be honest. But it's something that we've always been very excited about. But it's like, how do we make sure that we can scale it and use the best tools possible for it? And I think when we came to scoping out a co-pilot and finding out that this was a problem, I think essentially because the data was telling us that people were getting stalled between, let's say, between action, different tools. They do certain tools. Let's say, for example, someone said, I know what I should be doing more, I know what to be doing specifically, in terms of interview feedback. You're saying, I finished XYZ course, but now what? They know what to do. They have a really good conversation with their mentor about a particular opportunity, but that never applies to that opportunity. So we were trying to break down the barriers of different elements. And what we did find is that it was the same problem that we were focused on at the beginning, is that these students are very driven, but they are, I do think, there was that element of, look, I need to keep this momentum going. I need to quite understand why I'm fitting into this space. That mentoring relationship was, for them, a very positive experience. But it's like, why would I go on and take that into a masterclass? Why would I go and engage the community beyond this? Because for them, I think it's quite a big step to go beyond that initial place. I think the thing that resonates with me about what you're saying is that you can have a mentor telling you this is the next step, but if you don't in your life see people taking that step, there's a lot of imposter syndrome, a lot of doubt. You might even have family members telling you not to take that step. I think this is, for people that haven't experienced this growing up, I think it's a little bit hard to relate to it, but I went to high school with kids where they were told, don't go to college. You're not good enough for college. And so I think that's, I can definitely relate to, they could have a great conversation with their mentor and they're like, yeah, okay, I know what to do. And then they don't do it, because there's a lot of competing forces. Yeah, for sure. It's one of those ones where even when you're working towards something, as you say, you'll just be constantly pushed back. It's even teachers as well, because as you say, teachers, sorry, people, they'd be in school and people would want to apply for something. And a lot of state schools, I've had family members that have gone to state schools and been very talented enough to apply for opportunities, and it literally gets to the point of that before the exams have been told, just rein it in a little bit. Don't set your standards too high. So that can definitely resonate with that in terms of how you're feeling. And I think, Teresa, the thing that we were focusing on at the start was that we did approach different ways around this, like progress trackers, like nudges, or recommended journeys, for example. It could have been different routes towards this, like elements of gamification. But what we did realize quite soon after the conversation is we speak to students every week, and there's something within Zero Gravity, like everyone has to be a mentor on the platform, which is a really nice tradition where we sit down and we mentor students. And it's a fantastic way, from my perspective, to have regular conversations, but also for everyone else, it's just to test the platform and also make sure that just hear these members' stories, because it's so easy within the, I think, in our very data-hyper-focused and, I guess, more technical space to get lost on the data and forget about these amazing stories that every individual member that can get goes through our system, and then, because at the end, it's like incredible. And I think that's something that was another reason we went down the AI route, is because every story wasn't linear. Everything was unique. Every ambition was slightly different. And it's this, like, context. I mean, like, there's a lot of sensitive stuff, obviously, we will not share with LLM, and stuff that really makes them. But when you're looking at where they are, where they want to go, and the kind of that additional concept piecing it together, it's very different per student, or kind of where they're coming from, what school, and what destination. That's something that was really exciting at the start, but that's also when, going back to Dan's point, we were like, well, what's the coolest thing we could do here? Like, how far do we push this? Because we've got super talented guys like Dan, and we've got a great kind of squad of engineers who are very passionate about using AI. We had to be very careful not to get carried away by the hype. Yeah, for sure. So tell me a little bit about what was that starting point? What was the... There's a phrase that's been coming up on our podcast of, what's the first bite of the apple? I'll embarrass myself by saying, like, how I started it, and then lead through to Dan, who grounded me in the process. I think, in my naivety at the time, absolutely, and just looking through, like, the different options that we had, I was like, look, I went down the Microsoft Azure route, where they had this very interesting setup to, like, basically host vector databases, like, RAG architecture, and the different LLMs. It was really interesting to see, like, the different elements we could use for this, and that's where I was first introduced to all of this stuff. I was like, this must be the way to do it. This must be the best approach to building an LLM for, as you say, like, a problem space as vast as our size. And I definitely just got a little bit trapped in that mindset of, look, this is, like, the big picture stuff that we need to do. This is the deep, like, more deep technical things that we need to do to build this. And I had to kind of rein it in a little bit, take a little step back. So it was a really good... I'd recommend any PM, anyone building AI, LLM, like, anything, to go down that route and go down a complex route and then work backwards, because that is, for me, has been a really good journey. And I feel like now, looking at what we're building, I'm like, look, this is fantastic. This is what could be ahead of us. It's like just doing a little bit of extra discovery on that further route. So I had to go down this route, then talk to Dan, come to the realization that we might not need all of this, like, kind of fancy stuff at the start, because we've got the data, but it's not that complex for what we're doing right now. If we structure it in a correct way, we can do this in an interesting space. And it was very similar to prototyping as well in the sense that Cloud Code, lovable, pure, like, addiction from my side. And I absolutely love that stuff. And I think, if anything, it was just, like, I had to bring myself in, again, because it was, like, the amount of things that you can do and how the opportunities and the speed that you can do everything, you have to be very careful to not back yourself into a corner. And I find that to be a very big trap with, like, lovable prototyping at the moment and, like, the ease of Cloud Code and why it's super important to have these regular conversations with not only your members and your user base to stay grounded, but also, like, engineers and just be like, look, this is what I'm thinking. This is where I'm going. It's so easy to sit down for an hour, end up in a certain space and need to be taken somewhere else because it's just moved so quickly. It's so effortless now. Yeah, I think you're touching on something that is, it's both really fun about where the world is and how fast it's moving, but it also can be pretty dangerous. I think we've had a lot of episodes where teams acknowledge they started with too much, right? We had one team talk about, they jumped right to an embeddings database and different RAG strategies, and then it ended up that keyword search actually worked better. And it's hard, right? Because this technology is fun and there's a lot to discover and to learn. And I think, Elliot, your story of just being really curious and jumping in is a lot of where, actually, the name of this podcast came from. Marty Kagan talked about this idea of you want your engineers involved in discovery because they know what's just now possible, right? And that phrase just resonated with me so much. And what really resonated with me was it's not just engineers who need to know what's just now possible. Good product managers, especially ones that work in technical spaces, also need to know what's just now possible. But then we have to be careful about not going too big, not going too complex. And any engineer knows the risk of overengineering before you need it. So, Dan, let's jump to your take. So, Elliot's super excited about AI and you have a grand vision of what you can do. What was your reaction? Yeah, I think, inherently, I'm a lazy and simple guy. So I wanted my job to be easy. No, I'm joking. Ultimately, I just wanted to see what we could do just in terms of context management and tool calls. I didn't think... I know RAG is great. I think it's the solution to the inevitable pitfall that the LLMs currently have, which is training on old data and extending its knowledge. But in our case, I felt like we had the power in the data that we already had. And it was more just about giving the LLM the tools that it needed to keep coherent conversations going about, for example, a job opportunity, and then giving it enough context about the user so that it could actually guide and suggest what the next steps could be for them. Yeah, so Dan, based on what you just described, it sounds like even your earliest prototype maybe was already agentic. Is that true? Yeah, for sure. Okay, so you started from the beginning of looking at it as tools and context, which is great. Tell me, what did that earliest... I don't even want to say V1. Maybe it was just a prototype. What did that look like? So it was a button on a card on a job. So we've got what I would class as a jobs board. And we've got jobs cards on the jobs board. It started as an entry point on the job card. So it would start off... We played around with what was best here, but at the time we decided we would start off with a structured output. So we had to build the tools firstly to get the job context and the user context into the LLM first. But then it was just about keeping it on track. It was just about giving it the right tools, the right logic to decide when to give these tools. So a good example is, for example, on the jobs board, if a job hasn't been updated or job details have not been updated since the last conversation, there's no need for the LLM to go and grab that tool. So we were actually able to do a lot in terms of the context management side just within the system itself and not go really convoluted with any rag or anything like that, because that would have been a massive undertaking for such a small team. It was just me and two of the devs on this project, and we had a three-month timeline on it. So help me envision, like, what was the interface for this? What did your students experience? So they would click on the job card. A kind of chat bot modal would pop up. It would generalize. It would generate a summary on the user's suitability to a job. So it would... We use a structured app for this because we felt being able to structure, for example, an overview analysis, match analysis, and then key strengths and weaknesses of the user's career profile, for example, is, like, better grounding for the rest of the conversation if we were able to do that. Yeah, so that was one of the first things we did. So the actual first interaction Is there desired profession like after college that you're helping them with? What do they need to do to get there? Yeah, exactly. I think one of the biggest things we've seen from the past four years is that we've got talented people. It's more about getting them to put in good applications. And that was like a bit that was missing. So we quickly built a career profile and platform because we felt that would give us more data, more grounding, more ability. Just more avenues in which we could use that data and that tool to attract partners or allow or generate momentum within the user's journey. And then from there, that was the key piece of information we wanted to take into the LLM to see what we could do with it. So a lot of the job was getting that data in a document format. So it could easily be pulled and ingested by the LLM when it needed. So that it could just, we didn't have it on the back and forth chat. But once we realized we've got the basis for a back and forth chat, we quickly just built that in and then put all the safeguards around it because we've had experience doing that before, basically. Okay, let's get into like just this beginning piece because I already have some questions, especially because you said there's no rag. So it sounds like you have a job listing or a career listing, like something that a student might be aspiring to. You've helped them build out their own profile. And you're trying, your first kind of prototype was, let's give you feedback on how well you match. But it also sounds like you're telling them, this is what you might need to do. You might need to take these classes. You might need to, who knows what else. Where are those recommendations coming from? That seems like you have to have that data to know what path the student should be on to be able to get that type of job. Yeah, so we do have some gold standards, but as Elliot mentioned, every journey is slightly different. So what we try and do with the learning and masterclass recommendations and the career mentor recommendations, we already have those, those, that data in document formats, and we've been running recommendations just based on a search just to, just to score and match people based on those two documents. So we just piggyback off that at the moment, but the, we are wanting to move into RAG in which we are able to embed those documents and then have full on. I feel like we have a terminology difference maybe. So you are using search. You're just not using like embeddings databases. Yeah. We decided quite early on that we wouldn't go down that route. Okay. I was trying to figure out how there was no search step, but there is a search step, but it's just not embeddings. Yeah. Okay. I gotcha. Okay. I would argue search is RAG. You're searching and adding things to the, yeah. Exactly. Yeah. Okay. Okay. So you do your very first prototype was, we have a job, we have a user's profile. We're looking at matching and then based on those, the gaps, maybe you're searching for, what do we recommend to this student? So that would include information about recent updates on the career profile, their recent job interactions. So what kind of categories of jobs have these people been looking at? And also just general interaction data about what areas of the platform they've interacted and not interacted with. Okay. Yeah. And how did that first prototype go? Like how did you, did kids respond, interact with it? I know you said there was no back and forth, but I can imagine that it was a lot of work. I know you said there was no back and forth, but I can imagine like is trust an issue, what do you mean? I got to take that class. I don't want to take that class. Tell me what was the reaction here? Yeah, sure. So it's an initial conversation. I think, again, I think there was a, there was an assumption from a mis-assumption from my side that we wanted to make this relatively subtle in platform, I've learned a lot from people building AI tooling at the time that it needs to be not like a, an additional tool, but more of an enhancement of existing tools that say, but it lives on a certain part of your design and it doesn't really get in the way, but it's a nice addition. And that's something, again, I think going back to my point around, we were being probably overly careful around this saying, look, let's just keep it in here. Let's not make it too in your face interaction. And during the testing, a lot of the kind of, we went into testing really excited that we know what it's doing, come through what Dan's just said. Let's have these conversations with students. And immediately I just wasn't sensing that kind of sense of excitement back. And it was kind of like, Oh yeah, this is cool. Thanks for appreciate your kind of recommendations based on this. There was that kind of excitement around like us, like having the context around them and their engagement. And that was, that was the kind of the big moment for them, but there was no, there was never a, Oh, this is an AI doing really interesting things for me here. A moment. It was just like, Oh, that's nice. Cool. Zero gravity have that. They've got a cool feature. So it came away from those discussions thinking we haven't really had that moment, that copilot moment that we've initially set out to solve. And through further conversations, it became more and more apparent that we needed to have more of a moment, more of a kind of like an interaction, more of a wow, that is that I understand one way you've retrieved that information from. I feel empowered by your analysis of this based on what I've done. And I'd love to dig deeper. I'd love to like, just ask more questions about this and discover a bit more about myself. And that's where we then came to, I think is close to the final version of it after those discussions. I can imagine like, from what you're describing, it sounds like maybe they didn't really realize it was personalized to them. And that like, you might need a little bit more interaction to like express your objections and let the LLM continue to tailor it to you. Is that a fair characterization? Absolutely. I think we probably hid the LLM magic a little bit too much, but I try to make it smart. So yeah, that's definitely something that we found out because I think the one thing I was also learning from having these conversations is that students use projects in like cloud interactivity. I'm a big fan of projects and, but they don't really want to be the ones managing the knowledge there. So in a way, if you think of our Copilot as a bit like a kind of like a project on top of our, within our platform, but we can deal with the knowledge stuff. Like you can, like my vision for this in the future is that they can potentially interact with that at some point, maybe like memory management, but take that element away from the user. And it's just really nice. They're feeling the power of projects without having to manage it themselves and then having things out of date, as Dan's mentioned. So that's what kind of what we were leaning towards. I think we didn't quite, we sacrificed a little bit of the chat interaction by going too far the other way. Yeah. So did you like, was your next step, you just opened up the ability to chat? It's not straight away. Again, we were, we did think that, look, this is the next best step, but we explored a lot of different things. Well, look, do we go voice input, outputs were really picking up. They weren't quite there. I think they are there now. I do think that we've seen a lot of really interesting stuff in this space at the moment, but not quite there when we were doing a lot of the discovery work around this, but it's just like, one, what kind of input output format do we want to go with? Do students actually want this? We did have, as I said, very early chat function on the platform, which was popular, but not enough for us to go, look, let's go fully down this route. Do we use, what kind of like approach to this do we use? Do we use kind of like Socratic back and forth? How do we embed safeguarding into this? So we were very apprehensive about going down this route initially. And to be totally honest, it was like, it felt like everybody was falling into the trap of AI chatbot, like AI back and forth. So we were like, look, we need to think about this properly. We need to, we've got the tools we've got. We've done a lot of discovery where we do understand this space now. Let's not jump to what everyone else is doing and create like a little LLM wrapper, but a lot of these kind of apps are now like ending up being. So we, there was a lot of kind of conversation around it. We did eventually come to the conclusion after a lot of user interviews and testing there that we're going to do text input output for now. Voice came with its own challenges and I'm still determined to explore that at some stage, but we, again, we were getting a little bit ahead of ourselves on the tools available in that space. And with the text, what we are doing now is we're, we're still very prompt heavy. So still guiding them towards the initial conversation. So where Dan's kind of mentioned, you interact with that opportunity, that button, that door, which I'm calling them, that door always still exists within the platform because it's a nice kind of like mental model going into that conversation and having that back and forth comes like chat and getting the cleaning the results from that. But it's also like, where do you sprinkle more doors throughout the platform and more entry points? Or do you expand that into more of a modal interface where you can like mobile first, engage with it where you want to. So we are trying to guide them for different prompts in the platform because I do think there is, there is a massive risk still with having empty text box of just having a conversation. We know probably more than most people in this space that you're asking for trouble giving a school student an empty text box on your platform. We've experienced that pain and we've built safeguards around it. But yeah, I think having that guidance to an extent is still good. So when you say you're really relying on prompts, you don't mean, let me make sure I understand, like in this example we were talking through of maybe a job listing. You were, you had started with, here's a summary of how it matches your profile and what you would need to do to be on this path. I'm assuming when you say prompts, that's like the beginning of a chat now as opposed to just a static. Okay. So they're not facing an empty box. You're basically telling them something about this opportunity and then they can interact and ask questions and dig in. Yeah, we're helping them start the conversation and then they can respond how they want to at the moment. Yeah. And is it, I think there's a lot of fun things you could explore with asking them questions rather than waiting for them to ask questions. I'm curious if like in this job example, you could just give a summary and let them ask questions, but you also could give a summary and then ask them a question. Tell me a little bit about what you're doing there and what that interaction is like. So in terms of like where we're like talking to them and like how we actually kind of conversed, like start the conversations, it's, so we've got, let's say that summary, and then what we will do is we have quite a lot of control over where, let's say we want to have a follow-up. We have explored like that kind of follow-up system message. Do we want to add more kind of guidance around that initial chat? One thing that we have like explored is that what happens if let's say someone has done something since receiving that initial interaction. Do we want the AI to then be like, I noticed that you updated your career profile, which essentially is like an in-platform CV builder. Essentially it's like you want it to feel like it is there with you, engaging with you and encouraging you in a little way. And I think one of the names that we did have with this originally was like kind of career coach, for example, because we want to be able to feel that back and forth interaction. I think we have that first starter conversation. We want to see better follow-ups, but the follow-ups have been a little bit challenging to be honest, because they will have an element of more randomness. I think LM is very good at having that initial response from a student and following the conversation through that way. What we've done is that we've built this, and I'm sure Dan can go into it a little bit more afterwards. We've built this from an admin layer on our side. We want to be able to control most parts of it, the most parts that we can possibly do. So in terms of if we wanted to switch the model tomorrow, we'd want to do that and be able to know exactly how that impacts the different outputs and things like that and the different system messages as well. So we have a different system message for our responses and how that kind of black people will engage, go back and forth with that. But when we try the follow-ups to the original conversations, the starter prompts, we found that it was getting a little bit confused in terms of the context. So it has been a bit of a tricky one in terms of like, how do you navigate conversations? When do you reach out to the student? Does it come up as like a nudge? I don't know. It's something that we have been looking at. Yeah, I want to get into all of this because I think I have some, I'm very curious about how you're managing context. But before we get into the technical bits, I want to fast forward to where you are today. You call it a career co-pilot, so I'm assuming it's evolved quite a bit than just this here's this job summary. Give me the high level of like, how is a student interacting with the career pilot today? And then we'll dive into the technical bits. So today, what we're finding is that it is tightening the orchestration across our different tools. So like where we do have that kind of original, I don't want to call it like a kind of summary from like an opportunity and allowing them to understand the opportunity a lot more. We're finding that people are doing the work they need to do before applying a lot better than they were before. Whereas before, let's say students would land on an opportunity page and spray the apply now button and just click it, send off that first application that they probably used by working with ChatGPT anyway, found like a lot of students were doing that unsurprisingly in the first instance. So what we wanted to do is it was quite risky, but at the same time, we wanted to make sure that we were doing it in a very controlled manner as a result, because we did want to add certain steps before they got to that one, like CTA, because that's an important one for us. We don't want to block them to do that because that would be a disaster in terms of like our partner's metrics. But we did see that people were going away from it and having those conversations and editing their career profile and adding certain skills that are relevant skills to those roles, asking more questions in the community, which is something that we also wanted to drive as a result of this. Asking relevant, having relevant conversations with their mentors as well. I think this, for me, is the one part of this whole thing that excites me the most is taking that knowledge from the co-pilot such as LLM and having a productive conversation with a human being. I think that's something really interesting, like having that in between, whereas before we wanted to go down the, oh, let's just let's create like an actual AI mentor on the platform, engaging with this like co-pilot to be more like, oh, thanks, I'll take this from you now, having short interactions, back and forth interactions and engaging with the platform as a result. Because another one of the risks was what would happen if they just started having long back and forth chats with this feature, just ignoring everything else we had to offer. So I've been really happy with how people have dipped in and out of it so far. So it sounds like it's still embedded throughout your site. It's not that they have this wide open chat interface. It's always in the context of they're doing something on your site and there's this pilot that's like helping them through the action they're working on. That's correct. OK, now let's get into the technical bits. Dan, tell me a little bit about what's happening under the hood. What does the co-pilot look like today? Yeah, so co-pilot at the moment has a wide variety of tools that it has access to. A lot of them are just data fetching tools. So they just go and call either our search kick or the database to get data to drive, to formulate that context. I think one of the big areas we decided to tackle first off was that context management. We didn't want to get to a place where when we did open that back and forth chat, we weren't in a place where that context was getting managed and the token counts would just get blown out of proportion or just compound as the chats got longer. So we spent a long time being able to manage our context. So, for example, removing tool calls if we think they are irrelevant now for the conversation. So if, for example, throughout the chat we had five tool calls to the career profile, you might not need to put all that into the next response. In terms of compacting the current conversation. Exactly, yeah. And we are not reinventing the wheel here. Loads of things that we are doing, you'll see a lot of the top tools doing. So, for example, summarizing historic messages so that it just comes compact and flattened. So you're not exploding the chosen counts then. Removing irrelevant tool calls. If we did, for example, yeah, as I said, another one being the job descriptions, like if they haven't changed or if it's got context of an old job description, like we don't really need that to be there. I want to give a little bit of context for listeners, right? Like when we chat with chat GPT, like in the web, I think we have some awareness of a context window, right? Every LLM has a fixed amount of space at which it can take in one conversation. I think what maybe some people don't recognize is when you have a chat interface, you're actually sending the whole history in every next request, right? So the LLM is still just getting a message and responding. But as the chat interface builder, you're responsible for maintaining that history and sending it to the LLM. And so what you're suggesting is on every turn, you can kind of mess with the history to compact it, to make it, to manage the size of the context. 100 percent. And the coherency of it, because sometimes like one of the first things we did was enable, because we're using OpenAI, so we were on the platform, we started a new project, we enabled the logs so that we could see specifically what was, we had that observability in-house. But for Elliot, for example, like he could see that all in the logs. So within the logs, you can see in this reply, this is all the context that was given to it. And then you're like, oh, like, why is that in there? Do I really need that in there? Do I need that much going into it just to respond to a quick and easy answer? Yeah. The other interesting thing maybe that people don't realize is that we, like picking the right model for the job is super important. Like we, for example, the structured output, that initial job analysis that we run, we actually run on GBC 5.0, I think it is. And then on our replies, because users expect a bit more of a quick back and forth, we're on a lower reasoning model for that. So, yeah, behind the scenes, that's basically what's going on. Again, with tool calls, it's not, there's nothing special going on. It is literally just application logic to be like, oh, this has not been updated. Don't fucking take it in. So, yeah, it's weirdly like one of the interesting things was that when we were building it, we were like, we're coming across these problems that we think are like so hard to solve, but they've been solved already and they're not really that complicated. I think the hard part is like just getting over the newness, like I've never done this before. OK, so how do I jump in? OK, I want to go back to what we were talking about with the context window stuff, because I think this is a skill that people really have to learn. And like in long conversations, you have to manage really well. So I want to double click a little bit on your removing tool calls. So I want to give an example of this. You mentioned a lot of your tools are searches. I can imagine with searching, you're using pretty standard search metrics to determine, did it find the right thing? With search, we always return more than we need. And so I'm assuming that's an example of like on the next turn, you don't have to include all the results that you didn't end up using. You're just including the result that ended up being relevant for the conversation. Yeah, exactly. I think a good example, again, I bring back to that career profile, the CV we have on platform. There's a lot of information on that CV. So we have a hash representation of that data. It's just nested JSON. And at the beginning, we actually just allowed the tool call to fetch that entire document and then just feed it straight into the LLM, which obviously, as conversations got longer, that bloats the total usage because these documents can be quite large depending on the user. So we actually break it down to have the tool call do a comparison of the previous data and the current data and then literally just give it the changes made rather than the entire document. So it doesn't need to go and look at two documents and think, oh, what the fuck's the difference? We just presented that so that when you look at it as a conversation, it's just more coherent. We're not applying any scientific reasoning to this at all or anything. It's just doesn't make sense. I love this example for a very specific reason. I've been writing a lot about cloud code and trying to get product people to use cloud code personally. And what's motivating this is I'm learning as I use AI myself in my day-to-day productivity, it's actually teaching me what I need to know to build good AI products. And I think what you just described is an example of this. If I'm in my day-to-day life using an MCP server and the tools aren't well designed, it blows up the context window and you start to learn, like, what is good tool design so that when you go to build your own product, you're aware, like, I shouldn't return this whole profile. It doesn't need the whole profile. It needs to know, did this field change? I'm going to return, just did this field change? Yeah. Yeah, I love that. It's it. I don't I can't think of another technology. I'll be curious if you guys can think of one. I can't think of another technology where, like, we can just play in our own personal productivity and build the skills we need to build, like, production. Products, maybe engineers, like engineers code as a hobby and then that helps them in their job, but like a product manager isn't coding as a hobby and then it makes them better at their job. But a product manager can use cloud and then learn how to build AI products. The only time I've ever felt the same way was going back to my office background and just like plugging into Zapier, for example, that was the early days of being like, well, I could do my job, but like on steroids here in terms of what the opportunities are. You know what, Zapier is a great example because it teaches you how APIs work. Yeah. Oh, that's oh, perfect. See, Elliot, you're smarter than I am. I love that. OK, Dan, I want to go back to just managing the context window. I actually a lot of people are starting to talk about we're going to just see our future jobs as context managers. And I think this is like one of the most important skills. We've talked about this example of tool calls and what tools return and removing the noise from the conversation moving forward. I can imagine a big part of your challenge, too, is how do we represent the user? You mentioned you're tracking all their behaviors on the product. Like, how are you representing that and knowing what's relevant for the current conversation? Tell me a little bit about content, context management for what you're pulling in for what's relevant. If they're not engaging with that part of the platform, we want the LLM to know that as well. So we don't just give information where it actually has the information. We will actually say no, like zero, or we won't give nils because we don't like giving nils to the LLMs, but we'll give it like some default text or something, as opposed to just giving like an incomplete document data. So we, I think we go back as far as three weeks, we try and collate three weeks worth. And again, like if the conversation has been going on for longer than a week, we will get rid of, hide that previous tool call and give the LLM the ability to call that tool on the next run. So it's, and that's just gated by application code. Like it's nothing that the LLM is doing. We are simply going, oh, we think you need this tool now. Here it is. Ah, so you're exposing whether the tool is available or not. Exactly. Okay. Yeah, yeah. So that was another technique that we've tried to use throughout and we think it works. Yeah. At scale, that would probably work to bring your general token usage down, especially on longer conversations, yeah. And then is your, it sounds like you have a primary orchestrator agent. Is it truly a loop and it's just deciding what tools to call or is it more, I know a lot of people are moving to pipelines where like maybe a turn is agentic, but like getting the loop to work and be useful has been really hard for people. No, we are just on a loop. So the orchestrator, it goes through a controller. The orchestrator is the same, but it's got that logic to basically decide. It's got our system logic to decide what tools it actually has available and what context it will have in that next run. But yeah, we don't do anything really on top of that. Trying to think whether we, no. Like safeguarding is all within the same thing. Yeah, I want to get into safeguarding a little bit. So tell me, is that a tool? What does that look like in your system? So we go hard on just moderating inputs and responses, but we also have an external partner that we have a contract with called Unity, Unitary. Sorry, is that right, Elliot? Yeah. I think they're a data labeling company, but they produced an API that actually blew up. Like it's a moderations endpoint, but they can do images, they can do videos. So we actually use it on other areas of our platform, for example, community. So we've decided to actually run all our message history into Unitary on a cycle, just so we're not relying on just the moderations endpoint itself, because some of the data that comes out is not, I would say maybe not as comprehensive as what we would like it to be. For example, Unitary really dive into the categories. So yeah, got this two prongs approach of making sure we're doing our due diligence in moderating the input and the output, but also having an external provider that we can run all our messages on. Okay, so let me make sure I understand. It sounds like for every turn, both for the user and for the agent, you're sending it through a moderator sort of filter step. Yeah, we are. And is that, are you using this third party for that, or are you doing that yourself? We pipe it through. So we rely on the moderations endpoint for that initial user experience, but then we rely on Unitary outside of that initial orchestration just to keep us safe and make us happy. Okay, I got you. Yeah, okay. I guess what I'm curious about is with chat interfaces, it's already hard enough to like keep the latency low. What does this moderation step add? Like how are you managing just the perception of latency maybe? Yeah, so weirdly the moderations really doesn't, moderations weirdly is really quick, especially if it's just text. We're finding that the tool calls seems to add the most latency. Structured outputs definitely do as well, but also just generally the model. But again, I think this comes back to it being maybe a UX problem, which has been solved already by showing the thinking. When these high reasoning models came out, you couldn't do anything about that. They needed to think for that long. When ChatGPC brought out the thinking steps and all that stuff, we've done the same thing. So we have it on a stream. And then if a tool call comes in, then we're like, oh, we're looking at this tool call. And then right at the end, it says, oh, it's generating the output. We've also done things where it's like the recommendations might load first at the beginning. So when it's generating that initial job analysis, which takes a bit longer because it's on a higher thinking model, there's already something for them to look at and interact with. So it's more just, yeah, trying to be clever with what you can do on a UX sense rather than anything you can do internally because you might not be able to do much internally. It's hard to mitigate that. This is fascinating to me because I think you're right to some degree, right? We have patterns. We're getting used to waiting. There's a lot of good UX patterns. I think this is partly why Cloud Code blew up so fast. It nailed this, right? We all look at the funny words as they cycle through them. But I also, I can imagine with 16-year-olds, they're not the most patient in terms of just sitting there waiting for a response. So I love that you're also like looking at what can we load quickly to engage them and then maybe like progressively load more as it returns. Yeah, we also made the decision to bring it up on an ever-present model. So it's not like they can't see the rest of the site whilst they're interacting with it, especially on like web views. So yeah, it does feel like it's not something that's blocking you from interacting with the site. It's just something that's aiding you along that journey and keeping you momentum on our platform. Gotcha. Yeah. Elliot, were you gonna add something? Just on the safeguarding front, really, in terms of what Dan mentioned, not impacting the latency, but we are always designing with that safeguarded first in mind. That's all, like we are originally a mentoring platform. And I think critically, I think AI can never make those safeguarding decisions right, but it can help us surface certain things. So there's a, we will have observability of every interaction. And we are like myself and a few others in the company are trained on safeguarding from day one. So it's like, how do we make sure that we really lean on our position as a tech platform, having that observability of all engagements in tech and make sure that we, if we roll out an LLM within platform, that how we have those smooth conversations with like schools and partners, which we are currently having, because I think a bit of our, one of our superpowers is saying, look, we are like safeguarding first. We build safeguarding first. And AI is definitely something that's adding a super contentious part to that. And it absolutely should be seen that way. But I think people, if you go in with the mindset thinking, look, how do we make sure that we, one way to put it into any decision making position and also have that observability, then it's quite a nice space to be in because it's an additional safety net for members on our side who are like, historically will have quite a few safeguarding flags and concerns. I'm happy to hear that you can send it to a moderation endpoint and it's not a huge latency concern. As I can imagine, that could be a whole product in and of itself. And so to be able to have services available to you, I imagine helps a ton. Tell me a little bit, Elliot, you said observability a few times there. We haven't really touched on evals. We talked about safeguarding. Tell me a little bit about what you're doing to evaluate quality. Yeah, sure. So we try and see as much as we possibly can. At the moment, I think it's a bit of a hybrid use of open AI tools available, which again, at the start, we're like, let's build our own thing. Let's build our own kind of version of this in our own admin layer. But again, the tools exist is, they are directly tied to the output from the model you're using. And at the moment, that is sufficient for us. I think the main thing is adding your own context, right? Is what failure taxonomy is important to you and your members in what you're trying to achieve. So let's say, for example, when we first started this, I saw it as a really important process to like educate wider employees on a lot of this stuff as well, because people just see like a bad output. Whereas all of us here will see a bad output as, oh, okay, let's walk back. Let's find out what the problem is here. What is it going to be next time? There's not like a bug we can fix, right? It's something that we've got to really pay attention to and tag. And I've done a lot of kind of exploring and upscaling in kind of the AI safety space that a really good course with, free course of blue dot impact, which I would recommend to anybody. And they walk you through the whole red teaming process and kind of the, we call it like green team process as well. And to get everyone upscaled and aware of these risks, we got everyone into a room and doing that kind of internal queuing. So I had students kind of work through this as well. And as I, we had one red team and one green team were kind of acting as like that, as a student would, and the red team were doing the total opposite. It was like, their job was to destroy this thing and prove that we should never release it. And a few of the things that like, a few of the taxonomy points we built around like how, which is very aptly named around hallucinations. We had some that kind of, we'd like tagging like stale recommendations as well. So like Dan says, something that they've potentially already done on platform or something that's not relevant. We'd look out for like vague responses. If an AI goes off on a tangent, something that really struggled to stop it from doing was telling a story. It would just consistently tell students stories. And it would, at the end, it'd be like, it gets the end of the story and just say, do you want to chat more about your application at this place? And it's just talks about unicorns and like Shrek or something. So yeah, it just goes off. It was really hard to contain it in that sense. But tone as well, I think is something we've found really challenging, but I've been really interested in is like, how do you stop it from being overly or underly encouraging to certain applications or roles? Because we had a case at very start where a student was applying for a job at a construction company. And the AI was like trying to find, it had not that much information on them in terms of their career profile or CV. It was trying to desperately find something to be like, this student's gonna be a perfect fit. So he's noticed that he had a skill in like Figma skill. And he's like, well, your Figma skill is perfect for like this construction business. And then it was very like manual kind of in-person construction. If I had a great reason for it, you know what, actually, that's a really good point. Maybe all construction workers should dive into Figma, but just stuff like that being very like trying to clutch at straws is something that we struggle with. But get back to your original question, that was like this categorizing all this stuff and trying to seeing all the data in front of us and from an eval perspective, I think that tagging process is something that we found has made us help just quite sane. Because again, it's something that you hear a lot about evals. I've listened to a lot of your work, Therese, on one of the evals and like kind of building that out from like scratch and how to do it on a new kind of view of like side things. But it's very overwhelming, but you just need to know what the things that you're looking out for and then be able to like tag in that sense. And then we use MetaBase and things like that to visualize trends and so we can really monitor outputs going forward. But yeah, it's overwhelming, but yeah, can contain it when you start to like piece together different parts of the puzzle. Yeah, so it sounds like you're looking, you have good observability, you're looking at your data, you've identified kind of your top failure categories. Do you have data sets that are helping you measure those categories? Do you have code-based or LMS judge-based evals at all? Like how do you get a handle on, like you mentioned, hallucinations, tone, like how are you getting a handle on how the LLM is performing on those dimensions? Just by spotting trends at the moment, to be honest. I think we're still quite early doors. Yeah, quite early doors in that kind of eval process. We are spotting trends and that is helping us fix it, to be honest, from what I'm seeing, but it's not, I'm very intrigued to know how businesses do this at like super scale because it's like, yeah, okay, I'm very pro like having human in the loop on this stuff, but I know businesses have used another LLM to analyze certain things. I find that really interesting, but obviously comes with- You're doing the most important step, which is to be looking at your data and to understand the failure modes. I think once you've identified a failure mode, there's three, let's see if I can get this right off the top of my head, three primary ways to start to turn that into a metric that you can measure, right? You can create data sets that you can use as your like almost QA, anytime you make a change, you're looking at, and the data set is curated to show, to expose that error. So you can see, is that error coming up again? But you also can look at, can we actually measure this with code? Is this something that we can deterministically evaluate? Or you can have another LLM judge the response and see, is there evidence of that error? And so that kind of allows you to do it at scale. But I think already what you're doing is like the, it's the, it's honestly the starting point a lot of teams skip. Like they jump right to an eval tool with a generic metric that's not fine-tuned to their case. But I think like Elliot, you were saying you were interested in learning like, how are people doing this at scale? I think they're just taking those failure categories and looking at how do we measure this? It could be a data set, it could be code, it could be an LLM as judge. And then they're using that as an ongoing measurement. Yeah, okay. Tell me what's next for Career Co-Pilot. What's next? I'll start that. I think the first thing is how do we, I think I've mentioned this before. How do we actually optimize on memory? How do we optimize on these long, lengthy journeys? I think one thing that we haven't really mentioned on this, in this conversation is, students can sign up as a school student and then stay with us until they're at university and applying for kind of jobs outside of that. And we're obviously very like GDPR first, we wanna make sure we don't retain any information that we don't need to retain. But it's like, how do we make sure that we can allow the LLM to retain that relevant information throughout that process that the member can then have control over in that interface as well? So that's something that I find really interesting is that you're talking about memory in a lot of products that is relatively short term, the grand scheme of things. How do you do it over six months? How do you do it over like a year, over a year? And do it in a way that the user doesn't feel like invaded and they feel like encouraged by that? Because it's like, yeah, remember that I did this and struggled at this and then I've got to this point. So I think that's one of the things we are looking at next. And I think, I guess with that, it's more like how, yeah, what can be forgotten as well? What can we just remove from the whole conversation? But memory has been really interesting and something we have already started to explore. And it comes alongside the scaling of evals as well, as you mentioned. This is one of the most interesting areas of AI products for me is what should the AI remember and what should it forget? And how do you represent it so that the LLM can use it well? And I feel like I can't think of a single example of who's really nailed it. I can see, I can think of lots of products that are experimenting in this space and it's fun to see the different, even the foundation labs like OpenAI is taking a very different approach than Anthropic. But this to me is like one of the edges that is most interesting to me of who's gonna figure this out and what are the patterns that are gonna emerge? I agree and I saw something recently, I can't remember who the product was, but one of their quirks was that they claim it as a quirk. I just reckon that they have to just remove the data from the database. But they were saying that our memory works like a human's does. Eventually the LLM forgets and part of data disappears. It's quite nice in a way, but also it's like probably just because you haven't figured out a way of like storing the older stuff. So it's like it fades and you have to engage with it again to bring that memory back. So yeah, it's an interesting one. There's also like a UX component of this of like, how is it not creepy? Yeah, we use cursor every day and they've got this memory layer in, but they actually allow the user to have control over that. So even though it will generate the memories, you still have the ability to go and edit them, delete them, which I think is quite a good use case for them specifically. But yeah, for us- There's a flip side as well, right? Cause it's like, what can you remove from memory? But what I find is particularly creepy with a lot of these models at the moment is that I'll remember that. I didn't tell you to remember that. Deciding what to remember is like one of the, yeah, one of the- Yeah, I personally much prefer Anthropic's approach where I control what's in memory 100%. OpenAI, I think it's a difference. Anthropic is going after business, going after knowledge workers and they probably want more control and OpenAI is going after consumers and they probably don't even want to know what memory is. And so I get the different strategies, but I'm a little bit, every time I see ChatGPT say I'm writing to memory, I'm always like, what did you write to memory? Let's discuss this. What did you remember? Yeah. This has been amazing. Is there anything else that you were hoping to share that we didn't cover? Not really. I think the only other thing we were looking for next was that we're looking at how do we bring more action execution into this as well. I think we want to make sure that we explore the Socratic reasoning space a little bit more from coaching students and also performing actions, bringing, I know, bringing like an MCP into this as well, but not doing it in a way that is going to invade anyone's privacy and is going to be in control of the memory as well. So that's why I think a lot of other products have gone wrong with the junked MCP and it's just been like, wait, how did you, why are you asking for this, et cetera? So that's something that we are exploring at the moment and we're quite excited about. Yeah, I love that. I've been thinking about this a lot because I am starting to build out like very specific AI teaching tools, but my grand vision is to build a discovery coach that then uses those tools based on where the conversation takes you. And I think this question of using an LLM to teach is like, what are the right questions to ask? How do you encourage reflection? How do you put more on the student to like engage and participate rather than I'm just going to give you the answer. Yeah, exactly. This has been really fun. I appreciate you taking the time and I know especially we're at the end of your workday. So I especially appreciate you taking the time. Thank you both. Thank you for having us. Thank you very much, Teresa. Big pleasure. If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating as it helps others find the show. Thanks, I appreciate it.