Building Agent Studio: How Medable Is Using Agentic AI to Accelerate Clinical Trials

Overview

This episode explores how Medable is applying agentic AI to one of the most complex and heavily regulated industries: clinical trials. The team explains how their internal platform, Agent Studio, powers both Medable-built applications and customer-configured agents to reduce administrative burden, improve data quality, and ultimately accelerate the delivery of therapies to patients.

A central theme is that AI is not being used as a novelty layer, but as infrastructure for solving deeply manual, high-cognitive-load workflows in clinical operations. Medable’s long-term ambition is “full self-driving” clinical trials: agent-powered systems that help humans manage far more trials with greater speed and accuracy.

Key Takeaways

Medable’s core insight is that the biggest bottleneck in drug development is often not science, but clinical operations. Trials generate enormous volumes of documentation and fragmented data across many systems, creating slow, error-prone human workflows. The team highlighted that a single study can produce tens of thousands of documents per month, while clinical research associates may need to work across 13 or more systems just to understand what is happening in a trial.

Rather than build isolated AI features, Medable chose a platform approach. Agent Studio allows teams to configure agents with different models, knowledge sources, workflows, triggers, and connectors, then reuse those capabilities across many use cases. This reflects a broader product philosophy the company already used in its SaaS business: build shared infrastructure so each new solution becomes faster to deliver.
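
As a rough illustration of what a reusable, configurable agent definition could look like, here is a minimal Python sketch. It is not Medable's schema: the AgentConfig class, its field names, and the new_version and publish helpers are all hypothetical, chosen only to mirror the pieces named above (model, knowledge sources, connectors, triggers) plus the versioned draft-then-publish flow the team describes later in the transcript.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical agent definition; every field name here is an assumption."""
    name: str
    model: str                                 # any model identifier the platform accepts
    system_prompt: str
    knowledge_sources: Tuple[str, ...] = ()    # e.g. document repositories used for retrieval
    connectors: Tuple[str, ...] = ()           # e.g. MCP connectors to data systems
    triggers: Tuple[str, ...] = ("chat",)      # chat, webhook, Teams, Slack, ...
    version: int = 1
    status: str = "draft"                      # "draft" or "published"


def new_version(config: AgentConfig, **changes) -> AgentConfig:
    """Return a new draft with the version bumped; the original is left untouched.

    Do not pass version or status in changes; they are set here.
    """
    return replace(config, version=config.version + 1, status="draft", **changes)


def publish(config: AgentConfig) -> AgentConfig:
    """Mark a draft as published (in practice, only after evaluations pass)."""
    return replace(config, status="published")


# Usage: a narrowly scoped classifier agent, revised once and then published.
base = AgentConfig(
    name="etmf-classifier",
    model="model-a",
    system_prompt="Classify the uploaded trial document.",
    knowledge_sources=("classification-reference",),
)
revised = publish(new_version(base, model="model-b"))
```

Keeping the config immutable means each published version stays reproducible, which fits the traceability needs the team raises for regulated environments.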

Two flagship applications illustrate the value. The eTMF agent addresses document classification and metadata assignment for trial master files, a task that can take several minutes per document and requires understanding hundreds of classifications. The CRA agent helps clinical research associates synthesize data from many systems and recommends next actions, moving beyond legacy tools that only surface signals without guidance.

A particularly valuable point was their treatment of reliability. The team stressed that AI systems should not be compared to an idealized deterministic system, but to human performance in the same workflow. Their goal is not perfection, but lower variance and fewer errors than people make today. In regulated environments, this requires strong evaluation practices, traceability from intent to design to evidence, and thoughtful use of human review—while recognizing that humans are not always the ground truth.

Practical Steps

Start with a painful, high-volume workflow where humans are doing repetitive cognitive work. Medable focused first on document classification and multi-system monitoring because the value was obvious.
Build AI capabilities as reusable platform components where possible. Shared connectors, knowledge layers, evaluation tools, and deployment patterns make future use cases easier to launch.
Keep agents narrowly scoped. Instead of one agent connected to everything, design specialized agents or sub-agents for specific jobs, then orchestrate them.
Invest early in evaluation infrastructure. Test different models, prompts, and retrieval strategies against outcome-based benchmarks rather than relying on intuition.
Treat retrieval and data access as product design problems, not just engineering tasks. The right method depends on the structure and source of the data.
Use human-in-the-loop carefully. Human review can build trust and catch issues, but teams must also define when human corrections are actually valid.
In regulated contexts, document intent, design, and evidence from the start so AI features can fit into compliance processes rather than sit outside them.
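
The evaluation step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's evaluation tooling: the accuracy and benchmark functions, the labels, and the two stand-in classifiers are hypothetical. In practice each candidate would wrap a real agent configuration (a given model, prompt, and retrieval strategy).

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]           # (document text, expected classification)
Classifier = Callable[[str], str]   # stand-in for one agent configuration


def accuracy(classify: Classifier, examples: List[Example]) -> float:
    """Fraction of examples where the candidate's label matches the expected one."""
    if not examples:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for text, expected in examples if classify(text) == expected)
    return correct / len(examples)


def benchmark(candidates: Dict[str, Classifier],
              examples: List[Example]) -> List[Tuple[str, float]]:
    """Score every candidate on the same examples, best first."""
    scores = [(name, accuracy(fn, examples)) for name, fn in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical evaluation set with made-up labels.
examples = [
    ("Signed informed consent form", "consent"),
    ("Site monitoring visit report", "monitoring"),
    ("Investigator CV", "cv"),
]


def keyword_rules(text: str) -> str:
    """Toy candidate: simple keyword matching."""
    lowered = text.lower()
    if "consent" in lowered:
        return "consent"
    if "monitoring" in lowered:
        return "monitoring"
    return "other"


def always_consent(text: str) -> str:
    """Toy baseline: always predicts the same label."""
    return "consent"


ranking = benchmark({"keyword": keyword_rules, "baseline": always_consent}, examples)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A human reviewer's recorded labels could be scored through the same accuracy function, which gives the human baseline the team argues agents should be compared against.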

Notable Quotes

Jen: “Our ambitious big hairy goal is one year.”
Luke: “We start with, does the platform capability exist for these solutions so that the next solution that comes around will have that capability already baked in.”
Jen: “We shouldn’t be just comparing these agents to these systems. We should be comparing them to humans.”

Full Transcript

Source: openai 1h 06m runtime

Welcome to Just Now Possible with Teresa Torres. Hello, I'm Luke Bates. I'm the product leader responsible for Agent Studio at Medable. I spend a lot of time working with product teams, the rest of the company, our customers, working to understand their problems, to address them with a platform way to deliver agents. Hi, my name's Jen. I'm also in the product management space here at Medable. I've been with Medable for about five years now, so I've gone through lots of variations of product roles here at Medable. Most recently, I am overseeing our product teams, where we're looking to solve customer problems with our agentic-powered solutions. I'm Matt Schofield. I'm a product designer for Medable. I work on both the site experience and patient experience teams, in addition to our agentic platform and our design system. Lots of hats in that role. I've been with Medable for a little over three years now. My name is Fikar Matthews. I am principal architect here at Medable, and I'm on the engineering side of Agent Studio. So I would have initially started with helping build out the initial core infrastructure, agent runtimes, tools and MCPs, and evaluation layers. Right now, I'm focusing specifically on helping build out the eTMF agent work. So it's, yeah, mainly the engineering side. Before we get into your agent platform, which I'm really excited to learn about because I've heard great things about it, let's just start with what does Medable do overall for folks who haven't heard of it? I would say that overall Medable's mission and vision is to bring effective therapies to patients faster. So we've had a journey throughout our over 10 years of life where we're really looking to solve the problem that is, it takes over 10 years currently to get a drug to market. 
So where our bread and butter has been the last few years has been in the clinical assessments and e-consent space where patients are able to complete electronic questionnaires and complete an electronic consent. And this is really looking at solving the overall problem of today, a lot of patients don't live near a clinical site and therefore they might not be able to go into a site to be able to complete a clinical trial to understand if this could help them cure their disease essentially. So ultimately all the things that we try to attack are trying to enable patients to get effective therapies faster. And as we've evolved into kind of the agentic AI space we're taking a whole new kind of lens at that problem space because of what the technology unlocks in order to help solve that problem even more effectively. Yeah, amazing. Okay, I'm a little bit familiar with just how many rounds and how long it takes and the money involved to run a clinical trial to get a drug approval. I know about this from the U.S. standpoint. Are your customers, is this primarily a U.S. challenge or is it global? It is a global challenge. Okay. Yeah. We are the clinical sites that we support over a hundred different languages and we support clinical sites all over the world. Oh, wow. Okay, excellent. Yeah, we tend to have a global audience so I always like to set the scope of what we're talking about. Okay, so now I understand you have an agent platform that's then driving a lot of other agentic products. Does one of you want to give me the overview of that product and then we can dig into how it came about? Sure. We have the platform that we have allows users, common users, not just engineering users, to be able to configure their own agents using agnostic models so you can bring your own model, you can use any model that's a flagship model that's out there. It pairs with RAG knowledge. We have a breadth of MCP connectors to be able to connect with different data systems. 
We support workflow functionality within Agent Studio and agent skills, which is a common pattern that Anthropic has helped to develop to be more precise and manage context windows for agents. We also support multiple triggers, so a lot of people are familiar with chatting with an agent, but that's not the only way you can interact with an agent. We focus on how can you interact with an agent through Microsoft Teams or through Slack, or how can it be triggered based off of a webhook or another system, and you not have the human trigger from the beginning. We have a couple different deployment patterns for how we deploy Agent Studio. We're talking a little bit about ETMF today, but ETMF is an application that we built on top of the agent platform, and the agent platform is the engine for it. We also have some of our more traditional products that was operated in the past. We augment those products with agents as well, and we also allow our customers to build agents directly onto our platform, so it's both a platform and an experience for our users, and in that case, we have a services arm that helps to provide a forward-deployed engineer to help build out agents to solve customer problems, so it's pretty broad, and we take a platform approach to every solution that we provide. We start with, does the platform capability exist for these solutions so that the next solution that comes around will have that capability already baked in and easy for somebody to be able to configure? Yeah, okay, so I heard two products that I want to dig into more, so one is this Agent Studio platform that is allowing your internal teams to build agentic products, but it's also enabling your customers to build their own agentic products, and then ETMF, what is that acronym? 
Electronic Trial Master File, and there's actually, we do have a third agentic-powered application, I would call it, which is, we call it a CRA agent, but CRA stands for Clinical Research Associates, and that's really focused on a specific persona that we're developing an agentic application for, so do you want me to go into each of those and the problems that they're solving? Yeah, and you said there were three, so I heard two. Oh, I added to the agent platform. Okay, gotcha. Yeah. Yeah, give me a quick overview of the two, so you have two products today that build on the agent platform? Yeah, exactly, so you could build lots of agents right on the agent platform and just out of the box, leveraging the forward deploy engineers or even if customers want to build their own, but the ones that we've been focused on first is what we're calling agentic-powered applications, which actually have multiple agents powering them, but essentially, the ETMF, the Electronic Trial Master File, is the system that has been around in the clinical trial business for many years now, and essentially what it does is allow you to store all the relevant documents for your clinical trial so that when the FDA or someone comes in to review that clinical trial, they're able to easily find all the relevant information associated to it in order to determine was this trial conducted properly? So essentially, what we've discovered in our problem discovery is that what happens today is there's over 80,000 documents a year uploaded to this system just for a given customer, and essentially it takes the users that have to upload these documents into this system around at least five minutes per document to assign the associated classification and metadata that allows the auditors or the people that need to review the documents at the end of it to more easily find them and more easily do their analysis. So this was an obvious problem we felt for AI to be able to solve, right? 
There's over 350 classifications that the user has to understand and assign associated metadata to. So leveraging AI to solve this problem just made sense where we're able to, we would be able to completely take the human out of that process of the classification and the metadata. Obviously, we look to start with human in the loop to always to ensure that we're feeling the trust and the accuracy of the agent results, but that's an overview of kind of what the first problem area that we're looking at with this agent-powered application. Okay, great. And then do you want to give the same overview for the second product? Yeah. A clinical research associate is someone, and there's lots of them, probably sometimes over 1,300 clinical research associates at a given sponsor or a CRO, and the people that are actually funding the trial and essentially what their job is to monitor the data coming through from the clinical research site and from the participants throughout the lifecycle of the trial. To do this today, there's about 13 or more different systems where data is coming in throughout the life of the clinical trial. And so they need to navigate through all 13 of these different systems to be able to just understand what's going on in the trial. Is the data of high quality? Is there any patient safety risks? And it's very time-consuming for them. It's just to get to the point of understanding the data because of the problem of having to sort through these 13 different systems. So what we've done is leveraging the agent platform is to be able to connect those 13 different data sources together, surface the data to them in a more easily understandable way. And then one of the key things is provide recommended actions. And this again is where AI is suitable and it's doing a better job than some of the other legacy systems. And the legacy systems, they can surface like a potential signal of something, but there's no recommended action on how to actually move forward. 
And the AI can actually go one step further and take the action for you on behalf of the human. Again, with that human in the loop. Okay. And then just to give listeners some context, do one of you want to give me like a high level structure outline of how a clinical trial works? I want to just give people a sense of like the scale of time, the scale of people, because Jen, you just described two very clear problems, but I want to also make sure people have the context to understand them. Yeah. What's interesting is I think a lot of people think like the development of the drugs is the long tail, but it's actually not. It's the actual clinical operations. And so that's where we really focus here at Medable is trying to reduce the time for that clinical operation. So that 10 years could be potentially reduced down. Our goal, our ambitious big hairy goal is one year. And so essentially like it starts from creation of a protocol, which is the rules of the clinical trial and actually ensuring that the science team comes together and understands what they need to assess to determine if this drug is effective. And then it goes to sourcing a bunch of sites, clinical sites to find participants to determine who can participate in the trial and they're recruited and there's an enrollment process to determine if they qualify to participate based on the eligibility criteria. And then from there, it's a matter of depending on the protocol like procedures. So if it's a vaccine, you get the vaccine and then you monitor the symptoms of that vaccine for typically around seven days. And then you monitor any ongoing symptoms from there. And all of this, because it is global and it's taking place across the world, needs to come together into one kind of streamlined data source. So one of the major problems today is that each of the different sites have a different kind of source system. Oftentimes it's paper still. 
And then they have people that they hire at the sites that actually look at the paper and re-enter that system into what is called Electronic Data Capture, EDC today. And then you have the CRAs, for example, reviewing the data. They'll go on site and they'll look at the source and they'll look at the EDC and they'll make sure that they're the same. So there's just so many manual things that still occur today in this industry that technology in general can help solve. But we're really seeing a lot of cool new opportunities with our agent platform as well. Yeah, this helps. So like when we talk about your ETMF product, you're talking about tens of thousands of documents being entered over 10 years and making sure that all those documents end up with the right metadata. And when you're talking about CRA product, it's really helping with across a 10-year period, making sure that data coming in, getting to the right place and being seen by the right person. Yeah. And you can imagine if you find an issue a month after it's been entered, no one remembers the reason that issue occurred in the first place. So proactive data monitoring, be able to find issues when they happen and not one month later is super valuable. So you mean like symptom and then they need to follow up with more information. You can't do that a month later. Yeah. I also think it's interesting to note the actual scale of the documents. There are single studies that produce tens of thousands of documents per month. Oh, wow. It's not tens of thousands of documents over 10 years. It's tens of thousands of documents a month per study. It's a lot of documentation. Wow. Okay. I'm hearing clear customer problems. Like I know that clinical trials are extremely expensive and a lot of it is because of this administrative cost and just the volume of data and having to get it right. One thing I'm really curious about is how did you decide to start with an agent platform? First of all, is that where you started? 
Did you start building one of these agents and then realize there'd be more? Tell me a little bit about just the beginning of this platform. How did it come about? How did you identify a platform was the right approach? So I think we saw that there was this human, like, capacity limit to the problems here. And we've been trying to tackle this for a number of years prior to adopting agents and AI ourselves and our own operations. And it seemed like a complete natural fit to be able to look at this and be able to say, look at this high cognitive load that we're putting on these individuals. They have to learn how to, they have to learn about a protocol. This protocol could be over 200 pages of material that's scientific evidence and safety and guidelines that need to tell them how to do the job. And they have to be experts at knowing that document. And there's hundreds of these people that are supporting this across sites globally around the world. And we look at a problem like that, it's like that type of problem is really well suited to be able to, like, use RAG knowledge for a protocol and be able to provide easy answers versus having to depend on a CRA who helps to be the arbiter of asking questions and getting information from a handful of scientists that are supporting the clinical trial. So that's like number one. Number two is really this messy data problem: there's data in multiple systems and you need to move that data around into very structured formats. And that part is painful. And with the advancement of AI, it does that kind of stuff really well. And people do not. I would say there's a huge accuracy problem that leads to all this burden that agents help to address with that. So it was a natural fit for us to be able to say AI is the right problem to solve. But we've been a SaaS platform company from the very beginning. 
We've always solved our solutions from a platform approach to be able to look at it from the perspective of we build isolated environments for our customers. We are the data processor for their environments. And we wanted to follow that same practice with the way that we build agents. We want to build agents that are purpose-built for our customers. They're not going to be all out of the box. They're all going to have their nuances. Every trial is different. Every customer is different. It allows them to be able to manage that isolated concept. And as we build on the agent platform, we're able to accelerate our ability to deliver more solutions much faster, which is a key part of that as it feeds into our product development lifecycle. Yeah, I love that you're building a platform that both you're using internally and you're exposing to your customers to use for their own needs as well. I think that's just a win-win. I also love this mindset of, okay, all of our customers have really bespoke needs and I could see how in clinical trials that's definitely the case. And so you already had developed this platform mindset and when AI came along, you were like, okay, let's take the same platform approach and what's that platform component that enables these types of solutions? I'm curious about, as a team, your background. Did you, do any of you have an ML background? Were you familiar with, like, how did you get up to speed with let's build an agent platform? I don't have a machine learning background, but I did, essentially my background, I started in academia around biomedical engineering and then moved into, from that, after my post-doc. And this kind of surge and this different approach to AI, and I got fascinated by it. So it was just, it was, you couldn't keep me away from reading and learning and trying to do things with it and experimenting with it. And especially at Medable, the environment for that kind of experimentation was incredibly encouraged, right? 
Because we wanted to see what we could do and we wanted to understand what were the limits and levels of what could we provide to our customers with this. And it required a lot of experimentation. And always with that eye, I think as Luke said, we are a platform company and we wanted to provide solutions that would allow our customers to solve the problems that they had. We would provide solutions to them, but they also could provide solutions for themselves through our systems. And so it was, that was how, that was what brought me here. And that, that's my, that was my experience. Fikar was a very early adopter in us advancing our AI and agentic capabilities because of his researcher mindset. So I'm just pointing that out. I did not myself come from machine learning. A lot of it is self-learned, like, I think everyone in the room here, we're all, you can't even read books about the stuff that's being advanced right now. Like, you've got to stay on top of the latest papers and the latest YouTube videos to see what's been released yesterday to learn how quickly you can adopt it and learn about it. It's a different, it's a different way to learn about something than the traditional SaaS platform technologies, which is where my background is, primarily SaaS technologies. In my early career, early part of my career with GE Digital, they hired or they acquired a company called Wise.io, who is actually very big experts in the machine learning space. They hired a lot of really smart data scientists. And essentially what we were doing was mining like more of like industrial internet of things. So we were mining through gas turbine data, windmills, and we were making recommendations more to GE internal users to be able to say, hey, it looks like there's a problem with this piece of equipment and this specific aspect of the machine and providing a recommendation to the user as to how they should act on that. 
And then having that sort of continuous learning feedback mechanism to continuously improve. But it definitely feels like a different world here with the agentic platform suite. Yeah, I think what's pretty fun about this moment in time is there's just so many teams that are in similar situations. They see the potential of this new technology and everybody's just diving in and trying to figure it out. And it's actually why this podcast exists. It's just how do we help level everybody up by sharing stories and what's working and finding patterns. And it's a lot of fun. Matt, is there anything you wanted to add? Sure. I look at it as I'm learning on the fly as well. And as a designer, I'm always looking for different tool sets to tell the story and things like agentic and AI and vibe coding and things like that has really taken center point into our process. And it's allowed me to learn pretty fast and our team is pretty small and nimble. So that's a good thing to have in our toolbox and yeah, just continuous learning on it. Excellent. Okay. And then give me a sense for when did you start working on the Agent Studio? I want to say it was about two years ago. Is that right, Fikar? Yeah, to different degrees. So we've been, we have, we've been, yeah, about two years ago, we would have started. I'd say the Agent Studio software we have today, I'd say the specific thing is probably about a year when we really started building it. So yeah, I'd say depending on, yeah, about two years. Okay. The reason why I asked this question for listeners, we are recording in February of 2026 and I feel like, Luke, when you first described the platform, it sounds like almost any agent harness that exists today, right? So you've got skills and MCP connectors and like today these are everyday things that everybody's playing with and there's a million SDKs and it sounds like when you started though, almost none of that existed. 
So this was very much, we're going to build this ourselves because this enables all this, the rest of this functionality we can see down the road. So I want to dig in a little bit because you have a platform that you're then building a product on top of. Maybe we can look at this, look at these in parallel. So if we start to look at how does the agent platform work and then maybe what does it enable in one of those two products? So we can get understanding of both the platform and then also what it's unlocking in one of your two products. How does that sound? The way it works, yeah, that's a good question. When you, so when we have an app like eTMF, I'm going to work backwards from one of our apps, from eTMF or CRA, and that'll help to make more sense. Those are not just single agents. Those are an ecosystem of agents or orchestrating agents. And there's also front-end experience and there's databases that are not necessarily agentic that are also plugged into those are more agentic powered applications. But if you work back from it, the smaller components that help to make it up are built with very specific jobs. So you might have an agent that just focuses on, I'm going to be really good at classifying this document for eTMF. And so the way that you would build that agent is you would make sure that you have the, we've invested the right knowledge that helps to share what are the document repositories that are representative of what we'd want to be able to classify with. You'd want to be able to create that agent to add that knowledge to it to say that this is the knowledge that's relevant to this agent. We would be able to configure the model parameters, like which model do we want to use, and define a system prompt. And we do everything in a versioned way. 
So like every time you create a new version of an agent, you have the ability to work through a draft process, go through the process of building out evaluations and then being able to publish that and share that out with the end users. For an agent that's supporting one of these applications, the end user is really that app and it's not like a user going and working on the specific agent, but you're able to use it in many different ways. You're able to directly interact with that agent or sub agent, or you can work with it directly through the application. Okay, so I'm a heavy Claude Code user. And as you just were describing that, I could imagine like I've got my Agent MD file, I'm picking my model, I'm giving it instructions. But I know that might work for you internally, building agents for your other products. I suspect that doesn't work for your customers who don't really tinker with things like Claude Code. So maybe this is a question for Matt. Are your customers actually creating agents and then building their own solutions that involve multi-agent systems? And if so, how are you making that easy for them to use? Yeah, it's a pretty hard task to accomplish. I wouldn't say that all of our customers are building their own agents. I think if you use the example of the CRA agent, through our research, we found that for the most part, we want the agent portion of that to power the, what they see on the interface, right? Having the ability to adjust things like when a trigger happens or how often that you get notifications, for example. It's more, I guess on the, it's like human readable, for example. It's not, you're not getting in too much to the technical implementation of it. So it was a challenge to kind of balance that with the output of what they actually see on the UI. 
I think what you're pointing to is there's a different way that people work now when it comes to working with AI and agents and Claude Code and Cursor that requires a different type of mindset. And not all of our customers and our users have that mindset. So that's why we have two deployment mechanisms when we provide direct actions for Agent Studio to our customers is we can say, hey, we are there to be your Sherpa. We're there to help guide you through the process to help to adopt this culture change and to better, like to build out these agents that support your business need. But we also allow you to be able to do this as you've jumped on this journey. We had a recent example where we gave an onboarding to one of our customers, a user who's maybe more agent aware, more AI aware. He's probably spent some time using n8n and or using Cursor. And I don't think he's done Claude Code, but he had some context, but he wasn't deeply clinical. He was mostly on the business side. We gave him an onboarding of about 40 minutes and he stepped away for a month and came back and he built his own ecosystem of agents. He envisioned it in a different way than we had shown him. He went and said, I actually want to bring this value into Microsoft Teams because that's where my users are. And I want to find a way to have them interact with Teams and work with this ecosystem of agents that use the connectors and data that we provided for him within that instance. He was able to do this within a 70-minute session. Like he built his own agentic system. He didn't find any bugs and there's no bugs, but he did it all just off of guidance that he had heard from three weeks ago, four weeks ago. And he did that through the user experience that I think was big. It speaks a lot about how easy it is to use once you have that right agentic mindset. I just had this analogy in my head. It's almost like you're building Lego building blocks. 
In some ways, you're shipping products for them where you've said, here's your Lego In order to enable AI to be able to help that human understand how to, or to be able to join the data across these and make it effective for the human, you need that common layer. Yeah, okay, this is great. And I really want to dig into this because I've had so many episodes where a number of the things you're touching on have come up. Like these data layers of really just how do we structure data to make it useful for the LLM, challenges around retrieval. I think what I like about what you're building is you're building this Agent Studio platform. So it has to work in a lot of different environments. It's not, you're building a retrieval step for this very specific purpose. It's, you're trying to build retrieval enablement across a wide variety of use cases. And so I'm curious about how do you decide when, if you're building an agent in your system, do you have all these building blocks of you could choose embeddings, you could choose markdown files, you could choose this system, and those are like building blocks that are available. And then, Fikar, I heard you say you're giving that to your customers, like they can create a RAG step. And what I'm curious about there is, are you telling them to create an embeddings step? Do they choose that? Do they have the knowledge to choose embeddings over markdown? Like this feels very complex. This feels like a very engineering decision. And so I'm curious about the platform approach of like how you're enabling this across all use cases. And then I'm also curious about what are you exposing to customers and how are you helping them make good decisions around this? So we have a couple of things that can help with that process, but you're pinpointing one of the problems that we're actively working to, to solve. People will look at a new agent and they'll think, okay, I need to, need this agent to do this thing. 
And am I going to use the system prompt to guide everything that agent's going to do? Am I going to build a skill and attach that skill? Does that mean I'm not going to put this part in the system prompt? Am I going to put some of this in a knowledge base? These are all questions that come up for our end users. The place we want to get to is one where users don't have to make that decision all the time. You describe the problem you're looking to solve, and we propose the right configuration for an agent, to the point of actually building that agent for you with our agents. It's like agents building agents. That's where we want to get. But while we're getting there, we're trying to make sure we're grasping these customer problems first, and we have all of these core components available. So I would say right now, a lot of the solutions we're building for customers probably have more of us involved, making sure the solutions are working and providing feedback on their experiments on the system. But I think it needs to evolve from there for sure.

I'm almost imagining you might need rules. If your data is structured this way, if it lives in this form, then we're going to ask you some questions about your data. And then we're going to recommend: oh, that's good for embeddings, or that's good for keyword search, or that's good for something unstructured, maybe markdown. Because it seems like every team I talk to, for their specific use case, goes through these cycles. Everybody starts with embeddings because it's the sexy RAG step. Then they realize that's not really the right solution for them and they jump to the next thing. It's almost as if, as an industry, we're discovering these rules.
Depending on the nature of the data, this is the right way to retrieve it. But it seems like for your platform, in the long run, you almost have to codify these rules so your customers don't have to think about it.

I think you're probably going to go here too. If we're focusing on the outcomes of what these agents should be doing, we're trying to evaluate whether the agents are actually accomplishing those needs with the configurations that we've provided. And we have something built into the platform that allows us to evaluate various configurations. You can build a set of evaluations and benchmark against different configurations. Maybe I want to run this one with a model from GPT and this other one with a model from Claude. Or maybe you want to change the system prompt altogether. Maybe in one you want to use a knowledge embedding versus some other mechanism. So we're creating all these tools that allow you to experiment and land on the right outcome. And I think you're pinpointing correctly that we need something that helps close that gap and makes it a lot easier for the users.

Yeah, I think building a studio, an agent studio, while also building your own products helps create this awareness. I feel like so many teams are focused on the specific use case they're building for. But because you're trying to build the platform, it's almost like you're getting a step ahead on what the generalized rules are, which is pretty cool. All right, I could nerd out about this forever. I want to talk a little bit about how you're using MCP. Is this primarily to connect to third-party services? Are you creating MCP servers for your customers? Tell me a little bit about what you're doing there.

We're using a couple of third-party MCPs, but we're building most of them ourselves.
We've built a layer around MCP so that it works with our own auth systems. Essentially, our internal authentication mechanisms are wrapped around all the MCPs. All our MCPs, when they're invoked, pass through our auth mechanisms, and those services allow us to retrieve credentials that our users have added. If users want to access a third-party system that they have, they add their credentials to our systems. So they will only ever get the access to those systems that they already have themselves. We don't do super user access to external systems. When somebody is using one of our agents, they are accessing third-party systems as themselves from a credentials perspective. So that's where we've taken it: we took the MCP system and layered our own auth on top of it, so that all the authentication, and everything involved in passing credentials around, is managed by ourselves. MCP has turned out to be, for us at least, a very convenient protocol for calling these tools. We just built on top of it to help manage auth and make sure it was all very secure.

So it sounds like you're using MCP for internal retrieval. I know, Jen, you mentioned 13 sources. Are some of those internal sources, and you're building MCP servers for each of those? And some of them might be third-party, outside-the-building sources, and you're using third-party MCP servers for those? Is that an accurate picture?

Yeah, exactly. We have what we call an eCOA platform, which, as I mentioned, collects patient questionnaires throughout the trial as well as the consent. So that is internal data. But we also have systems that are external, like the electronic data capture or the TMF that we've talked about as well, where we're leveraging the external systems and the documentation they have available to create the relevant MCP.
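The pass-through credential pattern described here can be illustrated with a small sketch. This is not Medable's implementation and it does not use any real MCP SDK. The names `CredentialStore`, `ToolGateway`, `MissingCredentialError`, and `fake_tmf_search` are all hypothetical, chosen only to show the idea that every tool call is routed through an auth layer that looks up the calling user's own credentials, with no shared super user account to fall back on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Credential:
    """A user's own credential for one third-party system (hypothetical shape)."""
    system: str
    account: str
    token: str = field(repr=False)  # keep the secret out of logs and reprs


class MissingCredentialError(PermissionError):
    """Raised when the calling user has not added credentials for a system."""


class CredentialStore:
    """Maps (user, system) to the credential that user added themselves."""

    def __init__(self) -> None:
        self._creds: Dict[Tuple[str, str], Credential] = {}

    def add(self, user_id: str, cred: Credential) -> None:
        self._creds[(user_id, cred.system)] = cred

    def get(self, user_id: str, system: str) -> Credential:
        try:
            return self._creds[(user_id, system)]
        except KeyError:
            # No fallback to a shared or super user account.
            raise MissingCredentialError(
                f"user {user_id!r} has no credentials for {system!r}"
            ) from None


ToolFn = Callable[[Credential, dict], dict]


class ToolGateway:
    """Every tool invocation passes through here before reaching a connector."""

    def __init__(self, store: CredentialStore) -> None:
        self._store = store
        self._tools: Dict[str, Tuple[str, ToolFn]] = {}

    def register(self, name: str, system: str, fn: ToolFn) -> None:
        self._tools[name] = (system, fn)

    def call(self, user_id: str, name: str, args: dict) -> dict:
        system, fn = self._tools[name]
        cred = self._store.get(user_id, system)  # the user acts as themselves
        return fn(cred, args)


def fake_tmf_search(cred: Credential, args: dict) -> dict:
    """Stand-in for a real connector call; it only echoes who it ran as."""
    return {"acting_as": cred.account, "query": args["query"]}


store = CredentialStore()
store.add("alice", Credential("tmf", "alice@sponsor.example", "tok-a"))
gateway = ToolGateway(store)
gateway.register("tmf.search", "tmf", fake_tmf_search)
result = gateway.call("alice", "tmf.search", {"query": "protocol"})
```

In this sketch a second user who has not added credentials for the same system gets a `MissingCredentialError` instead of borrowing someone else's access, which mirrors the "they access third-party systems as themselves" point made in the conversation.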
Okay, so there's a challenge I see come up a lot with MCP. Actually, these might be two different challenges. The first is 13 external sources. That's a lot of MCP servers, and I'm assuming they have a lot of tools. So already in the back of my head, I have a concern about context window bloat. The second challenge that comes up a lot with MCP servers is the design: are they designed to be token efficient? What are you doing to ensure that? So what have you learned as you've designed MCP servers, and how are you managing that context window, that other people could learn from?

One of the things we've learned is that when you're building MCPs for some of these external tools, you're at the mercy of the other vendors' API mechanisms and their wrappers. If they don't have a strong and robust mechanism for querying their data, we're going to have challenges with that too, and the agent will have challenges with it. If you just build an MCP from scratch and say, hey, go in and do stuff with that data in that other system, and we don't understand the structure of the data in that system, you're going to spend a lot of the context window going back and forth trying to figure out the right query to get access to the data. So we've explored a couple of different patterns. The data ontology will definitely be a big part of this. But right now we've used, for instance, agent skills to help prime MCPs when we set them up for a new customer. For context, we set up an MCP server and then we create instances of that server that can be deployed per customer. So once you create one for a system, any customer can configure their own instance with their own client ID and secret and their own credential mechanisms.
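As a rough illustration of the "one server definition, many per-customer instances" idea, here is a minimal sketch in Python. It is an assumption-laden toy, not the Agent Studio data model: `ConnectorTemplate`, `ConnectorInstance`, the `skill` text, and the example tool names are all hypothetical. It shows a connector being defined once for a system, carrying a skill that primes the agent on how to query that system, and then being instantiated per customer with that customer's own client ID and secret.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class ConnectorTemplate:
    """Defined once per external system (all names here are hypothetical)."""
    system: str
    tool_names: Tuple[str, ...]
    skill: str  # query guidance shown to the agent whenever this connector is used

    def instantiate(self, customer_id: str, client_id: str,
                    client_secret: str) -> "ConnectorInstance":
        return ConnectorInstance(self, customer_id, client_id, client_secret)


@dataclass(frozen=True)
class ConnectorInstance:
    """A per-customer deployment of a template with that customer's credentials."""
    template: ConnectorTemplate
    customer_id: str
    client_id: str
    client_secret: str = field(repr=False)  # never print the secret

    def describe(self) -> str:
        count = len(self.template.tool_names)
        return f"{self.template.system} connector for {self.customer_id} ({count} tools)"

    def prompt_preamble(self) -> str:
        # The skill travels with the template, so every customer instance
        # gets the same guidance without re-discovering the query structure.
        return f"[{self.template.system} skill] {self.template.skill}"


edc_template = ConnectorTemplate(
    system="edc",
    tool_names=("list_subjects", "get_form"),
    skill="Filter by study and site before requesting form data.",
)
acme = edc_template.instantiate("acme", "acme-client", "s3cret-a")
globex = edc_template.instantiate("globex", "globex-client", "s3cret-g")
summary = acme.describe()
```

Keeping the skill on the template rather than on each instance is one possible design choice: the knowledge about how to query the system is written once, while only the credentials vary per customer.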
But then we would also augment that with a skill. We built this skill based on knowledge of that system, and it says: whenever you use this specific connector, apply this skill and use this query structure, which helps navigate the system much more effectively. And when it comes to using 13 systems, I would advise against building one agent with all 13 systems in it. I would probably use a sub…

And take a look at the rules that are being applied, based on whichever one was correct. So let's say the human was correct. From there, we would take a look at all of the context that the agent is given, and look at how we can improve it so that next time around, the agent can get that verification more accurate.

And you're able to look at it and see that the human was correct because there are clear rules to follow?

That's where the process of working with the customer comes in right now. If we think about this from a platform approach, one thing we can look at is some sort of review experience that allows that AI admin role, or business expert role, to actually do that, and then provide that feedback automatically into the agent once we know which one is truly accurate.

Yeah. Okay. It's so easy to default to the human in the loop being the ground truth, but I think there are lots of cases where the human in the loop shouldn't be the ground truth. I don't know that this has come up before on our podcast, and I find it fascinating because I'm actually wrestling with this myself. I'm working on extracting opportunities from interview transcripts, and sometimes the interviewer can correct the opportunity, but I work with a lot of people who don't really know what an opportunity is, so I can't take their correction as truth. So it's very messy. It's very hard to know, okay, what do we do with this?
And I'm basically doing what you described, Jen. I have to do a lot of manual review to figure out what the implied rules are here. How do I decide: is the agent better, or is the human better? Then there's a UX challenge of how you communicate to the human that they might be wrong. Yeah. Okay, Luke, I want to go back to something you said. You used an acronym I'm not familiar with. It was something about being compliant with something, maybe in the medical or healthcare space?

We have that problem a lot. We have a thousand acronyms for everything, and we just assume that everybody knows what they all mean. You can call us on them at any point in time. Good clinical practice. GxP just refers to the wealth of practices that exist within good practice. So it could be good manufacturing practice or good clinical practice. This is a set of guidelines that regulatory bodies adhere to when they assess and review systems that are used in clinical trial contexts. In this case, it will also be used for agents that are used in a clinical context. And that's probably a big differentiator for our agent platform and how we're delivering specifically to the clinical trial industry: we're focused on the regulatory needs of this specific industry.

Yeah. I wanted to ask you about this because I suspect a lot of teams out there aren't taking on some of these challenges because of the regulatory environment and how scary and hard it feels. So I'm happy to hear that that hasn't kept you from doing this. And I imagine a lot of it comes back to where we started, which is that you have big, hairy, real customer problems to go after that this technology seems really well suited for. But when you went into this, when you said, we're going to build agents, were you aware of how you were going to work through these regulatory challenges? Or was this just a big unknown, a hairball you had to unravel? Tell me a little bit about that.
We've been battling the regulatory challenges for our entire existence. We had a very challenging process that kept us slow in the way that we delivered products. Sometimes it would take us a whole quarter to get new features and capabilities out there, because we have all of these documentation requirements to meet those standards. We've spent a lot of time refining that process to make it a lot easier for ourselves so that we can move a lot quicker. And with the agent platform, we're finding opportunities to accelerate that even more so we can deliver products within this environment.

Okay. And you mentioned this in the context of evals at the Agent Studio level. So give me a sense of what you're doing at the platform level to make sure you stay within these regulatory rules.

So there are a couple of things. What regulatory bodies look for in systems is that we follow a very specific practice: understanding the intent of a specific software product, understanding how that traces to a design specification, and how that design specification traces to actual test evidence that it worked the way you'd expect. That common pattern exists for GxP systems, and every feature and capability that we deliver on the platform follows that pattern, so you can trace the way that we've built all of these components. It has the right foundation. And that allows us to deliver our documentation in a way that's aligned with, not different from, the way we deliver the rest of our software products. What's nuanced is that when you build agents on top of this platform, the platform itself would already be validated, but we also need to be able to prove those same things for the agents: that we're building each agent to a specific intent, that it has a design specification, and that you can prove it's doing the thing you'd expect.
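The intent, design specification, and test evidence chain described here can be sketched as a small evaluation harness. This is a minimal illustration under stated assumptions, not a GxP validation procedure and not Medable's tooling. `TraceRecord`, `EvalCase`, `run_evals`, the stand-in `keyword_agent`, the document labels, and the pass-rate threshold are all hypothetical; a real agent would replace the keyword function.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    input_text: str
    expected: str


@dataclass(frozen=True)
class TraceRecord:
    """Links an agent's intent to its design spec and the cases that test it."""
    intent: str
    design_spec: str
    cases: Tuple[EvalCase, ...]


def run_evals(agent: Callable[[str], str], record: TraceRecord,
              threshold: float) -> dict:
    """Run every case and return evidence that traces back to the intent."""
    if not record.cases:
        raise ValueError("a trace record needs at least one eval case")
    outcomes = [(c.case_id, agent(c.input_text) == c.expected) for c in record.cases]
    pass_rate = sum(ok for _, ok in outcomes) / len(outcomes)
    return {
        "intent": record.intent,
        "design_spec": record.design_spec,
        "pass_rate": pass_rate,
        "meets_threshold": pass_rate >= threshold,
        "failures": [case_id for case_id, ok in outcomes if not ok],
    }


def keyword_agent(text: str) -> str:
    """Deterministic stand-in for an agent that classifies trial documents."""
    lowered = text.lower()
    if "consent" in lowered:
        return "informed_consent_form"
    if "protocol" in lowered:
        return "protocol"
    return "unclassified"


record = TraceRecord(
    intent="Classify incoming trial documents",
    design_spec="Return exactly one label per document",
    cases=(
        EvalCase("c1", "Signed consent form, site 12", "informed_consent_form"),
        EvalCase("c2", "Protocol amendment v3", "protocol"),
        EvalCase("c3", "Monitoring visit report", "monitoring_report"),
    ),
)
evidence = run_evals(keyword_agent, record, threshold=0.9)
```

Here the stand-in agent misses the third case, so the evidence records a failure against `c3` and reports that the threshold was not met. The point of the structure is that each result stays attached to the intent and design specification it is meant to prove.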
And evals become really important for that part, proving that it's doing the thing you'd expect it to do. It's a key part of addressing that challenge.

Yeah, I've got to say kudos to you. I can imagine this is the kind of thing a lot of companies would look at and go, this is non-deterministic. We can't prove it's doing the thing it was intended to do. We're just going to shy away from it. But you're finding ways to just do it, which is amazing.

I was thinking about this before. I think people see AI, they've spent time on GPT, they sometimes get weird responses or hallucinations, and they form a view from that. But you can't just sprinkle AI magic dust and solve these problems. There's a lot more involved in a proper AI solution, where the whole solution isn't AI. A bunch of these components are purpose-built and intended, in many cases, to be deterministic. And the probabilistic nature that comes with the AI serves as the connective tissue between all of those components. It makes the translation layer for the human a lot easier and the output a lot more consistent across these systems.

Yeah, I saw someone post on LinkedIn something like, how is everybody convincing their customers it's okay to get unreliable responses from an LLM? And I was like, whoa, that's not your job. Your job is to make the responses from the LLM reliable. I think you're missing AI product development skills here.

And I think the other trap, sorry to interrupt you, but I think people fall into the trap of asking, why isn't this doing the exact thing my traditional system should do, where I'm getting specific answers? We shouldn't just be comparing these agents to those systems. We should be comparing them to humans. Humans make errors too.
And the idea is, how can you get an agent to have less variance in errors than a human would have doing that same job? How can you accelerate their ability to be more accurate with agents?

And I think as an industry, there are obviously teams that are building this stuff and using this stuff and learning these techniques of evaluations and measurement and feedback loops and guardrails. But there's still a huge part of the industry whose notion of AI is just chatting with ChatGPT. They don't understand that these mechanisms exist. That's actually one of the big reasons I wanted to start this podcast: how do we educate everybody? You don't have to just release the first AI answer. There's more we can do to make this reliable. All right, let me ask you this. What is next for Medable and your AI Agent Studio?

Yeah, so I think you heard a lot of what's next throughout the podcast, in terms of tactical development and evolving the platform and the solutions. But our CEO just released a paper called full self-driving. Essentially, that is where we're trying to go for Medable: the full self-driving of clinical trials. We really want to reimagine the space and leverage the available technology to do it. Instead of having all of these manual operations that require masses of humans to monitor single data points, we want to enable a full end-to-end clinical trial process that has these agent workers helping humans along the way. Today, there are 10,000 uncured illnesses, and at the current pace it will take 200 years to get treatments for them to market. So I don't think it's a matter of having fewer humans do these clinical operations. It's a matter of enabling the humans to do more clinical operations so that we can get treatments to patients faster.
There's an evolution to get there. We're starting with specific use cases where we can see obvious problems that are a fit for our agent platform. But we want to evolve that, because we understand the key problems in this space, and build agent-powered applications that have the full end-to-end workflow for each of the relevant personas and help them get to that place of full self-driving for clinical trials.

Yeah, I like that. It's such a good problem space, and it's creating value for humanity, which I really appreciate. It's very refreshing to hear people use this type of technology, especially, for genuinely hard human problems. So keep it up. I have really enjoyed learning about your product and what you're building, and I appreciate you taking the time to spend with me. Thank you. If you enjoyed this conversation