Hugo Alves - Let's Get Real About Synthetic Users (with Hugo Alves, Co-founder @ Synthetic Users)

Overview

This episode explores “synthetic users” as a new approach to product research: using generative AI to simulate qualitative interviews with target audiences and synthesize findings for decision-making. Hugo Alves, co-founder of Synthetic Users, argues the goal of research is risk reduction and better decisions—not rigid adherence to particular methods—while acknowledging limitations and the need for human validation for high-stakes calls.

Key Takeaways

A central claim is that teams often over-index on research process rather than outcomes: “help people solve their problems, not interview them.” Hugo positions synthetic interviews as a way to make research more accessible—especially in organizations that do little or no research—and to accelerate early learning, not to “replace humans.”

The conversation distinguishes realism in form versus content. Output that “sounds human” (slang, tone) can increase believability, but Hugo worries more about whether the underlying substance is right (e.g., what a demographic would actually prioritize). This framing clarifies why AI outputs can “feel fake” even when the content is directionally useful.
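The form/content split can be made concrete. This is a minimal sketch that assumes answers have already been reduced to ranked lists of items, such as the "top five games" a 12-year-old would name. It strips surface form (capitalization, punctuation, spacing) and then scores only the content, by checking how many of a human participant's top-k items the synthetic answer also names. The function names, the normalization rule and the overlap-at-k metric are illustrative choices. They are not how Synthetic Users actually evaluates its output.

```python
import re


def normalize(item: str) -> str:
    """Remove surface form (case, punctuation, extra spaces) so only content is compared."""
    item = re.sub(r"[^\w\s]", "", item.lower())
    return " ".join(item.split())


def overlap_at_k(synthetic: list[str], human: list[str], k: int = 5) -> float:
    """Share of the human top-k items that also appear in the synthetic top-k."""
    synthetic_top = {normalize(x) for x in synthetic[:k]}
    human_top = {normalize(x) for x in human[:k]}
    if not human_top:
        return 0.0
    return len(synthetic_top & human_top) / len(human_top)


# Made-up answers: the wording differs, but three of the five items match in content.
synthetic_answer = ["Game A!", "game b", "GAME C", "Game D", "Game E"]
human_answer = ["game a", "Game B", "Game C.", "Game X", "Game Y"]
print(overlap_at_k(synthetic_answer, human_answer))  # 0.6
```

Under this framing, natural-sounding prose that scores low is the worrying case. Stilted bullet points that score high are a form problem, which Hugo treats as secondary.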

On the technical side, Hugo argues quality improves when the system breaks work into bounded tasks using multiple agents (planner/interviewer/critic-style roles), rather than “one-shotting” complex goals. He notes LLMs can be overwhelmed by competing objectives (e.g., persuasive vs. rigorous), and specialization helps reduce drift and improve consistency.
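The bounded-task idea can be sketched without tying it to any LLM provider. In the example, each agent carries one narrow instruction, and a small orchestrator passes work from a planner to an interviewer to a critic. All of the following are hypothetical: the `Agent` class, the role names, the prompt format and `stub_model`. The `stub_model` function stands in for a real model call so that the example runs offline. This is a minimal sketch under those assumptions, not Synthetic Users' actual architecture.

```python
from dataclasses import dataclass
from typing import Callable

# Any function that maps a prompt to a completion can serve as a model.
Model = Callable[[str], str]


@dataclass
class Agent:
    role: str          # e.g. "planner", "interviewer", "critic"
    instructions: str  # one bounded goal per agent, never several competing ones
    model: Model

    def run(self, task: str) -> str:
        # Each agent sees only its own instructions plus the task at hand.
        return self.model(f"[{self.role}] {self.instructions}\n{task}")


def run_interview(goal: str, persona: str, planner: Agent,
                  interviewer: Agent, critic: Agent) -> list[dict[str, str]]:
    """Plan questions, answer them in persona, then have a separate critic review each answer."""
    questions = [q.strip() for q in planner.run(f"Goal: {goal}").splitlines() if q.strip()]
    transcript = []
    for question in questions:
        answer = interviewer.run(f"Persona: {persona}\nQuestion: {question}")
        verdict = critic.run(f"Goal: {goal}\nQuestion: {question}\nAnswer: {answer}")
        transcript.append({"question": question, "answer": answer, "verdict": verdict})
    return transcript


def stub_model(prompt: str) -> str:
    """Hypothetical offline stand-in for an LLM call, dispatching on the role tag."""
    if prompt.startswith("[planner]"):
        return "How do you choose which games to play?\nWhat would you pay for?"
    if prompt.startswith("[interviewer]"):
        return "Mostly whatever my friends are playing."
    return "on-topic" if "Answer:" in prompt else "unclear"


planner = Agent("planner", "Write open interview questions for the goal.", stub_model)
interviewer = Agent("interviewer", "Answer in character as the persona.", stub_model)
critic = Agent("critic", "Flag answers that drift from the research goal.", stub_model)

for turn in run_interview("Understand game discovery", "12-year-old in Lisbon",
                          planner, interviewer, critic):
    print(turn["question"], "->", turn["verdict"])
```

Splitting the roles this way reflects the point about competing objectives. A critic that is prompted only to check rigor never has to trade that goal off against sounding natural or persuasive. To run the sketch against a live model, one would replace `stub_model` with a call to a real model client.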

On epistemology, Hugo takes a pragmatic stance: whether LLMs are “intelligent” matters less than whether they produce valuable outputs with proper scaffolding. He references the view that next-token prediction can yield internal “world models,” and suggests trust is built similarly to other black boxes (e.g., Google Search): through repeated reliability plus evaluation.

Finally, the sharpest pushback he receives is moral rather than empirical—some critics believe synthetic participants “shouldn’t exist,” regardless of performance. Hugo claims many detractors haven’t tested the approach or lack falsifiable criteria for changing their minds, and he emphasizes ongoing comparisons to human studies and customer pilots as trust-building mechanisms.

Practical Steps

Use synthetic research as a pre-filter, not a final verdict. Start with synthetic interviews to surface themes, risks, and language; then validate the highest-impact uncertainties with a small number of real interviews (e.g., replace 10 interviews with 2 targeted ones for confirmation).
Define inputs tightly: specify who you’re studying (audience/recruitment criteria) and what you want to learn (research goal). Treat vague prompts as a main source of low-quality outputs.
Run multiple synthetic interviews, not one. Use volume to reduce the impact of any single “weird” interview that goes off-topic or hallucinates, then synthesize across the set.
Use synthetic comparison tests to narrow options (e.g., concepts, packaging, landing pages). Rank and shortlist with AI, then put finalists in front of humans—especially for expensive, high-risk decisions.
Establish a falsifiable evaluation plan: decide in advance what evidence would convince you it’s useful (and what would disqualify it), and test against internal data or past research where possible.
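Two of these steps are simple enough to sketch. The example assumes two upstream steps have already happened. First, each synthetic interview has been coded into a set of themes. Second, each comparison option has received numeric ratings from several synthetic participants. Those upstream steps, the function names and the 50% threshold are hypothetical choices made for illustration, not product behavior. Keeping only the themes that recur across interviews limits the influence of a single off-topic run. The shortlist then passes the top-ranked options on to human testing.

```python
from collections import Counter


def recurring_themes(interviews: list[set[str]], min_share: float = 0.5) -> list[str]:
    """Keep themes raised in at least `min_share` of interviews, so one stray interview cannot dominate."""
    if not interviews:
        return []
    counts = Counter(theme for themes in interviews for theme in themes)
    return sorted(t for t, c in counts.items() if c / len(interviews) >= min_share)


def shortlist(scores: dict[str, list[float]], k: int) -> list[str]:
    """Rank options by mean synthetic score (ties broken by name) and keep the top k for human testing."""
    means = {option: sum(s) / len(s) for option, s in scores.items()}
    return sorted(means, key=lambda option: (-means[option], option))[:k]


# Made-up coded interviews; the fourth one went off-topic.
interviews = [
    {"price", "setup time"},
    {"price", "support"},
    {"price", "setup time"},
    {"space travel"},
]
print(recurring_themes(interviews))  # ['price', 'setup time']

# Made-up 1-5 ratings for four packaging concepts.
scores = {"A": [4, 5], "B": [2, 3], "C": [5, 5], "D": [3, 3]}
print(shortlist(scores, k=2))  # ['C', 'A']
```

With 12 options and `k=5`, this would mirror the "filter out seven, send the best five to humans" workflow. Fixing `min_share` and `k` before looking at any results is one small way to keep the evaluation falsifiable.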

Notable Quotes

Hugo Alves: “At the end of the day, what we should worry about is building good products… We should help those people solve their problems, not interview them.”
Hugo Alves: “It’s always about reducing risk… research is about making better decisions.”
Hugo Alves: “Most people will say this doesn’t work. They haven’t tried it because they don’t want it to work.”

Full Transcript

Source: openai 1h 04m runtime

Most people will say this doesn't work. They haven't tried it because they don't want it to work. And at the end of the day, what we should worry about is building good products, irrespective of the methods we use to arrive at those good products. We should help those people solve their problems, not interview them. So I think we overvalue the process in some cases. Some people don't even think this should exist.

Hello and welcome to One Knight in Product, the show where I chat to some of the brightest minds in product from across the globe to help you see product management in a whole new light. If that sounds up your street, don't forget to dive into the back catalog on your favorite podcast app or on YouTube. And of course, follow, share or drop me a comment or review. It all helps keep the lights on. Tonight, I'm delighted to welcome Hugo Alves. I first met him at a pub meetup in the wonderful city of Lisbon, and since then I've repeatedly seen him pop up on LinkedIn, having poignant arguments with UX thought leaders about the merits of synthetic users. That means both the company he co-founded and the general approach of potentially bypassing all those pesky, messy humans that use our products and letting AI take the strain. He's here tonight to talk about all of that, and I can promise that for this interview, we're both entirely human and there's not an LLM in sight. Hugo, thanks for coming. Welcome to the show.

Thank you so much for the invitation, Jason. It's a pleasure to be able to talk to you and to explain a little bit of what we've built, how it works, its limitations and its potential. And most importantly, that it's not about replacing humans. It's about complementing them.

Ah, don't jump too far ahead. We want to keep the attention going, right? But I also understand that today, the day of recording at least, is the 9th of February. So that will tell people how long it takes me to get episodes out.
But today, the 9th of February, is the third anniversary of the company. Is that correct?

It's the third anniversary of the product. The company, or at least the idea, existed a little before we built it. But the 9th of February 2023 was the day we put the first version of the product out. Time flies; they grow up so fast, as people say.

Yeah, and soon they'll start insulting you, just like the kids do. But that is an interesting point, though. We'll talk about some of the methodology in a minute, but LLMs and the AI landscape in general have moved so far in those three years, right? I don't know how long it took you to build, but whatever you put out as version one, do you even look at it now and think, oh yeah, that's a bit funny compared to what we have today?

Always. If we're making progress in our products, in our own careers, in all of that, we always look back and cringe a little bit, right? Because the expectation is that things have improved and we've grown. So yes, if I had to use our V0, I would be cringing a lot, because I would be like, oh my God, it still has this mistake, it still does this thing it's not supposed to do. A lot of the early criticisms we got were amazing because of that: they allowed me to understand where the product was failing and try to improve it. And of course there's the space itself. What happened in generative AI in the last three years, oh my God. We went from plain text completion to instruct models, and then suddenly we unlocked images at a level of quality where it's like, what happened? So a lot changed in the field, and at Synthetic Users we always tried to incorporate those findings into the main product as well as we could.
Well, let's talk about that product, then, and of course about Synthetic Users the company as well. Obviously the product, the company and the approach are interlinked, but there's also a general concept of synthetic users, which we'll talk about in a bit. But why don't you, for the record, let my listeners know exactly what it is you're building today, and what the job to be done is? What is it that you're solving for people?

What we're solving, the job to be done, can be defined at several levels of granularity. I could say we're generating synthetic users, meaning users based on generative AI that mimic real people, but that would not be a job to be done. That's almost more of a task. The way I see it, we help people make better decisions, because research is always about making better decisions. It doesn't matter if it's desk research or primary research; it's about reducing risk. It's about having a clear understanding of the people we're building for, of the market, of the business. That's the goal of research. It's not about a specific methodology. It's about reducing the risk involved in creating new companies and new products. And user research has always been an essential component of that. We need to understand the people we're building for. We need to understand whether the problem we think exists really exists. We need to understand whether it's painful enough for people to have any motivation to solve it. We need to understand whether they're willing to pay and whether we can make a business out of it: all the risks that Cagan talked about. As practitioners, we know that. So it's always about this. It's always about reducing risk.
And I remember joking with Kwame, my co-founder, when we launched, that we were a waste reduction company, because so many products are launched with zero research. I know researchers like to point out that synthetic users are killing the field, because we're removing its value and undermining research. And I'm like, dudes, most companies don't do research at all. So if the CEO can get out of their own mind and do any research, even synthetic research, that would probably already be an improvement. I just need to point to the Fire Phone. It was a huge failure because it was driven by Jeff Bezos and his understanding of what would make a great phone, and it wasn't really done around synthetic users. So I think that's essentially what the product does. But to be more transparent about the deliverable: we use generative AI to generate qualitative, in-depth interviews that mimic what people in particular groups would say in answer to a question. Then we synthesize those interviews. And now we even have an agentic approach in which we help you design what research should be done.

Nice. Well, there's plenty to dig into there, and I promise we're going to dig into it. But I wanted to tease one little thing on the side, because I know you've got a psychology background yourself: a master's degree in clinical psychology. So I guess you spent at least some of your time in the past poking around inside brains and trying to work them out. And you've talked about effectively simulating personalities and minds, simulating the types of responses people might come up with if they were those types of people, but doing that with LLMs and at scale.
But then there's almost this other question, and maybe your clinical psychology background can give us a bit more of a steer on it. Obviously, LLMs aren't human. They're not human brains. But at the same time, there's a lot of debate about whether they're intelligent, whether they're thinking, whether they actually have any concept of the world, or whether it's all stochastic parroting. And then there's another argument: actually, it doesn't really matter whether they're intelligent or conscious as long as the outputs that come from them are good, because you can still get some use out of them. So where do you stand on this? Maybe not consciousness, but the intelligence versus really useful automation debate when it comes to LLMs versus minds?

OK, so Ilya Sutskever, one of the original researchers behind the Transformers paper, worked at OpenAI and ended up leaving to start his own company. He's someone who has thought deeply about what these models do, and he was doing an interview with Jensen Huang, so two really important people in generative AI. On one side, Jensen Huang, the founder of NVIDIA, responsible for building the hardware on which these models are trained and run. On the other side, one of the guys responsible for coming up with the Transformer architecture, which is essentially what unlocked the generative AI wave. And Jensen Huang asked him, so what are these models really doing? I'm going to paraphrase, because I don't recall it by heart. What Ilya says is that we train these models on a really dumb task: predicting the next token. But to predict the next token accurately, this training process forces them to have, at some level of fidelity, an internal model of how the world works.
They need to understand, in some sense ("understand" might be too strong a word), but they need to have some representation of what people are, what a company is, what a street is, what the sun is, all of that, to be able to predict the next token accurately. So his argument is that they have internal world models, and what we gain by scaling these models to larger and larger sizes is more accurate world models. And that's a little bit of how I think about this. There's more written about this, but at the end of the day, although I'm starting with this more theoretical argument, I'm a pragmatist. If it helps, I don't care what the process is. I don't care whether they're simulating a mind or not. My question is: can I create valuable outputs from these models with some kind of scaffolding system that helps wherever I want it to help, or that helps me personally? I use these models personally, and I build my own tools. Can they help me? That's the question. For me, that's the litmus test: does it help me or not? If the process is not exactly the one that happens inside the human mind, I'm okay with that. As long as it's better than what I can simulate by myself, it's already quite helpful. So I'm a pragmatist.

Yeah, it reminds me of another friend I had on the podcast a few months, probably years, ago now, and we were having this debate as well. One of the things he said that still sticks with me is this idea that it doesn't matter where the words come from as long as they're the right words. And that has stuck with me to some extent, because on the one hand, I would sit there and argue that it does matter, because otherwise, how do you know they're the right words? Right.
But I guess a lot of people now argue that with the right evaluation frameworks and, like you say, guardrails and scaffolding around it, you can make it much more likely that they are going to be the right words. So it's an interesting argument. It's like outcomes over outputs, right? It doesn't really matter what you do to get to the outcome as long as it's the right outcome. But it still feels somewhat, I don't know, even morally dubious not to understand why it's saying what it says, because for the most part, we understand why people say things. I can imagine someone saying, well, no, we don't necessarily understand the internals, which neurons fire, and we can't necessarily predict it. But we can generalize from all the people we've ever spoken to and anticipate certain types of response, and those predictions will be reasonably accurate over time, because people have fairly predictable biases and so on. Not everyone, but it's going to be quite a good generic model. Whereas LLMs can sometimes go wrong in a completely random way, because they got taken down a path and hallucinated to the end. That's part of the model, but at the same time, you can't explain it. Even the people who create the LLMs can't explain that specific behavior, that specific path it took. So maybe it's not even a technical argument, but that's where the tension is: people can't explain what's going on. It's a black box, and they're not an LLM, so they can't understand why it made that decision.
But I guess that raises an interesting question. Everyone talks about LLM hallucinations, for example, so how do you make sure the outputs are as accurate as possible? I know you talk about RAG and so on as well. But how have you tackled that in your approach? You don't want something going off, running God knows how many interviews, and coming back with complete garbage because it took a wrong path.

Okay. Let me just go back a little to something you said, which is this discomfort with not understanding, with good accuracy, what is really happening there. I think we're more uncomfortable with it only because this technology is newer. No one knows exactly what goes on behind Google's algorithm either. There's an entire profession around that, SEO: people working in SEO are essentially permanently trying to reverse engineer how Google works. But not knowing how Google works doesn't make me trust Google less. I'm okay with that, because I've used it often enough and got good results often enough to almost be able to say, I don't care about the process, it's helping me. So I think it's also a transition phase. That doesn't mean the fact that we're simulating humans doesn't have a different moral flavor in itself. I think one of the reasons why synthetic users creates such strong emotions is that we're simulating people. If we were simulating, I don't know, companies, this wouldn't be the case. The fact that we're trying to simulate people creates some moral discomfort for a lot of people, which is completely understandable. My tweet announcing Synthetic Users three years ago was basically me saying, “We've created synthetic users. Damn, you must be fucking stupid.”
So we were quite aware of the moral side. And as you said, humans are predictable enough, as long as we don't trust our intuitions too much, because naive psychology can also lead us astray. We build internal models of how humans work, and we hold beliefs about which events cause which emotions. I think the same thing happens with these models, because they've read the entire internet, almost the entire written history of humanity, and they extrapolate from that. Having said this, how do we address it at Synthetic Users? You mentioned, quite correctly, that we generate a bunch of interviews. We never went with generating just one interview, and our studies kind of force people to run more than one. One of the reasons is exactly what you mentioned: that one interview might go astray and head in a weird direction. That happens with humans too. If you've run enough user interviews, you know that sometimes after one you're like, what happened there? I let myself go with the participant, and that was not at all the goal. So we run a bunch of them, because even if one goes astray, the others compensate and stay in the center of gravity of the topic. That's one of the reasons. The other is that we regularly run human studies and compare them with synthetic studies; that's essentially how we try to keep our system accurate. Our customers have also done this a lot. Our big customers don't start using Synthetic Users immediately. They're like, okay, I want to run a pilot. I want to test it against data I have that you don't know about. So in a lot of cases, our system is tested against data that I've never seen and don't know anything about.
And customers come back and say, Hugo, let's set up a contract, because I'm really happy with the results we got. Again, it's a matter of building trust. We don't expect anyone to make a make-or-break decision for their company using synthetic research alone, not at all. I wouldn't do it, and I don't recommend that anyone does. It's about figuring out where the sweet spot of acceleration and clarity is, and what should be left exclusively to humans. In no way do we believe that humans no longer have a role, as participants and as people we want to understand deeply. It's just that there are trade-offs to be made, and sometimes speed is more important than having 100% accuracy.

Well, that could be controversial to some, but again, we're going to come back to some of those topics. I'm really curious, though. You talked about running those pilots, and obviously about people just coming in and using your tool. What is the input? Imagine I signed up today and wanted to do some synthetic user research for whatever tool I'm going to build. I could imagine it being like Twitter, slash X, these days, where half the posts are just "Grok, is this true?" Just mashing the keyboard and hoping it gives me an answer. Or maybe I'm uploading a bunch of documents, or going back and forth co-creating research plans. What's the actual front-end interaction? If I'm a user who wants to create a research study, what do I need to give it for it to give me a good result back?

Right. That's a great question. It gets to the nitty-gritty of how the tool really works. So yeah, that's this question and the next question. There are two, almost two modes of operation in Synthetic Users.
On the one side, we have an assistant that helps you flesh out the research plan: a well-defined audience (almost well-defined recruitment criteria) and a well-defined research goal. So who do you want to learn about, and what do you want to learn about those people? This is what we call dynamic interviews. It's our most encompassing product. It comes with an assistant. But if you already have a well-defined audience and a well-defined research goal, you can skip the assistant, and it's essentially two text fields. This is the bread and butter of Synthetic Users. We also have more concrete types of interviews for more concrete use cases. Let's say you want to test a landing page that has a specific visual layout. You can upload images of that landing page and run a test with specific questions about it. Or you might want to compare different concepts. Imagine you have 12 different packaging options, and you want to filter out the bottom seven and send just the best five to humans. You can do that. We run comparison studies in which we generate synthetic users for each of those packaging options, then summarize the results and give you a ranking. So there's a lot of flexibility. And now, with our new agentic approach with Iris, it's quite easy to explore a lot more: not only the main audience you were thinking about, but also adjacent audiences that can contribute. So essentially, who you want to learn about and what you want to learn are the core inputs for any type of study at Synthetic Users.
Well, it's interesting you talk about the agentic stuff as well. I can imagine, for example, that you take some of that input and have a cool prompt written in the background. It inserts that text in the middle, sends it all off to the LLM of choice and comes back with a nicely formatted report. But you're talking about agents. When I was poking around a bit before this, trying to work out what was going on, I saw some stuff about agents and planners and interviewers and critics and ensembles. Now, that starts to sound very exciting and cool, but it makes me ask: how are you using that agentic approach? Of course, the word "agentic" itself has lots of different meanings depending on who you ask. But how are you using it to ensure, for want of a better word, quality, and also realism? This thing needs to feel real, because if it doesn't, the people doing pilots are not going to come back, right? So how do you make it feel real using that swarm or ensemble approach?

The realness, the believability of the outputs, is more core to the discussion than I would like. Since the early days at Synthetic Users, I've seen what we do along two axes. One is form and the other is content. What I mean by form is this: if you ask me to generate a Portuguese 12-year-old kid, does he use the same slang that a Portuguese 12-year-old uses or not? That's the form.

I don't even trust you to use the same slang as I do. From my experience in the UK, I don't trust either of us to use 12-year-old slang.

Not at all. Not at all. Not at all.
I have a little bit of Peter Pan syndrome, but that doesn't make me a 12-year-old. So that is the form. The other axis is the content. If I ask a 12-year-old, what are your top five games nowadays, or your top three platforms, what does he list? That's independent of how it's written or expressed. It could be just a bullet list. But if the answers from my synthetic 12-year-old were a bullet list, people would not look at them and feel they're real, although...

It just feels like an LLM output, right?

Exactly. Although they could be almost the same as what the real 12-year-old said. So there's an element of form that is also relevant, but for me, historically, in the work we've done, it was always about the content itself. I worry about the form, but the form is secondary to the content. Ignoring the variability between 12-year-olds and all of that, if the five games are not the ones a 12-year-old would name, that's what keeps me up at night. If the slang is slightly off, I'm not that worried, particularly because it's not 12-year-olds who will read it, so the readers won't be able to judge the slang either. Now, why did we go with a swarm, a sub-agent approach? It's because of one thing that I think is clear nowadays and has almost been reified by the model providers. You see that Claude Code uses sub-agents, and a lot of tools now formally implement them, because LLMs get overwhelmed. I'm going to anthropomorphize a lot here. It's not that I believe they're human. But when Richard Dawkins wrote The Selfish Gene, he made a disclaimer at the beginning saying that, of course, the gene is not selfish, because the gene is not a person. Still, thinking about it as if it were makes that phenomenon a lot easier to look at.
The same thing applies here. I'm not saying they are human, but thinking about them as if they were helps a lot in figuring things out. If we give them specific tasks with bounded context, bounded dynamics and bounded goals, they perform a lot better than if you ask them to pursue a lot of different, sometimes competing goals. Sometimes you want the output to be persuasive, but you also want it to be rigorous, and those two incentives aren't always aligned. But if you have two different agents, one to make it persuasive and one to make it rigorous, it ends up working better. That's why we went with the sub-agent swarm approach: allocating specific tasks to specific agents and giving them the right tools is the best way to get high-quality content from them.

Yeah. It's funny, because I've run training on LLMs for people, LLMs for product management, workshops and so on. You get them to try some task: synthesize some research, do some market research, do some competitive analysis, the traditional LLM tasks. It's funny to call things traditional these days; it's not been that long. But the typical instinct, even for people using LLMs manually, is to try to one-shot everything, right? And you see that with vibe coding now as well: people trying to one-shot entire applications, one-shot Salesforce and have it all perfect, et cetera. But in my experience, and obviously in yours too from what you've said, the less you ask an LLM to do, the more likely it is to do it properly rather than getting confused, losing context, drifting or hallucinating. It doesn't guarantee success, but it puts you in a situation where you're more likely to succeed.
So I think that's a fundamental and interesting approach you're using. But I'm also curious about something we touched on a little earlier, this idea of simulating humans versus simulating companies. I spoke to another podcast guest a while ago about behavioral economics, cognitive biases and things like that: biases, heuristics, rules of thumb. I'm just wondering whether it's possible to model companies as companies, rather than as collections of people. Obviously all companies are made up of a bunch of people. There might be 50, or a hundred thousand, people working for a company, and together those people sum up to this almost meta concept of a company. The only reason I bring this up is that my world is very much B2B. I do a lot of B2B product work, and pretty much all of the companies I work with are either B2B or B2B2C. So it's different. It's not quite the same as going out and asking a bunch of 12-year-olds, to your point, about their favorite games or what a new activity for them might be. It's not even like asking a bunch of 25-year-olds what kind of dating app they want in the future. Companies have almost a sum of desires that add up, and maybe even a different element of predictability.

Well, absolutely.

But they have a different element of predictability as well. They're always going to want things that drive certain numbers up or other numbers down. There are always going to be certain tensions between security and speed and all these other things. I just wondered whether you'd had any experience, or failed experiments, where you tried to model the meta concept of a company and synthesize that, rather than the individuals within it.
And I guess, by extension, whether your approach works equally well for B2B and B2C, or better for one or the other. Let me answer the last question first. It works for both. It works really well for B2B; most of our largest customers, at least, are in the B2B space. There are slight differences, and I can go a little deeper, but I think your initial question is a lot more thought-provoking and interesting, so I want to spend a bit more time on that. Also because there's a funny thing here. Before Synthetic Users existed, before we even decided, hey, let's build Synthetic Users, I joined forces with Kwame, and we did it the wrong way. As a product person, you're going to look at me and say, oh, that is really the wrong way, because you never start with the technology. You always start with the problem, then you figure out how to solve it. We did it the wrong way. I looked at GPT-3 and sent a message to Kwame: hey, have you seen this? And I sent him a couple of examples. We'd been friends for a long time, and he was like, oh my God, this is not possible. Hey, come to the office. So what we decided to do was figure out where to deploy this new thing that we thought had huge potential to help product people build better products. And we did a lot of stuff. In the beginning we had a framing around planet-centric design: Kwame had a design and development agency focused on planet-centric design.
Essentially, the idea is that you should be building products taking into account not only your end users, but the communities around them, the impact on the planet, the supply chain, all of that. You should be more aware that a product is not only for its users; it has an impact on the whole world. Just think about social networks, social media. One of the experiments we did even had a code name: Captain Planet. Essentially, it used large language models to bring the planet to the table. Every time you were considering a feature or a product decision, you would go to that app and describe what you were doing, and the planet would reply and say, oh, I think that's a good idea, but I'm worried about the impact on the forest, because my forests are really delicate, and you're going to do this, which implies this and does that. So we didn't go for companies, we went for the entire planet. It's funny, because Apple ended up doing a commercial in which Mother Earth was at the table. The interesting thing about what you said regarding companies is that it's one of the things we haven't done at Synthetic Users, because it has its own particular challenge, which is mapping out the dynamics. Because... Yeah, like the network within them. Exactly, exactly. Who influences whom, what's the critical threshold for a decision to be made, how many people do you need? You know the things that PMs do: before you go to a meeting to pitch something, you talk to everyone beforehand, you align stuff. So those dynamics are kind of the next level. Mapping out the emergent properties of human relationships and decisions, taking into account how we communicate, what channels exist, what previous relationships exist: that is a very, very interesting challenge that we haven't tackled yet.
We have focused on individuals for now, but I think that's going to be a really, really cool thing still to be done. Well, plenty of stuff to think about there, but again, I agree. As complicated as it is to map individual people, mapping networks of people just sounds hard, and there are so many different dynamics. There's a really cool book by Philip Ball called Critical Mass. Essentially, it's around this idea that some things about people can be treated statistically. For example, when you're planning the exit routes for a stadium, a supermarket or whatever it is, where you know you're going to have a lot of people, the way they do it is to treat people as if they were a gas, or particles. Brownian motion, right? Exactly, exactly. Because at a certain scale, we can almost abstract away the individuality of people. So in some sense, it's already done. When you study things like which songs get to the top of the charts, there are dynamics like that too. So it's a very, very cool problem to think about. No, absolutely. We all like thinking about cool problems. But I imagine that off the back of this, you know, I've put my stuff in up front, we've sent it off to a swarm of agents, they've done whatever it is they do, and then it comes back and I've got, I guess, some kind of report that I can use. I think you mentioned, for example, a bunch of interview transcripts and some kind of summary or roll-up, which is a very traditional LLM use case, I guess. But what do I do then? I've got my report. It says whatever. And of course, that's based on all of the different interactions with all of the different agents and personalities and whatever you've defined. Do I then just go and build some stuff off the back of that?
Do I go and speak to some real people, because I've found some better things to talk to them about? Do I use it to inform further research? What's the next step once I've got my synthetic user report back? All of those are valid. It's a lot easier to think about synthetic users if you don't make a distinction between synthetic and human. Look at it this way: if I had run a bunch of user interviews and got a report from my user researcher pointing this out, or a Figma file with a lot of quotes, or a presentation, what would you do? It's the same thing, with the caveat that in this case you know it's synthetic. So if you're making a big decision, maybe synthetic gives you a hint about how something ranks, and you go validate it with a couple of interviews. Instead of running 10 human interviews, run two. Or it can just be the input for the human stage: okay, we got a sense of the main worries of this demographic regarding this topic, and there were two of them we had never thought about, so let's do some human interviews or run a survey. Essentially, it's the same as doing some desk research. If you do some desk research and see that on some subreddit people keep talking about a challenge, would that be enough for you to build a product? Probably not, so treat it kind of the same way. This is the challenge of a product like this: people are still figuring out exactly how to fit it into their workflows. And that's something I don't think we've done an amazing job at, because we're also discovering it. We use Synthetic Users a lot to build Synthetic Users. There are features and product names that came from synthetic users. In other cases, I saw something in the synthetic results that could be interesting, then I talked to a couple of humans and figured out something adjacent that was even more interesting.
So it's still a green field, and I think we're still figuring out how best to leverage these capabilities, which is true of AI in general right now. Well, it's funny, because if I rewind a few years to one of my previous jobs, before I went off consulting on my own, I worked at a company that was using AI, pre-LLM AI, although we were certainly doing some experimentation with transformers at the time; the technology just hadn't moved on to where it is now. I would kill to have had what we have now back then, because there are so many different ways we could have used it. At the time we were consuming, for example, social media data, Reddit data, news data, as much of the internet as we could. Social listening, kind of. Yeah, kind of social listening. But rather than social listening for brand awareness or whatever, we were trying to categorize and predict the future. In our case that was fast-moving consumer goods, like sodas and chocolates and sweets and washing powders, and the basic goal was to use latent conversation to feed models that could categorize and predict the direction of travel. So, for example, you could say that there's some new trend that such-and-such a company should take advantage of, because people are talking about it. And they're talking about it naturally, on their own. They're not talking about it because you asked them to or because you interviewed them about it. They're just talking about it. Now, of course, those are still humans talking, but there's some similarity to what we were doing there: it's almost like the training part of an LLM. When they're creating an LLM, they're taking real conversations.
They're synthesizing models from those conversations, generalized across all of the different data points, and then they can crank out data and you can come up with reports and stuff. We weren't doing quite the same thing, but there was still this idea that you could get enough meaning out of latent conversation to make some decisions from it. But you certainly wouldn't bet the entire company on the fact that a few people on Twitter were talking about banana Coke or something like that. That's just a data point. I think that's an interesting point you made about desk research versus, say, longitudinal ethnographic studies: this idea that it's just another type of desk research. Is that really how you're positioning it, or do you think that undersells the impact it can have? I think it undersells it, because it ignores one important aspect, which is the generative part. With desk research, the material needs to already exist. You need to find it, but it needs to exist already. And the reason I mentioned that comment from Ilya earlier is my belief, and let me stress the word belief, that these models are internally creating a model, as I mentioned, a world model, so they know what kinds of entities exist in the world. There's the sun, there are these companies, there's Apple, there's IBM, there's Portugal, there's Riesman, there are plumbers, there's all of that. But they're also doing something else: they're creating a model of human nature, almost. That's why I can give them a scenario that has never been written online. It's not in the training data. I describe a scenario: this happened with a person, another person said X, this person did this, and someone else came in and said Y. And I ask it, how does person A feel?
And it can quite accurately say what, at least in my understanding, person A would feel. That is the fundamentally cool thing about this: it doesn't just match whatever was written, it can infer something more from it and apply it to a different situation. So I don't think it's just a kind of desk research, which is how people describe us when they want to say bad things about us but still need to concede there's a use case. They're like, oh yeah, maybe it can be used as desk research. And I'm like, yes, it's desk research, but for research that was never done. But it's interesting, because you mentioned inferring there. And of course, everything you say is true. I could ask an LLM right now what a Portuguese man in Lisbon would say about something, and it would give me a response, even if no one had ever asked that question before. Because LLMs love to answer questions as well. That's what they're optimized for. They can't stop themselves from answering, which is a big challenge. Yeah, no, exactly. From my own work and experimentation building stuff with LLMs, I see exactly the same. They're just so goddamn helpful, right? And sometimes you don't want them to be, especially when you want, for example, to simulate a user or a research participant. Most research participants aren't going to sit there trying to design you a canvas off the back of a conversation. So getting LLMs not to do that is obviously a constant battle. On the other hand, you mentioned how they can generate new stuff, but there are strong arguments out there from fairly credible people about interpolation versus extrapolation, right? In the sense that they're not really coming up with things that are actually new based on any knowledge or worldview.
They're coming up with new things because, somewhere across their training data, there are enough similarities between other things that they can sort of remix them and come out with something that looks credible, could even be credible, but isn't necessarily grounded in truth. Now, I'm not going to say whether that's right or wrong, but it's a very common argument against using LLMs to generate new stuff. It's exactly the same argument as, for example, why you couldn't ask an LLM to build you a cold fusion reactor: no one's built one yet, so it can't tell you how, although it can certainly tell you some of the science behind it. So how do you push back against that interpolation-versus-extrapolation argument, that it's only remixing stuff other people have already said? And maybe it's right. Yes. First of all, I agree to a large extent that that is one of the things LLMs do, for sure. I don't even question it. I think the continuation of that is that if they're doing that, it gives you more confidence in their output, because it is pattern matching, and that's essentially what we want most of the time. But I'm also going to argue something else, which isn't specifically about large language models: the idea that AI can only reproduce what's in its training data is not true. That has been shown in quite a public case. I think it's move 37; let me just quickly confirm. AlphaGo was the neural network trained to play Go, a game that becomes extremely complex from really simple rules. When AlphaGo was playing against Lee Sedol, one of the top players in the world at the time, there was a moment in the game in which AlphaGo played move 37, a move that no human would have made.
Everyone said, what is happening? Why is it doing that? Everyone thought it was a mistake. Only some moves later did people understand how much of an outside-the-box move it was. It was not a move that humans had played. Lee Sedol at the time was like, I don't know what happened. So the common criticism is that this is just remixing. And again, I don't think that's even a bad thing. Most humans are just remixing anyway; we think of ourselves as more special. I can certainly think of a few who are. Exactly. And that's one of the things that bothers me about some comments about synthetic users: I've read that comment enough times that I'm like, humans are also quite repetitive. So I do think these models might be able to fundamentally come up with new stuff. At the same time, in my case, I don't want them to come up with new stuff. I want them to reproduce humans as closely as possible, and this is a really important one, with all the variation that humans have. Because I do believe there's a core human nature: core emotions, core dynamics. That's the reason we can read a book written 2,000 years ago and still empathize with it: there's a commonality to the human experience. But I also recognize that there's a huge diversity in human experience not explained by that commonality. It can be environmental factors; it can be developmental factors. And that's what synthetic users are supposed to capture. So this is my answer. A little bit convoluted, but the best I could come up with. That's fine. I'll listen back to it and pick the bones out of it later. But you talked a little about criticism and pushback. There are lots of UX and research thought leaders out there very vocally calling out you, or at least your company, or more specifically the methodology of creating synthetic users.
They're saying that it's a bad thing. An undeniably bad thing. It can't be defended. It's unethical. It's a complete waste of time. All these different things that certain people don't like. They call out issues that are actually quite important, like what you just mentioned about commonalities between users. But of course, that only works if the data is rich and diverse enough to actually cover multiple perspectives. So, yeah, there's a lot of debate about AI bias, and we've certainly all seen examples of that. We talked earlier about how LLMs can be sycophantic. They can be tempted just to reinforce whatever you said to them: oh yeah, that was a great idea, thanks for telling me that. They might not capture the actual complexity of human behavior, because humans might act differently from how they say they do, and there aren't necessarily many ways to validate what people said versus what people did. So you've talked a bit about some of the objections you hear. But, and I guess this is an interesting question, have you managed to win someone round? There's a very loud group of people out there with lots of what look, on the surface, like well-judged and valid arguments against using synthetic users. What are some of the arguments you've managed to push back on, maybe even converting people to the view that it's not as much of a problem as it could be? Relevant to this discussion of criticism is something I mentioned a bit before: for some people it's essentially a moral argument. And there's no convincing in that case. There's no evidence that would change that person's mind. They fundamentally believe, believe being the operative word here, that this shouldn't be done.
For some of them, it's not even about whether this works or not. It shouldn't be done, because we're removing participants from the research, and that by itself is a bad thing, which I disagree with. Otherwise, we wouldn't be paying people. If people enjoyed being part of research, we wouldn't have to pay them. Most people are too fucking busy to sit for 40 minutes on Zoom answering questions. They have their own lives to live. And at the end of the day, what we should worry about is building good products for those people, irrespective of the methods we use to arrive at those products. We should help those people solve their problems, not interview them. So I think we overvalue the process in some cases. So some people don't even think this should exist. Then there's the other group, which still has the moral discomfort, but it translates into skepticism. Most of the time, though, it's fake skepticism, and it connects to the way we humans reason about stuff. It's not skepticism because I went and tested it and the results were bad. It's skepticism because I don't think it works, without having tested it. That's why my on-brand comment on LinkedIn, in most cases, is: have you even tried it? Because most people who say this doesn't work haven't tried it, because they don't want it to work. They're not willing to do the experiment that could prove them wrong. Or if they do, they do it the way we humans do, already biased toward getting the results they want. We all do this. One of the things that always worries me is: am I testing stuff in a way that is self-serving? Because it's natural. I built a website that collects 250 papers that, in some sense, test how good LLMs are at mimicking humans. It's all academic research, non-incentivized, because I don't know those people. And in some cases they say it doesn't work.
In some cases I can read the study and understand that the researchers themselves went into the experiment not wanting it to work. They use a really bad prompt. They use a really bad model. And then they say, hey, see, it doesn't work. That's why, at Synthetic Users, we've always said to researchers: if you want to use the most recent models, we'll pay for that. If you want to reproduce your paper with the top state-of-the-art models, we'll pay for those tokens. So what really bothers me is the lack of evidence. Every time I've been able to convince someone or change someone's mind, they've had criteria for changing their mind. That's why I sometimes ask people: what experiments would change your mind, what would make you believe there are some cases in which this works? Sometimes people are dumbfounded by this, because they hadn't thought about their criteria for changing their mind, because they hadn't thought of changing their mind at all. But if they are open to it, there are loads of ways to explain. And most of the people who change their minds say, I wouldn't replace humans with this, but now I can really see where this could help me. This could help me here, this could help me there. So the only thing that matters is whether the person has some criteria that let them be flexible and change their mind. You mentioned that you do get stuck in, especially on LinkedIn, and I'm assuming on Twitter too, if you're still there. If people start to call it out, maybe they tag your company, or maybe it's an article they've posted, or some thought leader posts a new opinion about it, you seem very keen and eager to get stuck in and state your case, much of which is similar to what you've said here, with "Have you even tried it?" always being the first example. I think you've even got some memes or images that you use for that as well.
And I can understand the motivation to do that because, of course, it's new. And I guess you also feel to some extent that people aren't giving it a chance, as you pointed out, or that they're biased themselves in different ways. But, first of all, do you enjoy it? Is that actually fun for you, or do you feel it's just something you have to do? And second, do you think it helps or hinders the mission of the company, or maybe even its reputation? How do you see either of those things? In the beginning, I did take some enjoyment out of defending what I built. There was something to it, I believe. But with time, it started to take a toll on me. I haven't been doing it recently. So is it effective? In some cases, there are minds that won't change, and I assume I've lost those. So when I comment, it's for those on the fence. If I weren't commenting, they would just follow whatever, I don't know, one of the most respected people in product or user research tells their followers: that this doesn't work, even without presenting any evidence. People will treat that opinion as evidence, and it's not. That's why I go there and challenge it. And then whether the person engages willingly, in a normal way, also depends on how they write it, because I've seen shit written about synthetic users and about me and Kwame, the founders. I've seen really, really nasty things. I remember, two years ago, a user researcher, thought leader, whatever you want to call it, who called me a wanker without me ever interacting with him. And I'm like, really? A funny comment here: you mentioned LinkedIn, and then said, oh, if you're on Twitter, maybe that too. LinkedIn is the nastier one. Everyone talks about Twitter being toxic and all of that, but LinkedIn is the nastiest social network I know.
I've never been on 4chan, I imagine it might be worse, but on LinkedIn, it's amazing that human nature is such that, in the place where our reputation and our work life are visible, we can be really, really nasty people. I'm quite aggressive in defending synthetic users, but to my recollection, I've never gotten personal. That's an important aspect for me, and I've been attacked personally because I built something. What? So is it the best way to approach it? I think I'm old enough to accept that it's my way. It might not be the ideal one, but it's the one that comes naturally, and I stand my ground. I have noticed. But do you worry, having come up against, for example, some of the biggest names in the industry, big, well-respected people? Not necessarily that you worry about them, because, of course, not everyone has to respect everyone, but that coming up against them might impact your own reputation, because many of your potential customers might be more inclined to respect, and I won't name any particular names, some of the biggest names in the industry over you, since you obviously have a vested interest: it's your company, and something you've worked on, as we know, for three years plus. So people are going to expect you to say what you say, and of course we're going to expect user research advocates to say what they say as well. I just wondered if you felt there was any risk in butting heads with big, respected names, or if you think it's basically fair game. I completely acknowledge the risk. We're both thinking about the same person, and I have a huge respect for that person. A huge, huge respect. Their frameworks, their way of thinking about product, all of that was part of my growing up as a product manager, so I have a huge respect.
And that's why it bothers me particularly when people make strong claims, sometimes even claims that there's no evidence this works. And it's like, there is. You might not have searched for it, you might not be satisfied with it, you might not think it was methodologically sound, all of that, but there is. And I think in some cases, yes, some people might look at what the person said, then see my comments and say, no, this person has proven their worth and you haven't, you're just a guy who built a product. At the same time, I think other people see my willingness to defend the product and to share evidence of how it works and what the mechanisms are in a good light. In that particular thread, I was completely dismissed, and I was a little bit sad, because I just wanted to share what we have. Yeah, well, the debate will obviously continue. From my perspective, and again, I'm not thinking of anyone in particular at the moment, it should always be about playing the ball, not the player. If people want to argue against your product, they can do that. If they can come to you with evidence that says it's all complete nonsense, or if you can come up with some gold-standard evidence that says it's all perfect and brilliant and validated, et cetera, then have the debate on that level. But I definitely don't think people should be kicking each other in the privates just for the sake of a few clicks or to make a point. Even I've been dragged and insulted by user researchers in the past, and I'm actually pretty sympathetic to the trade and an advocate for it. So I guess there's always someone who's going to pick on someone. But what's next, then, for Synthetic Users? Obviously, the company's still around. You're three years into the product, as you say.
You've presumably got stuff coming up and new capabilities that you're trying to move forward with. So what's the plan? Not just staying the same, I imagine. What's on your mind? We just launched some new stuff: our Iris agent, with agentic capabilities, that helps you plan and get a really deep understanding of the entire research question you're dealing with. New modalities: we launched vision last year, and we have some new stuff now. Figma might be around the corner, as a way for you to test your prototypes with synthetic users. Video is also coming down the line. If you're going to spend 15 million on a Super Bowl ad, I hope you test it with humans. But if you have five ads and you don't want to test them all with humans, maybe you figure out the best three with synthetic users and then test those three with humans. So the goal for Synthetic Users is essentially helping people make better decisions about their business. Until now, in terms of the design thinking framework, we've been more on the discovery side, but we also want to move more into the second diamond. We want to help you brainstorm better ideas and better functionality. We want to go beyond just getting you feedback on things you've already thought about, and also help you be more creative, expand your thinking about what is possible, and explore your opportunity tree better. Sounds like there's some exciting stuff coming up. Well, if people wanted to come and find you after this and chat about some of that, or have a spirited debate about the pros and cons of synthetic users, where can they find you? I hate it so much. You can't imagine how much I hate LinkedIn. But it's the place. I'm right there with you, by the way. I despise it.
I've kind of got Stockholm syndrome with LinkedIn these days. Exactly, exactly. You need to be there because of your profession and all of that, but it's the most toxic place, where people say they do stuff they really don't, or that they understand something they've never thought about. Now it's just a slobber fest of LLM-generated content like "12 lessons about B2B sales I learned from my mom dying." I just hate it. But that's where people can find me. But you're there anyway. I'm there anyway. Less and less with time, I hope. I'm on LinkedIn; you can search for me along with Synthetic Users, since there are a lot of Hugo Alveses there. And I'm also on Twitter. Amazingly, I don't get into fights on Twitter. I don't comment a lot on Twitter. I comment on stuff, but it's more on the technical side of LLMs and generative AI. U-G-O underscore A-L-V-E-S on Twitter. That's where I live. When I'm not doing real work, I spend most of my day on Twitter. And I think Twitter is also work, because that's where I've discovered a lot of this. Well, yeah. For all of the things you could say about Twitter, or X, that's another place I have a bit of an abusive relationship with, and I don't really use it that much anymore. But there is still quite a lot of commentary. If you can wade through the neo-Nazi stuff, there are still some interesting tech conversations going on there. But that's the thing. I think they've done a good job with the algo, because I don't see any neo-Nazi stuff at all. For me, it's LLMs, gen-AI. It's my entire feed. It's just agents and prototypes and GitHub links and research papers. I guess I need to start following some people on there so I don't just get the Elon version. I have Elon blocked. I don't want to see anything from that guy. Well, you know, that's something we can both agree on, and presumably all of your synthetic users would as well.
But I'll pop all of that into the show notes, and hopefully you'll get a few people heading over to chat about LLMs or other related topics. As always, Hugo, it's been a pleasure to chat. Really interesting digging into some deep and meaningful topics. We'll stay in touch, but for now, thanks for taking the time. Thank you so much for the opportunity, Jason. I had great fun having this chat with you.