Archived April 16, 2026
The Lead — Apr 16
JUST NOW POSSIBLE · TERESA TORRES

Building Todoist Ramble: How Doist Turned Voice Braindumps into Real-Time Task Capture

1h 00m / April 16, 2026 / ai, product, technology / Transcript sourced from openai

Overview

This episode features three Doist team members (Ernesto Garcia, Thomas, and Hugo) discussing how they built Todoist Ramble, an AI-powered “voice-to-tasks” feature that turns spoken brain dumps into structured tasks. Rather than starting with a predetermined voice feature, the team began with a broader AI exploration and landed on Ramble because it solved a real user problem: helping people capture messy, unstructured thoughts before they are ready to commit them to a formal task list.

The conversation goes beyond the product demo and offers a candid look at how AI features actually get built: prompt design, tool constraints, real-time UX tradeoffs, multilingual evaluation, and the limits of current models.

Key Takeaways

A core insight was that task capture is not just about speed; it is also about psychology. Doist’s research showed that many users hesitate to put something directly into Todoist because doing so feels like a commitment. Instead, they often start with pen and paper, or even ChatGPT voice, to think through what they need to do. Ramble addresses that “cold start” moment by letting users speak freely and allowing AI to shape those thoughts into actionable tasks.

Another important takeaway is that the team deliberately constrained the AI. The model does not chat back or generate open-ended text. It only issues a small set of tool calls—add, edit, or delete task—which helps keep the system reliable and aligned with Todoist’s product boundaries. This is a strong example of using AI narrowly and purposefully rather than letting it behave like a general assistant.
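To make the constraint concrete, here is a minimal sketch of how a client might apply a stream of tool calls to an in-memory task list. It is not Doist's implementation: the tool names (`add_task`, `edit_task`, `delete_task`), the `id` argument, the field names, and the dictionary shape of a call are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Hypothetical subset of the task attributes mentioned in the episode.
    title: str
    description: str = ""
    due_date: Optional[str] = None  # kept as natural-language text, e.g. "tomorrow at 10"
    priority: int = 4


def apply_tool_call(tasks: dict[str, Task], call: dict) -> str:
    """Apply one tool call to the task list and return the event that occurred.

    The returned event name ("added", "edited", "deleted") is what a client
    could map to a visual update or an audio cue.
    """
    name, args = call["name"], call["args"]
    if name == "add_task":
        fields = {k: v for k, v in args.items() if k != "id"}
        tasks[args["id"]] = Task(**fields)
        return "added"
    if name == "edit_task":
        task = tasks[args["id"]]
        for key, value in args.items():
            if key == "id":
                continue
            if not hasattr(task, key):
                raise ValueError(f"unknown task field: {key}")
            setattr(task, key, value)
        return "edited"
    if name == "delete_task":
        del tasks[args["id"]]
        return "deleted"
    # Anything outside the three supported actions is rejected outright.
    raise ValueError(f"unsupported tool: {name}")
```

Because the only things the model can emit are these three actions, the client never has to interpret free-form text; it either applies a call or rejects it.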

The episode also highlights a counterintuitive technical lesson: processing raw audio directly worked better than first converting speech to text. The team found that transcription added latency and lost information such as pauses and intonation, both of which can matter for meaning. By using a live audio model that could process speech and trigger tool calls in real time, they created an experience where tasks appear while the user is still speaking.

Finally, the discussion underscores how much AI product quality depends on evaluation. Doist built internal evals using employee-recorded samples in more than 20 languages, with varied accents and recording conditions, to catch regressions and improve prompts. They also learned to accept some limitations, especially in lower-support languages, rather than over-engineering around model weaknesses that may improve over time.

Practical Steps

If you are building AI features, start with a real user behavior rather than a flashy model capability. Doist succeeded because they identified an existing workaround (brain-dumping onto paper or into another tool) and designed Ramble to fit that behavior.

When implementing AI in products, constrain the model tightly. Define a limited set of actions the model is allowed to take, and make those actions explicit through tool calls. This reduces hallucination risk and makes the feature easier to test.
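One way to make the allowed actions explicit is to validate every tool call against an allowlist before acting on it. The sketch below assumes hypothetical tool and argument names loosely based on the parameters mentioned in the episode (title, description, due date, deadline, priority, label, project ID); the `id` argument and the required-argument rules are illustrative assumptions, not a documented API.

```python
# Hypothetical argument names for illustration only.
TASK_FIELDS = {"title", "description", "due_date", "deadline", "priority", "label", "project_id"}

ALLOWED_ARGS = {
    "add_task": TASK_FIELDS | {"id"},
    "edit_task": TASK_FIELDS | {"id"},
    "delete_task": {"id"},
}

REQUIRED_ARGS = {
    "add_task": {"id", "title"},
    "edit_task": {"id"},
    "delete_task": {"id"},
}


def validate_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems with a tool call; an empty list means it is acceptable."""
    if name not in ALLOWED_ARGS:
        return [f"unknown tool: {name}"]
    provided = set(args)
    problems = [f"missing: {key}" for key in sorted(REQUIRED_ARGS[name] - provided)]
    problems += [f"unexpected: {key}" for key in sorted(provided - ALLOWED_ARGS[name])]
    return problems
```

A check like this is also easy to reuse in tests: any call that produces a non-empty problem list can be counted as a failure.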

Invest early in evals. Create a representative dataset of real scenarios, including edge cases, different languages, and varied input quality. Use this set to compare prompt changes and detect regressions before shipping.
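As a rough illustration of how such a comparison could work, the sketch below computes a per-language pass rate from judge scores and flags languages that got worse after a prompt change. The pass threshold of 4 out of 5 comes from the episode; the function names, the `(language, score)` input shape, and the `tolerance` parameter are assumptions made for this example.

```python
from collections import defaultdict

PASS_THRESHOLD = 4  # judge scores are out of 5; 4 or more counts as a pass


def pass_rates(results):
    """Turn (language, judge_score) pairs into a {language: pass rate} mapping."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for language, score in results:
        totals[language] += 1
        if score >= PASS_THRESHOLD:
            passes[language] += 1
    return {language: passes[language] / totals[language] for language in totals}


def regressions(baseline, candidate, tolerance=0.0):
    """List languages whose pass rate dropped by more than `tolerance`."""
    return sorted(
        language
        for language, rate in candidate.items()
        if language in baseline and baseline[language] - rate > tolerance
    )


# Usage: compare a candidate prompt against the current baseline.
baseline = pass_rates([("en", 5), ("en", 4), ("fr", 4), ("fr", 3)])
candidate = pass_rates([("en", 5), ("en", 3), ("fr", 4), ("fr", 3)])
print(regressions(baseline, candidate))
```

Tracking the rate per language, rather than one global number, matters when some languages are known to score poorly: the goal is to avoid making any of them worse, not to require that all of them pass.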

For UX, combine modalities thoughtfully. Doist used both visual feedback and audio cues so users could trust the system whether they were looking at the screen or speaking while driving. That kind of confirmation loop is essential in real-time AI interfaces.

Notable Quotes

  • “We didn’t start with a Ramble feature in mind and then decided to use AI. We started with an AI exploration.” — Hugo
  • “Once in Todoist you’ve committed to it.” — Hugo
  • “We don’t want the model to try to be overly smart.” — Thomas

Full Transcript

Source: openai 1h 00m runtime

Welcome to Just Now Possible with Teresa Torres.

My name is Ernesto Garcia. I'm a front-end product engineer at Doist. I've been at Doist for a little over seven years now. And lately, my work has been more around how AI shows up in our products and also how our products interface or integrate with external AI systems. And in particular, in Ramble, which is a product that we're going to talk about, I was involved mostly in the early stage of exploration and then integrating into the web application in the UI.

My name is Thomas. I'm a back-end software engineer at Doist. I've been working here for about seven years now. Lately, I've been working a lot on security as well as database schemas and re-sharding, and all sorts of stuff about database organization. Before that, I was working on Ramble on the back-end side, so creating the microservices that make it work, writing prompts, all the testing, and making sure that we have a high-quality back-end service that we can offer to our clients and then to our users.

My name's Hugo. I'm one of the product managers at Doist, and I've been at Doist for more than 10 years now. That hurts. It's my first job. And yeah, at the moment, I'm working on a new bet. But previously, I worked on taking Ramble from a prototype to the high-quality launch product that it is. So we're excited to talk about that today.

Excellent. It says a lot about Doist that you all have long tenures there. Someone tell me a little bit about what Doist does as a company.

So Doist is a software company, and we're building productivity software. The main product that we have and work on is Todoist. So it's a to-do list app and project manager. It's great for managing your tasks as a personal user, but also with your team, and going from capturing the task to working on the task and then completing the task. That's the main product we work on. We also have Twist, which is a team communication app.
And it's geared more towards async communication for teams like us that are pretty remote and covering the world. Yeah, and at the moment, we're also working on a few new bets that we might touch on at the end of the call.

Excellent. And then tell me about, is it Ramble?

Yeah, so Todoist Ramble is our voice-to-tasks feature. You can just ramble anything that you have in mind, and we're going to capture it as tasks. So yeah, it's a pretty delightful feature that we worked on last year. And I think it's our very first pure AI feature that we built into the product.

I love the name because it takes any pressure off of having to have structured thoughts, especially for to-dos, right? It feels like I might have to know exactly how to say a to-do, but just in the name of the product, it's just Ramble. And is that how you built it? Is the intent that this is just fully unstructured input and your AI figures it out?

We didn't start with a Ramble feature in mind and then decide to use AI. We started with an AI exploration, given all the advances that AI has made and the new things that are possible with the new AI technologies. So we had this space of exploration to see how we could introduce AI into some features in a way that also made sense and not just for the sake of it. And during that exploration, we came up with a few prototypes of different features. Some of them were not related at all to what Ramble does, other kinds of things, but there were also some other features that started to look more and more similar to what Ramble ended up being at the end. And Ramble came up as one of the few top contenders of features that actually made sense, that were solving a user problem and not just putting AI into the product for no other reason.

Yeah. Yeah, I love this. Okay, so it started as, let's just explore how we might use AI. Tell me a little bit about, I imagine as a to-do company, this is something you've heard before. Did you have any type of voice interface before?
Did you have any feedback from customers that this was something they wanted? Give me a sense for like, how did you know this was a problem worth solving?

One of the main USPs for Todoist has always been Quick Add. It's our feature to capture tasks. And we've always made a point of making it super fast. We were one of the first to introduce natural language parsing, maybe like 15 years ago. So you could say stuff like, I need to call my mom tomorrow at 10, and we're just parsing the text and figuring out the attributes of the task. So it's always been something we wanted to improve. And as far as I remember, when joining, we were already talking about how we could make voice happen and the technology was just not really there. So I think once we'd seen some models come out around voice, we were very excited about trying them out to see how we could make capturing tasks even faster and more useful and frictionless. So I think that's really what drove us to make it one of the top contenders for using AI in the tool; it was always there. And we also backed it up with a bit of research. We've got a continuous discovery process as well here. And as part of the research, we saw that there was a bit of a cold start problem in Todoist where you don't really know what to do. I guess you don't know yet what you need to do, and sometimes you need to brainstorm that a little bit. And a lot of people have been sharing that they use pen and paper to figure out the tasks and then they copy them into Todoist, or they use the new voice feature in ChatGPT to brainstorm a little bit what they need to do. And then they dump that into Todoist. So it's an opportunity for us to cover that use case of brain dump, and that's what we called it internally at the beginning. And we also had this image from The Devil Wears Prada, where Miranda has this moment where she's just dumping tasks on her assistant.
And so we also had that in mind when building the feature, and we even tried it with the feature as we went to see if it was able to capture a lot of tasks at once.

I think that's amazing. Taking inspiration from a movie scene is great, and actually using it as your test case is excellent. So I love that you started with, I can imagine the very simple case of this is where you started with just natural language, where I click on maybe a microphone icon. I'm literally stating a task the way that I would probably type it in the tool, which means I'm thinking about it from the tool's point of view: the title of a task, maybe when the task is due. But it sounds like Ramble is different from this. With Ramble, I can just brain dump, as you said, and it's gonna try to figure out what to do with all of that. And I love this visual of that movie scene. So I'm imagining I could go for a walk and just stream of consciousness: here's what's in my head that I need to capture. Is that fair?

Yeah, I think that's fair, yeah.

Okay, and so let's talk a little bit about, I love that you were seeing this idea that people were starting on pen and paper and then transferring it over to Todoist. What did you learn about what's happening in that pen and paper session? I'm trying to just understand this problem of a brain dump. Is it like people just haven't thought about how to structure them as tasks? Tell me a little bit about what you know about that step.

Yeah, so I think the behavioral thing behind it is that once in Todoist you've committed to it. So I think people are figuring out a way to be more unstructured than figuring out the actual tasks that they wanna add. So I think many people go straight to Quick Add, just adding the task. But we've discovered that some people need to think through their plan or what they wanna do first and then add the tasks once it's clear.
So I think in that case with Ramble, it's trying to bridge to that behavioral thing that we saw in the research. And we're also working on other types of cases like pen and paper, like taking an image and converting that into tasks, and also just dumping in text and then figuring out the tasks in it. So we're going even beyond Ramble at the moment. It's exciting because we are able to bridge to what people actually do in real life. Yeah. That's where AI is becoming pretty useful for us.

This sentiment that people don't wanna put it into Todoist because it's like a commitment that now I have to do it is a really nice insight, right? Like pen and paper doesn't feel real yet, but as soon as it goes into the to-do app, now I have to commit to doing it. Okay, so it sounds like you were very aware that there was this process of let me just brain dump. You saw what was possible with LLMs. This was one of the first ideas that came up that you started prototyping around. How did you evaluate that this was a problem that an LLM could be good at?

We started, as I said, with an AI exploration phase, and we started actually with an idea similar to what ended up being Ramble but using mostly text input. So it was still text input, but you could dump there something unstructured, just your thoughts and stuff like that. And those early prototypes with that kind of approach did work pretty well in terms of showing us that LLMs were already at the time, which was a little over a year ago maybe, good enough to take out of that unstructured text, those unstructured thoughts, a set of structured tasks for the user. So that was one of the first evaluations that we did, but there was still the friction that you had to type all of that, which is not the idea. And then the next step was testing whether voice models could also be up to the task. For a few tests internally, we had initial rough prototypes to try it out before even putting it in our products, and things looked good.
So we started to invest in actually giving it a shape that would be suitable to integrate into the product.

Great, and then tell me a little bit about, let's get into the, how does it work? I imagine your user is hitting a ramble button and they're rambling for a little while, and then what happens?

So what happens when a user starts a ramble session is that the browser will open a connection to our backend, to our microservice dedicated to that. And it will send the raw audio from the microphone, and then we will forward that to our LLM provider. So this is Google Vertex, the Vertex API. And then we send our prompts to that model and all the tool definitions. And we use a very nice model that has live audio processing. So it can call tools and do all sorts of processing while the user is speaking. We don't have to wait until the user has finished talking to start getting information from the LLM. And thanks to that, we can start displaying tasks to the user as they are talking, even if they are not finished talking, and then they can iterate on that. They can correct themselves, and we have the right tools for that, to add new tasks and to edit the tasks that they are adding, so that they can correct if they said something wrong or if the LLM understood something wrong. We can just change that in real time and show the updated tasks immediately in our client.

So let me make sure I understand: one model is doing both the speech to text and also doing on-the-fly tool calls while the person continues to talk. So as I ramble, I'm seeing tasks being created that I can then refine as I keep talking.

Yes, that is correct. We initially tried a little bit with using a first model to do a transcription of the audio and then processing that, but in the end, it wasn't worth it. It adds a lot of latency and the results are not much better, because when you're talking, there's more than text. There's also the pauses, the intonation that you have.
All of that can carry meaning as well. And if you just have a text transcription, you lose all of that. So when you have a model that can process the audio directly, you get a richer input in that way, and we can have a more accurate capture of what the user actually means than if we just processed text.

Oh, so there's not a transcription step. It's actually just processing the audio directly.

Yes.

Yeah, fascinating. Okay, and is this available on desktop or mobile, or both?

It's available on both. It's on desktop and mobile applications. We've even started a beta for an Android Wear version, so you can use it from your smartwatch if you have an Android watch. So that's really cool.

Yeah, very cool. Okay, so as I ramble, I'm seeing tasks start to pop up on the screen, and this is just midstream. So I can stop what I'm saying and say, no, I meant this, not that. I can say, retitle this, add this to the notes, whatever. This is fascinating because it feels very visual, and the thought that immediately came to mind is I imagine people use this as they're driving and they remember a task. So I imagine you must have had to think through just the right UX: do you support different modalities, like I actually wanna see what's on the screen, or I have audio only? Has this come up at all in your testing? I'm curious about just what use cases you have for this.

Yeah, a lot of people drive and ramble, that's the thing. And based on that insight, we added sound effects. So when you add a task, there's a sound that is gonna confirm that you've added a task and that we actually captured it from your ramblings. Then also when you edit a task, when you say, oh, sorry, I meant this and that, or this is not priority one, this is priority four, we have a different sound that is gonna mean it's edited and you can move on to the next thing.
So we've added the sound effects to make sure that people drive and look at the road ahead. But yeah, it's very visual. Most people use it whether on desktop or mobile. And so you have the dedicated screen where you see the tasks popping up, coming in, and then you can edit them and you would see that happening in real time. And yeah, it's very good to feel you're being heard and you can trust that the tool is actually understanding. Previously, I think at the very beginning, we tried an actual live audio thing where the tool would respond back with voice. That was a little slow. And we also tried with just ramble and then at the end you have all your tasks added, but you don't have this confirmation step where you wanna make sure that what you said is what is gonna be added to Todoist. So I think we struck the right balance of UX, where it's live but it's also visually confirming to you, and not just at the end or through voice. So it feels really nice that you actually feel heard by Todoist.

Yeah, I like that you're using both visual and audio cues. It's nice, I can imagine it feels very alive.

Yeah, I thought our designer, Michael, did a great job there. The tasks are actually floating around. So they don't feel as static and confirmed. It's more of a live thing you can play with and edit. And we also added some tips that are gonna cycle through. So you can also learn about what voice commands you can use, such as removing a task or editing or adding some attributes like a label or a priority or a description. Yeah, it feels very live.

Yeah, very nice. Okay, so I love that there's already just complexity in the UI. It sounds like the simplest feature ever: I'm gonna hit record, I'm gonna start rambling. But already we've started to uncover that you're turning this into tasks. You're showing them what tasks were created. They can edit those things. They can add metadata to those things. They might be driving. So there are sound cues.
Is there anything that, like, you said that you experimented with the AI talking back. There were some latency issues. Do you do anything with voice, like text to voice at all? Like at the end of a session, or did you decide that wasn't a part of the feature?

No, we're not doing any text to speech at all. It doesn't work really well. And it would also be very complicated because we support many languages.

It sounds like your audio cues have been enough for people to feel confident that what they're saying is getting captured appropriately.

Yeah, I think so. No accidents were reported so far, so I think it's working. The data shows that it's working.

Excellent. Okay, so let's get a little bit into the technical details of how this works. So you've got a model that is processing the audio. It can call tools. Tell me a little bit... First of all, I'm curious about your background in this area. So you mentioned this was one of the first AI features at Todoist. Had any of you built AI features before? Were you learning this on the fly? Was it part of this exploration? Tell me a little bit about how this even came about.

This was for me, and I think for everyone involved at the time, among the first few relatively big AI-powered features that we had worked on. But as I said, Ramble came up during an exploration phase, and it was not the first thing that we came up with. So during that time, it was about two to three months that we decided to explore how AI could help us. We built other internal prototypes, most of them simpler features, just to test them out. So in that sense, it was not the exact first thing. And before that, there were still some much smaller features that we had already worked on before in Todoist. We have this amazing, super powerful filtering capability, but it requires a special syntax for how you type the filters. And one of the first things that we did, even before this AI exploration phase, was...
And people had a hard time generating the syntax. A programmer, someone that was more technical, could do it, but not someone that is less technical. We had a lot of help center articles with example filters so that they could copy them and use them. And also our customer support team was usually helping people achieve the filtering that they wanted, because the syntax was super powerful but complex to come up with. So we came up with an AI model that you give a learned prompt about how the filtering syntax works, and then users can express what they want to filter with natural language, a short sentence, and then the AI model will give them back a filter query in the slightly more complex language that they can then use and save, and they have the filter set up. I think that's the most important one. We also had one extension. It was not part of the core product, but people could install an extension where you could break down a task into subtasks. So you have a task, but it's a very broad one, and you just click a button and the AI model will tell you what are potentially possible subtasks to break it down. So those were the... And other people at Todoist were building those features already before we started the AI exploration phase.

Okay, I love that you took a few months to just explore what's possible. That sounds like that was probably a really fun time period and also probably helped you level up a little bit of like, just how do we get comfortable with this new technology? Yeah, all right. Okay. So we've got an audio input model processing the input. Thomas, you mentioned it can call tools. Let's talk a little bit about what types of tools it has access to. How did you think about tool design? Let's get into some of those details.

Yeah, sure. So our model has access to a very limited number of tools, actually. It's basically add task, edit task, delete task, and that's pretty much it. And these tools have several parameters every time.
So add task itself has the task title, the task description, the due date, deadline, priority, label, project ID, and that's basically it. So, yeah, the whole magic is actually in the prompt telling the model how to parse what the user is saying and to do the right tool calls based on that. So especially not trying to do what the user is talking about, not trying to achieve those tasks, but just capturing them without trying to overinterpret them. So be as faithful as possible when understanding what the user is talking about, while also doing some more advanced things. For example, I mentioned we have due dates and deadlines. The LLMs are very bad at dealing with dates, so we have to give special instructions about how to deal with dates so that it plays well with the natural language date handling that we've already had for 15 years, as you mentioned. So, yeah, there's a lot of prompting involved to do all of that right, but the tools themselves are pretty limited in scope, I think. And what actually happens with these tools is that the back-end doesn't do anything with those. We just pass them directly to the client, which uses them to show the tasks in the UI to the users and to play the audio cues as needed, so that the clients themselves build the whole resulting task and show the UI to the users.

And I wanna clarify something. So first of all, you mentioned two things I wanna come back to, which are working with dates and also getting the AI to just be literal about the task. These are things that I've encountered when I've played with kind of building an AI-driven task management system. And so they resonate. But before we get there, I wanna highlight, it sounds like the model only returns tool calls. There's not a back and forth between the person talking and the model. The person talks and the model calls tools. And those tools are telling your system, create this task, edit this task, delete this task. Is that correct?

That is correct.
There's no other output from the model. There's no text and no audio. It's just tool calls that happen while the user is talking, so that we can show the updated tasks directly, with very minimal latency in the apps.

And what's nice about this is you're really constraining what the model can do. And I imagine this really helps with output quality.

Yeah, absolutely. We don't want the model to try to be overly smart. It just needs to fit within the boundaries that we are setting, that match what we do in Todoist, that match how the product works. So we are very restrictive about it, and that's also why it works so well, I think.

So I imagine you had to teach the model, like here's what a task is. You have to teach the model, like here's how to identify what's relevant and what you can safely ignore. And then I imagine you also have to, like you said, encounter this challenge with dates. Decide what stuff is associated with a task. I imagine people don't just talk cleanly about here's a task. They ramble, maybe they come back and add something to a task earlier. Tell me a little bit about just, I can imagine this took a lot of iteration with prompt design, but also just understanding how you're evaluating it as well. And were there other hard challenges that came up in terms of teaching the model how to do this well?

So, yeah, it took a lot of iterations. So it was a lot of testing it ourselves in our development environment. And of course, we also built a whole system to evaluate the quality of that, especially because we want that to work well in Todoist with the many languages that our users actually use. So the first thing is, we built a system with an LLM judge to evaluate the quality of the tool calls done by the model. So basically, we record the whole session and we can replay it with a different prompt.
For example, we record the audio, we replay it with a different prompt, and then we can see the output from the model, and we ask another model to evaluate how good it is compared to the transcript that we're manually providing, saying, okay, did it capture everything? Did it get some things wrong? And we use a smarter model to do that, which is also a bit slower and not live, but which is really good for that. And that is really helpful for doing many different tests in a very short time. Then later, we updated that a lot to add support for many languages, as I said. So we decided on four different scenarios. We put them in plain English, and we asked our staff, our Todoisters, to record them in different languages. Basically, you know, we have over a hundred people in more than 35 countries. So we have many people speaking many different languages. And in the end, we have recordings in over 20 languages with different accents, different talking speeds, different recording quality, because this can also influence the LLM result. And we built a whole system to be able to replay that, to check the quality for every language separately, to catch regressions when we change anything, and just, yeah, to see how good the quality of the results is.

Yeah, amazing. And it sounds like you're measuring on a few different dimensions. You mentioned, did it capture everything? Did it capture things correctly from like a quality standpoint? Are there other things that your evals are measuring?

It's mostly about, yeah, did it capture everything? Did it miss anything? Or is there something that it didn't understand? Then our LLM judge gives a score out of five, and we consider it successful if the score is at least four out of five.

Okay. And so let's get into some of the challenges you ran into with just prompt design and trying to teach the LLM. You mentioned dates were a challenge.
This resonates with me because this was like the first problem I had to solve, but I'm not sure all my listeners are familiar with this problem. Do you wanna just share a little bit about some of the challenges you had with dates?

Of course. So the first one, and the easiest one actually, is that the model doesn't know the current date. So if we just say, in three days, maybe it will try...

That's something we've discovered pretty quickly, that the LLM was getting very creative, and in some way it is great, because if you say, I need to plan a marathon in Paris next year, it's gonna probably give you a plan of what exactly you need to do, buy shoes, etc. So it's nice. You get a plan, but it's not really expected. It's just supposed to capture new tasks as they are. So in some ways, it was pretty creative. In some others, it could get too creative, and so you would get too far away from just capturing a list of tasks. So I think we also needed to, at some point, I remember with Thomas, we played around with the temperature of the model. So at the beginning, it was set to 1 on 1, so it was very creative, and then we tried to just lower that to the point where it was almost too literal and we were missing some of the delightful stuff that it would do. For example, in the prompt, we were making sure that the task is always actionable, so we always add a verb even though the person doesn't really give it. So we're making it actionable for the user. But if you're lowering the temperature too much, it's gonna just add the word that you said. So it wasn't as helpful as it was, so we had to play around with that notion of temperature in the LLM model. Does that make sense?

Yeah, and if we have listeners that aren't familiar with the temperature setting, it's basically just, like you said, it's like a creativity setting. How much variation in responses there is gonna be is probably the easiest way to think about it.
And it is fascinating to see the change in behavior just based on a really simple setting. Okay, let's dig a little bit into your evals, because it sounds like you've got a pretty robust system, especially with different languages. So there's a few things I already heard you share. Did they capture all the tasks? Did they capture them well? And I can imagine you could have errors at different levels. Like you have speakers that are speaking different languages, they might have accents. And so is it really all categorized around those two errors, or are you trying to understand, is it a translation error? Tell me a little bit about it. It sounds like you're recording interactions, you're scoring them with a judge, you've got a threshold where it's an error or not. What happens to those errors? What are you doing as a team when you find errors?

So this was mostly used to catch regressions and to see how we could improve the prompt and what impact it had on realistic scenarios. But then we also realized when doing that, that actually the biggest problem is that the model has very different quality of support for different languages. Some languages like English, French, and Spanish are handled very well. If we're starting with Arabic or Bengali, for example, it's way more tricky. So in the end, it's far from perfect. If we look at the numbers, there are a lot of tests that never pass. And yeah, we accept that. It's one of the problems. We know about it. We hope that with future versions of the model, it will improve. But it's mostly used to have a baseline to make sure that we are not making things worse when we're making any change.

I see. And I recall you shared that you got your employees to create almost like your dataset for evaluating. So you've got a whole bunch of recordings from different employees, different environments, different languages. That's your eval dataset that you're running against when you make prompt changes.
It's not that you're recording everything your users are doing and constantly evaling there. Okay.

Yeah.

Okay. I was gonna ask about a data policy there and how you're communicating that, and I can imagine with to-do lists that can get very sensitive. So you're using internal data created by employees, and you have an evaluation set for evaluating prompt iterations. Is there anything you're doing in production to get feedback from customers? Do you have a way to collect feedback that this is working well for them?

When we were actively working on Ramble, so during the last semester, we had this feedback button in the Ramble UX, so you could access it, and it was pretty open. People could just give us feedback on how they would rate Ramble and what they found was working well versus not working for them. And it was just grinding through all this feedback that we were getting from users and then figuring out, is it a problem we can solve, or is it something that we just need to accept from the model and the quality of it? We made a lot of changes, small iterations on the prompt over time, just going through all the feedback we were getting and trying to guide the Ramble experience to the right place. There was no magic there. It was just looking at the feedback, using a bit of LLM help to get through the feedback, and going through the prompt again. And using these evals was really helpful. We caught a few regressions where we tried to make the prompt smarter in a way, and then you were like, oh yeah, it's actually adding subtasks in the description. We don't support subtasks in the Ramble experience, so we had to go back and figure it out, testing again and again.

You mentioned you had to decide, is this something we could fix or is it just something we have to accept as a limitation of the model? How did you distinguish between those two things?
Probably just by breaking the prompt: it breaks, you try to fix it, and it breaks again. And then you just decide that a shorter prompt is probably going to be easier for the LLM, and you actually get better results. So we're also actually trying to reduce the prompt so it's not trying to direct too much. But yeah, I think it was just actually trying stuff. Whenever Thomas was publishing a new prompt update, we were actively working through Ramble, trying the different test cases that we had and just seeing what works and what doesn't.

So really trying to fix it, and then if you go through iterations and it's just not getting better, recognizing this might be a limitation we have to live with.

Yeah, pretty much. Yeah.

Okay. And you mentioned you built this a year ago. Have you seen model improvements that have helped with your quality?

Yeah. Ernesto, do you remember the first model we used, Gemini from Google? I think it was 2.0, 2.5.

2.0 or 2.5, or maybe we iterated between both of them while still playing with it internally, but I wouldn't be able to tell right now. We have upgraded.

Okay.

Yeah, we've upgraded a couple of times. Now it's using the 3.0 Flash model.

Yep.

And yeah, I remember we did that switch with Thomas, and it was both faster and you would feel like it understood the tasks better. I don't think we've quantified the improvement. It was more of a feel for it.

Yeah. It was really an improvement.

So it's great, because that's not something you had before, where you worked on a feature and the feature would just upgrade itself over time. That's the nice thing with AI: Ramble is going to get better even if we don't touch the prompt or anything in the product. Over time it's probably gonna figure out tasks even faster and in a more elegant way. So that's a nice thing, to get auto maintenance.
Yeah, there's this mantra of build for the model that's going to come out six months from now, which breaks my brain a little bit. Like, how do I know what that model will be able to do? But I think the idea is that you can assume the quality will improve over time. And so there are these little things that you could spend a lot of time trying to optimize for, but for anybody who's done prompt optimization in the context of a product, it's a little bit of one step forward, two steps back, two steps forward, one step back. And maybe it's just recognizing that all these little edge case things don't necessarily need to be solved, because the model will get better with time. And Hugo, I love how you just framed this. It's pretty cool that the brain of our features gets better without us having to invest time and energy into it. Okay. Is there anything about Ramble that we haven't covered that we should have?

There's something that came to mind when you and Thomas were talking about the difficulties. At first, Ramble was only capturing tasks, but then we needed to make the model aware of its surroundings, the projects. At first it wasn't possible, but users wanted to be able to say, while adding a task, put this task in this project, put this task in this other project. And they refer to their projects in loose ways, so the model needs to be made aware of what the user's projects are to see which one matches how the user referred to it. They could use just a single word out of the entire project name. The same applies to labels.
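A minimal sketch of what making the model aware could look like, assuming the account context is simply rendered as text and sent along with the prompt. The function name, the layout, and the sample projects are hypothetical; the conversation doesn't say how Doist actually passes this in. The date line covers the earlier point that the model doesn't know the current date.

```python
from datetime import date

def build_context(today, projects, labels):
    """Render the current date and the user's projects and labels as prompt text."""
    lines = [f"Today is {today.strftime('%A')}, {today.isoformat()}."]
    lines.append("Projects:")
    lines.extend(f"- {name}" for name in projects)
    lines.append("Labels:")
    lines.extend(f"- {name}" for name in labels)
    return "\n".join(lines)

# Hypothetical account data, for illustration only.
context = build_context(
    date(2026, 4, 16),
    projects=["Home renovation", "Marathon training"],
    labels=["errands", "waiting"],
)
print(context)
```

With the full names in front of it, the model has something to match a loose reference like "the renovation project" against, instead of guessing at a project that may not exist.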
So this is in a way similar to the issue with dates, but a different thing, because dates are something the model already knows about from its training, but the model doesn't know about the particular situation in the user's Todoist account. So we need to make the model aware of that, and that introduces some complexity around, is it picking the correct project? There could be ambiguous project names. And there may be other stuff aside from projects. Like I said, labels [...]

[...] allow users to automate some tasks or create some automations that connect Todoist to other systems and do things automatically for you.

Yeah, I feel like there's a fun jump from manage my tasks to actually start doing some of my tasks, which task managers are uniquely situated to help with. You know, something you said about a giant text blob to tasks made me think: I'm seeing more and more meeting transcription software trying to identify tasks and go straight from meeting to task list. Is that something you guys are experimenting with?

Yeah, it's something we've been thinking about. I think the first step would be integrating with these tools. Ernesto, if I'm not wrong, I think Renona, for example, is in that list of tools that you can automate. And so we could take that blob of text from the meeting notes, or from Google Meet notes, and turn it into tasks for the user. So I think there's more to it in general in terms of capturing tasks from all the other work tools, whether it's Slack or Microsoft Teams or meeting notes from Renona. There's a lot we can do to make sure that the stuff you agreed to do is actually on your list so you don't forget. So I'm excited to see what we can do there.

It's funny, I rolled my own task management tool because I feel like task management is so idiosyncratic. I want it to just match the way my brain works.
But all this talk about where tasks come from and all these integrations got me thinking, oh, I don't want to do all that myself. So maybe I do need to find a task management tool that's doing all these integrations, but that still allows me to create the idiosyncratic workflows.

So there's a Todoist CLI if you want to use that.

Dominic Jost has tried to get me to play with it quite a bit. Yeah, I actually really love this space of task management, because it's almost like a human-computer interface, right? We've got to get this stuff out of our head, onto paper, into the computer, onto our phone, whatever it is, so that we then know what to do. And there's the get-it-out-of-your-head part, but then there's also consuming it and processing it and doing stuff with it. And every human is so specific in the way they wanna do those things. I feel like it's a fascinating product problem: how do we do this in a way that supports the way a wide variety of humans think? And it sounds like you guys have spent a lot of time on the capture part, which I think is one of the biggest, most important parts. So I'll definitely go check it out.

Yeah, and there are many next stages there. Once you have a lot of tasks and the source of truth is in Todoist, now you have a thousand tasks. You need help to plan them, to triage them, to figure out their priority, to actually work on them, execute them. So there are infinite possibilities, which is pretty cool for us as builders: figuring out how people commit to the right stuff and do it, because obviously just capturing tasks is not the end of it. What matters is actually doing stuff, not just capturing. So it's the first step, and we recognize that the next thing is really being able to plan them, to understand what you need to do at the right time. And also maybe sometimes pruning some tasks that you just don't need to do.
So I think eventually we want to be able to understand that context and use it to really help people do the right thing at the right time for themselves.

Yeah, it's funny how easy it is to look at this space and think it's really simple, but as you all mentioned in your intros, you've been working at Todoist for years and there are always new problems to solve. It's an evergreen space of just how do we do our work better, which is really nice. All right, I really appreciate you taking the time to share your story with me. Task management is something that I have always nerded out on, and it's fun to see a company being so thoughtful about it. So thank you.

If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating, as it helps others find the show.

Thanks, I appreciate it.