Gemini Omni: Clone yourself with AI in under 15 minutes

Overview

Claire Ho spends the episode stress-testing Google Flow and Gemini's video tools by trying to build an AI avatar of herself and turn it into a one-minute hype video for her podcast. The point is less "watch this polished final product" and more "can one person, with little video background, get from selfie scan to usable promo video in about 15 minutes?"

The experiment shows both sides of the current tools: they can handle storyboarding, visual direction, and scene generation, but they still misfire, ignore character references, and require a fair amount of trial and error.

Key Takeaways

The clearest idea in the episode is that these newer image and video models act less like a single-purpose generator and more like a rough creative partner. Claire uses Flow not just to render clips, but to help shape the concept itself: scene ideas, visual style, camera framing, and the basic sequence of a promo video. For someone who says she is creative but "not video creative," that matters more than the avatar gimmick.

A second takeaway is that multimodal AI can open up work people would not have attempted on their own. Claire says she would not have known how to solo produce a hype video, block scenes, or frame the shots. The tool lowers the barrier enough that she can at least try, which changes who gets to make this kind of content.

The episode also shows how messy the process still is. The avatar creation had failed in an earlier test. The storyboard could not reliably use her avatar as a character reference. She accidentally generated images instead of video on one pass. Even when a clip worked, the output drifted, adding details like blue nail polish. The result is promising, but not dependable.

There is also an interesting point about source material. Claire notices the model picked up posters and books from the background of her avatar photos and carried them into generated scenes. That suggests these systems can pull more contextual detail from capture inputs than users may expect, which can be helpful for consistency but also means your setup choices matter.

Practical Steps

If you want to try this kind of workflow yourself, Claire's process gives a usable template:

Start with a narrow goal. She did not ask for "a great brand video." She asked for a hype video for a specific podcast.
Build a visual brief in plain language. She described the setting as a dark home office with dark green walls, AI books, posters, and a hacker vibe.
Let the tool propose a storyboard first. Getting scene ideas before generating clips saves time and gives you something concrete to edit.
Generate scene by scene instead of trying to make the whole video in one shot. Claire picks individual frames and turns them into clips.
Check your mode settings every time. Her image/video mix-up is a simple mistake that can waste a generation cycle.
Expect to rerun prompts and swap references. If the avatar doesn't carry through, try re-tagging the character or adjusting the prompt.
Use the outputs as draft material, then stitch together the best takes. The value comes from assembling workable pieces, not from expecting one clean generation.
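
The scene-by-scene steps above can be sketched as a small planning script. This is only an illustration of the workflow, not Flow's API: the Scene class, the build_prompt and check_mode helpers, the "@me" tag format, and the mode names are all hypothetical, and the sample scene text is paraphrased from the episode. Nothing here talks to Google Flow.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this models the workflow on paper
# and does not call Google Flow or any Gemini API.
AVATAR_TAG = "@me"  # assumed form of the character reference
VISUAL_BRIEF = "dark home office, dark green walls, AI books, posters, hacker vibe"


@dataclass
class Scene:
    number: int
    description: str  # storyboard text that names the host directly


def build_prompt(scene: Scene, host_name: str = "Claire") -> str:
    """Swap the host's name for the avatar tag and append the visual brief."""
    tagged = scene.description.replace(host_name, AVATAR_TAG)
    return f"{tagged} Setting: {VISUAL_BRIEF}."


def check_mode(mode: str) -> None:
    """Fail early if the generator is set to images instead of video."""
    if mode != "video":
        raise ValueError(f"mode is {mode!r}; switch to 'video' before generating")


storyboard = [
    Scene(1, "Extreme close-up of Claire typing on a mechanical keyboard."),
    Scene(3, "Claire spins around in her chair to face the camera."),
]

check_mode("video")  # the step Claire skipped on her first pass
prompts = [build_prompt(scene) for scene in storyboard]
for prompt in prompts:
    print(prompt)
```

Keeping one prompt per scene mirrors the advice to generate clips individually, so a failed or drifting take only costs one rerun.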

Notable Quotes

"I'm going to be honest, I'm not a hundred percent sure this is going to work." - Claire Ho
"I would have never been able to solo produce a hype video for my podcast." - Claire Ho
"Now I have this AI producer here that can help me with this effort." - Claire Ho

"What I really appreciate about these new generative AI models is it unlocks for me an ability to generate, create something that I would have never been able to do before." - Claire Ho

Full Transcript

Source: openai 20m runtime

Today I am doing a very strange episode where I'm going to create a video avatar of myself and in about 15 minutes get to a full minute-long video starring none other than your favorite podcast host, Claire Ho. Let's get to it.

This episode is brought to you by Merge. Building an AI product is one thing. The hard part is everything around it, connecting to the tools your team and customers rely on, letting agents take action with the right permissions, and keeping everything reliable and cost efficient once you're in production. Most teams end up piecing that together themselves. So instead of building the products you actually care about, you get pulled into integrations, permissions, routing, and all the infrastructure underneath. Merge is the infrastructure layer for production AI. It connects to thousands of tools, gives agents secure ways to act inside them, and optimizes model routing and spend without you building or owning any of it. OpenAI, Dropbox, and Ramp already use Merge to move fast and build AI right. Visit merge.dev slash howIAAI to start building for free.

This episode of How I AI is going to be an adventure because I'm going to be honest, I'm not a hundred percent sure this is going to work. I'm going to return to a product I covered very briefly a couple of weeks ago called Google Flow and the new Gemini Omni video generation model. And I'm going to try really hard to create an AI avatar of myself that we can animate or I guess cinematically create using AI.

So this is Google Flow and one of the features of Google Flow and the Omni model is you are supposed to be able to create an avatar of yourself. Now we tried this the day it came out. It did not work, but we're going to give it another college try and see if we can get a full featured avatar of myself that then we can go and build consistent character videos off of. So I'm going to select up here. I'm going to create an avatar. We're going to click get started.
I'm going to scan this QR code. I have my phone here. I've done this before, so hopefully it'll be fast. Okay, I'm going to put the mic away just for one second. I'm going to allow access to my camera and we're just going to take some photos. Okay, ready, start. 17, 81, 49, 20, 25, 22. Okay, now it's having me turn my head. So I turned my head that way. Gave me a checkmark. Turn my head the other way. It's giving me a checkmark and it says we're done. Now it said we were done last time we tried this, so we're going to see. It's going to take a couple minutes and then we will come back and see if I can actually use this avatar of myself.

Okay, so look at this beauty. There's this fisheye lens version of me that is now an avatar. So I supposedly can use this and let's use it to create a hype video for the How I AI podcast. So I'm going to go in here and say, help me create a storyboard for a hype video for the How I AI podcast. I already have a character named me we can reference. Help me come up with a few scenes that would make this great. This is a podcast by Claire about the best ways to use AI at work and in life.

Okay, so what I love about Flow or what is pitched to me about Flow is that it's not just a video generation tool. It's actually a whole creative suite. And so ideally it's going to be able to help me not only animate or video generate this avatar of myself, it's also going to help me actually brainstorm what this overall video could be. And I'm creative, but I'm not video creative, so I'm excited to see what it looks like.

So how do you imagine Claire? Is she in a modern studio or perhaps a bright airy home office? Should it feel high tech and sleek or more grounded and lifestyle focused? And are we going for high energy and fast paced or thoughtful and inspiring? So I'm going to say she is in a dark home office, dark green walls with books about AI and fun posters lighting around.
This should be more authentic lifestyle version, but it's high tech and about coding. Have a hacker vibe to it. Okay, a bunch of typos, but we'll see. We'll see what this does.

And what I love about these video models and these new tools, again, usually here on How I AI, we talk about coding. We talk about website generation. We talk about PRDs and work product. But what I really appreciate about these new generative AI models in particular, these multimodal ones, image and video, is it unlocks for me an ability to generate, create something that I would have never been able to do before. So I would have never been able to solo produce a hype video for my podcast. I would have a hard time brainstorming it. I wouldn't know how to frame it. I wouldn't know how to block it. But now I have this AI producer here that can help me with this effort.

So let's see what the frames are. It's about seven frames. It's going to be an extreme close-up of me typing on a mechanical keyboard. Totally on brand. Then there's going to be a wide shot of the office. Then it's going to reveal me in my ergonomic chair. Spoiler alert, I'm not actually in an ergonomic chair. I'm going to spin around. That's going to be funny. And it's going to give me a digital heads-up display, which is also ridiculous. But let's let it happen. Then it's going to do a very, what I'm presuming to be a very cheesy AI montage, a lifestyle moment, a call to action. I'm going to hit you with the podcast microphone. And then it's going to say, How I AI.

If this looks good, I'm going to say, this is great. Generate the storyboard. I already have the character @me. And so I'm going to send that. We're going to see what it comes up with. I've noticed that it has a hard time referencing the me character in some early tests. So let's see what it comes up with. I'm presuming it's going to take a couple of minutes. So we will take a mini break and then come back to see what it looks like.
Okay, it looks like it's generating a grid for the storyboard. It can't use the avatar. So I think it's going to do it without the character reference. It'll be really interesting to see what it comes up with. But then as soon as it's ready, I'm going to go ahead and generate at least a couple of these storyboard scenes one by one. And we can see how well it does with my avatar.

Oh, I mean, this is delightful. Look at this glowy mechanical keyboard. Look at how I am hacking on three keyboards. I'm going to make a little eyes at you with my fake glasses, my very trendy glasses. There's going to be me dragging and dropping a file that probably says like AI.md. I'm going to smile and I'm going to speak into the podcast. This looks great.

So what I think I'm going to do is I'm going to paste in this first frame of the video that the agent came up with. And instead of saying Claire, I'm just going to @-mention this avatar that it gave me so that we can see if it generates this video with me as the character. And so I think I've replaced my name here. I've given details on camera, on lighting, on everything. I press enter. Let's see what it creates with my avatar. I have no idea what we're going to get into. And hopefully it won't be terrifying.

Okay, I'm already nervous. What is surprising to me that I didn't actually expect is it does have my posters and my books in the background here, I guess because they're behind me when I took the photo. It's taking advantage of that. And I'm going to share my audio as well. And we're going to see how this video worked.

Okay, I got that wrong. I actually generated images instead of videos. Totally messed up. Did not click the right thing down here in the bottom right. I had image generation instead of video generation. So again, I'm going to paste that walkthrough of the scene here. I'm going to replace my name with the me avatar. It's going to have my fingers flying across that mechanical keyboard. It's going to be so cool.
I'm going to go ahead and press send and we're going to see how long it takes to generate a video. Now, something you'll notice about every time you generate videos, it used to work like this in Veo 2, so I'm not... Veo 3 as well. So I'm not surprised they do this, is that they're generating two versions of it. It's going to take a couple of minutes. The image took a couple seconds. These are probably going to take a couple of minutes. So I will come back and hopefully we will have our first video with Claire's face in it.

And while we're waiting, I'm going to queue up one or two other scenes and see if we can get ones going with my actual face in it because some of these had like the back of my head as opposed to my face. And I think we want to see what my face avatar looks like. So we'll pick frame 3 and see if we can get that going as well.

Okay, the first video generated. Now we have blue nail polish. I still like it. Okay, let's see. "We were told AI would replace us." That is quite spooky. Okay, we were told AI is going to replace us. Let's see if the video with me actually generates a callback to that.

So while that's generating, I'm going to go ahead and make all of these. We're going to stitch them together. It's going to be so awesome. So stick with us. We're going to generate a bunch of videos and we're going to stitch it together into one long hype video.

This episode is brought to you by Jira Product