Create an AI video with your face and any text
https://www.youtube.com/watch?v=_5LuBKXSV5k
The video is a guide to combining several AI tools to create realistic talking videos of yourself, suitable for personal projects or professional advertising. The speaker emphasizes keeping the character consistent and using custom voice narration across different scenes. Here’s a breakdown of the key steps and tools demonstrated:
-
Introduction to AI Video Creation: The video begins by highlighting the potential of AI to create realistic videos in which you are the main character or star in AI advertisements, something that can be offered as a premium service.
-
Google Veo 3 for Talking Images: The core functionality demonstrated is Google Veo 3’s new “Frames to Video” feature, which lets users upload an image and generate a video in which the characters in the image speak a provided script with lip-syncing. Veo 3 could already convert images to videos; the lip-syncing and consistent character generation are the new, powerful additions.
-
Creating Custom Imagery of Yourself:
Higgsfield AI: This tool generates consistent AI characters from your own photos. You upload 20+ high-quality images of yourself from different angles and with different expressions. The tool analyzes them to learn your look and creates an “AI character” (e.g., “Curly Sunlit Wanderer”). Clothing in the reference photos influences the generated images, so dress for the output you want. Higgsfield also provides a quality score for your uploaded images.
Midjourney: Perfect character consistency is harder here, but Midjourney can also be used. The “Omni Reference” feature lets you upload a reference photo of yourself. Counter-intuitively, setting the “Omni Strength” slider to a lower value (e.g., 100-150) can yield a more accurate human likeness than higher values, which can become abstract. Detailed text prompts are crucial here.
Freepik (Flux Kontext): This platform offers “Flux Kontext Max” for generating images and is particularly useful for integrating products. You can upload two reference images: one of yourself and one of the product. Given a text prompt describing the scene, the AI attempts to place the product in your hand. This is an experimental feature and may require several prompt iterations. The output images might be lower resolution and need upscaling.
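The Midjourney settings described above can also be written as prompt parameters rather than set in the web interface. The sketch below is a minimal illustration, not the speaker’s method: it assumes that the “Omni Reference” image corresponds to the `--oref` parameter and the “Omni Strength” slider to `--ow` with a range of 1 to 1000, and the helper name and example URL are hypothetical.

```python
def build_midjourney_prompt(description: str, reference_url: str, omni_weight: int = 100) -> str:
    """Return a prompt string that carries an Omni Reference image and weight.

    Assumption: --oref takes an image URL and --ow takes a weight from 1 to 1000.
    Check both against the current Midjourney documentation before relying on them.
    """
    if not 1 <= omni_weight <= 1000:
        raise ValueError("omni_weight is assumed to lie between 1 and 1000")
    # A detailed description matters as much as the reference image.
    return f"{description.strip()} --oref {reference_url} --ow {omni_weight}"


# Hypothetical example: a lower weight (100-150) for a closer human likeness.
prompt = build_midjourney_prompt(
    "portrait photo of a man holding a coffee cup, soft window light",
    "https://example.com/me.jpg",
    omni_weight=120,
)
print(prompt)
```

Keeping the weight as a function argument makes it easy to try a few values in the 100-150 range and compare the results side by side.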
-
Upscaling Images: Magnific AI: This tool is recommended for upscaling lower-resolution AI-generated images to higher quality. It can also perform “style transfer” and enhance details.
-
Accessing Google Flow (Veo 3 Feature): To access Google Flow’s advanced “Frames to Video” feature with audio, you currently need two things: a US-based email account (created while physically in the US, or purchased from specific online services for a small fee), and a VPN set to a US location, because the feature is currently geo-restricted to the US. The speaker expects it to be rolled out worldwide soon.
-
Narrating with a Custom Voice (ElevenLabs): ElevenLabs Voice Changer: This tool converts an audio clip from one voice to another, and it can clone your own voice from just 2 minutes of audio. You can record your script, upload it to ElevenLabs, and convert it into a different AI voice (or your own cloned voice) with consistent tone and pitch, rather than relying solely on Google Veo 3’s default voice. The speaker notes that the accent of the original audio may still influence the output after conversion, and suggests specifying the desired accent in the prompt.
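The video performs the voice change in the ElevenLabs web interface. For readers who would rather script that step, the following is a rough sketch only. The endpoint path, the `xi-api-key` header, the `audio` form field, and the model id are written from memory of the public ElevenLabs API and are assumptions to verify against the current API reference; the function names are hypothetical, and `requests` is a third-party package.

```python
API_BASE = "https://api.elevenlabs.io/v1"  # assumed public API base URL


def voice_changer_request(voice_id: str, api_key: str,
                          model_id: str = "eleven_multilingual_sts_v2") -> dict:
    """Describe the HTTP request for a speech-to-speech conversion (assumed API shape)."""
    return {
        "url": f"{API_BASE}/speech-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key},
        "data": {"model_id": model_id},
    }


def change_voice(mp3_path: str, out_path: str, voice_id: str, api_key: str) -> None:
    """Upload an MP3 and save the converted audio. Requires network access."""
    import requests  # third-party: pip install requests

    req = voice_changer_request(voice_id, api_key)
    with open(mp3_path, "rb") as audio:
        resp = requests.post(req["url"], headers=req["headers"], data=req["data"],
                             files={"audio": audio}, timeout=300)
    resp.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(resp.content)  # response body is assumed to be the audio itself
```

A call such as `change_voice("narration.mp3", "narration_new.mp3", voice_id, api_key)` would use the id of a library voice or of your own cloned voice, both of which are listed in your ElevenLabs account.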
-
Putting it All Together (Workflow): The overall workflow involves:
1. Creating high-quality character images of yourself using Higgsfield or Midjourney.
2. (Optional) Using Freepik to generate product integration shots with your character.
3. (Optional) Upscaling lower-resolution images with Magnific.
4. Crafting detailed video prompts with a large language model such as ChatGPT or Google Gemini (or using a voice-to-text tool like Superwhisper to generate prompts from your spoken instructions).
5. Importing your chosen image into Google Flow’s “Frames to Video” section.
6. Pasting your detailed prompt and selecting the video quality setting (Fast vs. Quality, which affects the credits used).
7. Generating the video, which includes character movement and default voice narration.
8. Downloading the generated video.
9. Converting the video’s audio track to MP3 (using an online converter like cloudconvert.com).
10. Uploading the MP3 to ElevenLabs’ “Voice Changer” to change the voice to an existing library voice or a cloned voice of yourself.
11. Downloading the new audio and replacing the original audio track in a video editor.
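The two audio steps at the end of the workflow (extracting the MP3 and swapping in the new audio) are done in the video with an online converter and a video editor. As an alternative not shown in the video, they can be done locally with ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH; the function names and file names are hypothetical.

```python
import subprocess


def extract_audio_cmd(video: str, mp3: str) -> list[str]:
    """ffmpeg command that drops the video stream and encodes the audio as MP3."""
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-codec:a", "libmp3lame", "-q:a", "2", mp3]


def replace_audio_cmd(video: str, audio: str, out: str) -> list[str]:
    """ffmpeg command that keeps the original video stream and uses the new audio."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v:0",   # video from the first input
            "-map", "1:a:0",   # audio from the second input
            "-c:v", "copy",    # copy the video stream without re-encoding
            "-c:a", "aac",
            "-shortest",       # stop at the end of the shorter stream
            out]


def run(cmd: list[str]) -> None:
    """Run a command and raise if ffmpeg reports an error."""
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(extract_audio_cmd("flow_clip.mp4", "narration.mp3"))`, then the ElevenLabs conversion, then `run(replace_audio_cmd("flow_clip.mp4", "narration_new.mp3", "final.mp4"))`. Copying the video stream avoids a second lossy encode of the generated footage.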
The speaker concludes by inviting viewers to join their private community for deeper insights into these and other AI tools.