Gemini and DALL-E 3 Workflow: Consistent AI Image Generation Using JSON
Generated: 2026-04-26 · API: Gemini 2.5 Flash · Modes: Summary
Clip title: I Tested ChatGPT’s New Image 2.0 and Accidentally Stumbled Upon an Awesome Workflow
Author / channel: Craig Does AI
URL: https://www.youtube.com/watch?v=qXUww5tnLHs
Summary
The video details a workflow for generating consistent AI images and storyboards. A custom Gemini Gem (“JSON Image Creator V.3”) in Google Gemini translates a natural language prompt into structured JSON, and that JSON is then fed into ChatGPT Plus (powered by DALL-E 3) for image creation and editing. The presenter argues that the JSON format reduces the AI’s “guessing” during image generation and gives more precise, higher-quality results, because it spells out details such as camera specifications, lighting, mood, and compositional elements.
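To make the idea of a structured prompt concrete, here is a minimal Python sketch of what such a JSON prompt might look like. The summary does not show the Gem's real output, so the function name, every field name (`subject`, `camera`, `lighting`, `mood`, `composition`), and every value below are illustrative assumptions, not the presenter's schema.

```python
import json


def build_image_prompt(subject: str, setting: str) -> dict:
    """Return a hypothetical structured image prompt.

    Every key and default value here is an assumption for illustration;
    the actual schema used by the Gem is not given in the summary.
    """
    return {
        "subject": subject,
        "setting": setting,
        "camera": {"lens_mm": 35, "aperture": "f/2.8", "angle": "low"},
        "lighting": {"type": "hard sunlight", "direction": "side"},
        "mood": "playful",
        "composition": {"framing": "wide shot", "aspect_ratio": "1:1"},
    }


prompt = build_image_prompt(
    subject="NASA astronauts playing kickball",
    setting="surface of the Moon",
)

# Serialize to JSON text that could be pasted into an image model's chat box.
prompt_json = json.dumps(prompt, indent=2)
print(prompt_json)
```

The point of the structure is that each visual decision sits in its own named field instead of being left for the image model to infer from a free-form sentence.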
The demonstration walks through several examples. First, a prompt for NASA astronauts playing kickball on the Moon yields a detailed JSON output, and feeding that JSON into DALL-E 3 produces a high-fidelity image. The presenter highlights DALL-E 3’s built-in ability to change the image’s aspect ratio (for example, from square to 16:9 landscape or 9:16 portrait) while keeping the scene visually coherent. He also shows in-image editing: using a brush tool, he selects the kickball and turns it into a yellow eyeball, changing that one object and leaving the rest of the image intact. A second example, an image of a bald eagle catching a fish, shows the same editing capabilities.
The core “hack” of the video addresses DALL-E 3’s known difficulty with keeping characters consistent across multiple images, which is crucial for visual storytelling. The technique has two steps. First, after generating an initial character image in DALL-E 3, the user activates ChatGPT Plus’s “Thinking” mode (a paid feature). Second, the user prompts the AI to create a sequence of images (ideally 5-6) that tell a story, explicitly naming the first generated image as the reference. With that reference in place, DALL-E 3 can keep character appearance and thematic elements consistent across the storyboard. The example is a humorous sequence in which a chimpanzee and a miniature donkey-giraffe play football, move from a yard to the street, and end with a scene of consolation, with the characters’ distinctive features preserved throughout.
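The storyboard step can be sketched as a small Python helper that builds one text prompt per story beat, each pointing back at the first image. This is only an illustration of the prompting pattern: the function name `storyboard_prompts`, the prompt wording, and the `reference_label` parameter are assumptions, and nothing here calls ChatGPT or any image API.

```python
def storyboard_prompts(
    story_beats: list[str],
    reference_label: str = "the first generated image",
) -> list[str]:
    """Build one prompt per beat, each anchored to the same reference image.

    The wording is hypothetical; the summary does not quote the
    presenter's actual prompt.
    """
    total = len(story_beats)
    prompts = []
    for index, beat in enumerate(story_beats, start=1):
        prompts.append(
            f"Frame {index} of {total}: {beat}. "
            f"Keep every character identical to {reference_label}."
        )
    return prompts


# Beats taken from the example described in the summary.
beats = [
    "the chimpanzee and the miniature donkey-giraffe play football in a yard",
    "the game moves from the yard out into the street",
    "the story ends with a scene of consolation",
]

frames = storyboard_prompts(beats)
for line in frames:
    print(line)
```

Only the three beats mentioned in the summary are used here; the presenter's suggested 5-6 frames would simply mean a longer `beats` list.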
In conclusion, the workflow gives creators finer control over AI image generation and a repeatable method for building consistent visual narratives. Combining structured JSON prompting from Gemini with DALL-E 3’s editing tools and “Thinking” mode addresses two limitations the video identifies: imprecise results from free-form prompts and inconsistent characters across images. The presenter shares the custom Gemini Gem, its source files, and a Notion document with detailed instructions, so viewers can replicate the workflow for their own projects, from short films and animated shorts to articles and social media content.
Related Concepts
- JSON prompt engineering
- Consistent AI image generation
- AI storyboarding
- Natural language to JSON translation
- Multi-model AI workflow
- Structured prompting
- Character consistency
- JSON-based prompt engineering
- Multi-modal AI workflow
- In-image editing
- Image-to-image referencing
- Visual storytelling
- Prompt-to-JSON translation
- Aspect ratio manipulation
- Fine-grained parameter control
- AI-driven object manipulation
- Sequence generation
- ChatGPT Thinking mode
- Narrative development