Paper Banana - AI with Surya channel



Video: https://www.youtube.com/watch?v=MtA7CGOXnko

A summary of the video transcript covering the PaperBanana framework.

PaperBanana: Automated Academic Illustration for AI Scientists

PaperBanana is a new framework developed by a team at Google and Peking University. It is designed to generate publication-quality diagrams and infographics directly from plain text descriptions, overcoming the limitations of standard one-shot image generation models.


🚫 The Problem with Current Models

Current state-of-the-art models (referred to as Nano Banana Pro in the video) operate on a “one-shot” basis:

  • You give a prompt → you get an image.
  • Issue: If a label is misspelled, a connection is missing, or the color scheme is off, you have to regenerate the entire image and hope for the best.

💡 The PaperBanana Solution: Agentic Workflow

PaperBanana wraps the base image generator in a multi-agent pipeline. It does not just prompt; it plans, executes, and refines.

The 5-Agent Architecture

  1. Retriever Agent: Finds relevant reference examples to guide the system.
  2. Planner Agent: Acts as the cognitive core, translating context into detailed layout descriptions.
  3. Stylist Agent: Enforces aesthetic standards and design guidelines.
  4. Visualizer Agent: Generates the actual image (uses Nano Banana Pro internally).
  5. Critic Agent: Reviews the output, critiques it, and sends it back for revision.

The Loop

The system operates on a Generate → Critique → Refine loop (typically 3 iterations). It self-corrects specific details like arrow direction, color coding, and text labels without needing manual user intervention.
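The loop above can be sketched in a few lines. Note that the official code is unreleased, so every class and method name below is a hypothetical illustration of the pattern (a critic gating a bounded refine loop), not PaperBanana's actual API; the critic here approves on the third round purely to demonstrate the flow.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Generate → Critique → Refine loop.
# Agent names follow the video; all identifiers are illustrative
# assumptions, since the official implementation is not yet released.

@dataclass
class Critique:
    approved: bool
    notes: str = ""

class Visualizer:
    """Stands in for the image generator (e.g. Nano Banana Pro)."""
    def render(self, plan: str, feedback: str = "") -> str:
        # A real system would call an image model; we return a text stub.
        return f"image({plan}|{feedback})"

class Critic:
    """Stub critic: requests fixes twice, then approves (round 3)."""
    def __init__(self):
        self.rounds = 0

    def review(self, image: str) -> Critique:
        self.rounds += 1
        if self.rounds < 3:
            return Critique(False, f"fix arrows (round {self.rounds})")
        return Critique(True)

def paperbanana_loop(plan: str, max_iters: int = 3) -> tuple[str, int]:
    """Generate, critique, and refine for up to max_iters rounds.

    `plan` stands in for the Planner/Stylist output that precedes
    generation in the full 5-agent pipeline.
    """
    viz, critic = Visualizer(), Critic()
    feedback = ""
    image = ""
    for i in range(1, max_iters + 1):
        image = viz.render(plan, feedback)
        critique = critic.review(image)
        if critique.approved:
            return image, i
        feedback = critique.notes  # fold the critique into the next render
    return image, max_iters
```

The key design point is that the critic's notes become input to the next render, so specific defects (a mislabeled arrow, a wrong color) are targeted rather than regenerating blindly from scratch.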


🧪 Demos & Capabilities

1. Transformer Architecture (Sample Input)

The video demonstrated generating a diagram of the Transformer architecture.

  • Result: High fidelity to the actual architecture.
  • Details: Correctly separated Encoder/Decoder layers into different color palettes, used dashed lines for residual connections, and placed “Pre-LN” annotations correctly.
  • Process: The system iterated three times, refining the “Sparse Attention Context” arrow and label placement automatically.

2. Google Agent Development Kit (Custom Input)

The presenter fed the system a raw text description of a hypothetical “ADK Agent” system involving orchestration, research agents, BigQuery, and Pandas.

  • Result: A complex, professional system design diagram.
  • Details: It correctly mapped relationships between the User, Orchestrator, and Sub-agents. It visually represented specific tools (BigQuery, Pandas) and protocols (A2A) mentioned in the text.

📊 Benchmarks & Performance

The paper evaluated the model on four dimensions: Faithfulness, Conciseness, Readability, and Aesthetics.

| Metric | Vanilla Model (One-shot) | PaperBanana (Agentic) | Human Experts |
| --- | --- | --- | --- |
| Overall Score | 43.2 | 60.2 | N/A |
| Key Finding | Struggled with specifics. | Beats humans in Conciseness, Readability, & Aesthetics. | Win only on Faithfulness (Intent). |

🚀 Use Cases

While built for researchers, this technology is applicable to:

  • Solutions Architects: System design documentation.
  • Product Managers: Feature flowcharts.
  • Founders: Pitch deck visuals.
  • Developers: Pipeline visualization.

⚠️ Important Notes

  • Code Status: The video utilized an unofficial open-source implementation (hosted on GitHub/Antigravity). The official code from Google/Peking University has not been released yet (expected circa Jan 30, 2026).
  • Underlying Tech: PaperBanana is a framework/wrapper; it uses existing strong image models (like Nano Banana Pro) to do the actual pixel generation.

Link to research paper: https://arxiv.org/abs/2601.23265

Unofficial GitHub to clone: https://github.com/llmsresearch/paperbanana