NemoClaw Knowledge Wiki

Jul 12, 2026 · 1 min read

benchmarking
ai-evaluation
prompt-engineering
model-testing
product-requirements

🗂️ AI & Agents

One-Shot Build

A benchmarking paradigm requiring an AI model to generate a complete, complex output (e.g., full product documentation) from a single, comprehensive input without iterative refinement or follow-up queries.

Key Characteristics

- Tests a model’s ability to synthesize large, multi-faceted inputs in a single pass
- Emulates real-world scenarios where iterative prompt engineering is impractical
- Measures holistic understanding rather than simple task completion

Related Comparisons

- Claude Opus 4.5 and ChatGPT 5.2 were evaluated on a “One-Shot Build” benchmark using a massive Product Requirements Document (PRD) for the Showbiz app
  - Input: Comprehensive documentation folder containing technical specs, design tokens, and personality guidelines
  - Task: Generate a functional PRD from raw input without iterative feedback
- The benchmark uses an “Impossible” PRD designed by Matt Maher to push model boundaries
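
The single-pass constraint can be illustrated with a minimal Python sketch. It is not the harness used in the comparison. It assumes the documentation folder holds UTF-8 text files, and the names `bundle_docs`, `one_shot_build`, and the `generate` callable are hypothetical placeholders for whatever model client is actually used.

```python
from pathlib import Path
from typing import Callable


def bundle_docs(folder: Path) -> str:
    """Concatenate every file under `folder` into one labeled prompt body."""
    parts = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            rel = path.relative_to(folder).as_posix()
            # Assumption: all inputs are UTF-8 text (specs, tokens, guidelines).
            parts.append(f"=== {rel} ===\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def one_shot_build(folder: Path, task: str, generate: Callable[[str], str]) -> str:
    """Send the whole bundle plus the task in one call, with no follow-ups."""
    prompt = f"{bundle_docs(folder)}\n\n=== TASK ===\n{task}"
    # `generate` is a hypothetical stand-in for a model client; it is called once.
    return generate(prompt)
```

Passing the model call in as a plain callable keeps the sketch independent of any vendor SDK and makes it easy to check that only one request is issued per run.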

Backlinks

2026 04 14 Compare of Claude Opus 45 vs ChatGPT 52 Matt Maher

Source Notes

2026-04-23: https://www.youtube.com/watch?v=iUzrE3-FHgA Summary of a comparison between OpenAI’s GPT-5.2 models and Anthropic’s Claude Opus 4.5 using a complex “One-Shot Build” benchmark.

Backlinks

INDEX
application-build
benchmark-testing
legacy-model-comparison
prd
product-requirements-document
AI & Agents
claude-opus-45
GPT-5.2
matt-maher
showbiz
