Source video: https://www.youtube.com/watch?v=XWp4k9K6oK8
# Effective Harnesses for Long-Running Agents (Claude Code Workflow)
This document outlines a solution developed by Anthropic and adapted by the video creator to solve the primary issue with AI coding agents: Context Window Limitations.
## 🔴 The Problem
When AI agents (like Claude) attempt to “one-shot” large applications or complex features, they encounter two specific failure modes:
- Context Compaction: As the task grows, the conversation is compacted to save tokens. The AI loses the specific details of what it previously did, leading to half-implemented features or hallucinations about progress.
- False Completion: Without strict testing protocols, agents tend to mark features as “complete” even if they are untested or buggy.
## 🟢 The Solution: A Two-Agent Workflow
Inspired by real-world engineering teams, this workflow splits the process into an Initializer Agent and a Coding Agent working incrementally.
### Phase 1: The Initializer Agent
The first agent session focuses solely on setting up the environment and documentation. It does not write application code yet. It creates the following:
- **CLAUDE.md**: A project overview file containing the architecture, tech stack, and common commands.
- **init.sh**: A script to initialize the dev server.
- **features_list.json**: A JSON file listing every feature and its testing steps.
  - JSON is used instead of Markdown because models are less likely to corrupt its structure.
  - All features start with `"passes": false`.
- **claude-progress.md**: A file to track what agents have done.
- **Tooling Setup**: Installs tools like Puppeteer so the agent can browse localhost and visually test the app (since it cannot otherwise "see" the browser).
- **Initial Git Commit**: Commits the scaffolding to create a baseline.
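As a concrete sketch of what the initializer might write, here is a hypothetical seed for `features_list.json`; the feature names, IDs, and test steps are illustrative, not taken from the video:

```python
import json

# Hypothetical seed for features_list.json; the fields and features
# are illustrative examples, not the video's actual content.
features = [
    {
        "id": 1,
        "description": "User can add a todo item",
        "test_steps": [
            "Open http://localhost:3000",
            "Type 'buy milk' and press Enter",
            "Verify 'buy milk' appears in the list",
        ],
        "passes": False,  # every feature starts unverified
    },
    {
        "id": 2,
        "description": "User can delete a todo item",
        "test_steps": [
            "Click the delete button next to an item",
            "Verify the item disappears",
        ],
        "passes": False,
    },
]

# Write the tracking file the coding agent will read and update.
with open("features_list.json", "w") as fh:
    json.dump(features, fh, indent=2)
```

Because the structure is plain JSON, a later agent can load it, filter on `"passes": false`, and know exactly what remains, without any chat history.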
### Phase 2: The Coding Agent (Incremental Loop)
The second agent picks up where the first left off, but follows a strict loop to manage context:
- **Read State**: The agent reads the Git log and the progress files to understand the project state (rather than relying on chat history).
- **Implement a Single Feature**: It selects the highest-priority feature from `features_list.json`.
- **Test**: It uses Puppeteer to verify the feature works end-to-end.
- **Update Documentation**:
  - Updates `features_list.json` (sets `"passes": true`).
  - Updates `claude-progress.md` with the completed task.
- **Git Commit**: It commits the changes with a descriptive message.
  - Critical: by committing continuously, the "save state" lives in the Git history, not in the AI's context window.
- **Repeat**: The process repeats for the next feature.
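The bookkeeping side of this loop can be sketched in Python. The actual implementation and Puppeteer testing are the agent's work and are not shown; only the state handling around them is, using the file names from this workflow:

```python
import json
import subprocess

def read_state():
    """Recover project state from git and the progress file, not chat history."""
    try:
        log = subprocess.run(
            ["git", "log", "--oneline", "-20"],
            capture_output=True, text=True,
        ).stdout
    except OSError:  # git unavailable; the file-based state still works
        log = ""
    try:
        progress = open("claude-progress.md").read()
    except FileNotFoundError:
        progress = ""
    return log, progress

def next_feature(features):
    """Pick the first feature whose end-to-end test has not passed yet."""
    return next((f for f in features if not f["passes"]), None)

def mark_passed(features, feature, path="features_list.json"):
    """Flip 'passes' to true only after the browser test actually succeeded."""
    feature["passes"] = True
    with open(path, "w") as fh:
        json.dump(features, fh, indent=2)
```

After `mark_passed`, the agent would append to `claude-progress.md` and commit (`git add -A && git commit -m "..."`), so the save state lands in Git rather than in context.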
## 🔑 Key Advantages
- **Resumability**: Because the state is stored in Git and files (not the chat), a new agent can run `git log` and `cat claude-progress.md` after a crash or compaction and pick up exactly where the previous one stopped.
- **Testing Compliance**: The workflow forces every feature to start marked as failing, preventing the agent from claiming completion without actually running the Puppeteer tests.
- **Context Efficiency**: By focusing on one feature at a time, context usage stays within bounds (e.g., 84% usage even after many features) compared to methods like BMAD, which may hit limits faster.
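Resumability in particular falls out of the design: a brand-new session can reconstruct everything it needs from disk. A minimal sketch (file names follow this workflow; the summary format is an assumption of mine):

```python
import json
import subprocess

def resume_summary(features_path="features_list.json"):
    """Everything a freshly started agent needs, with zero chat history."""
    try:
        last_commit = subprocess.run(
            ["git", "log", "-1", "--oneline"],
            capture_output=True, text=True,
        ).stdout.strip()
    except OSError:  # no git on PATH; the feature file alone still helps
        last_commit = ""
    with open(features_path) as fh:
        features = json.load(fh)
    remaining = [f["description"] for f in features if not f["passes"]]
    return {
        "last_commit": last_commit,
        "done": len(features) - len(remaining),
        "remaining": remaining,
    }
```

A crashed or compacted session costs nothing but the in-flight feature; the next agent calls this once and continues the loop.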
## 🆚 Comparison to the BMAD Method
While similar to the BMAD (Breakthrough Method for Agile Development) workflow, this specific Claude workflow is:
- Integrated: Context utilization is more efficient.
- Git-Centric: Relies heavily on Git logs for memory rather than just file summaries.
- Simpler: Does not require calling agents separately; it creates a natural loop.