Fixing Long-Running Claude Code Sessions



Source video: https://www.youtube.com/watch?v=XWp4k9K6oK8. The following is a markdown summary of the transcript, detailing a workflow for effective long-running AI coding agents.

Effective Harnesses for Long-Running Agents (Claude Code Workflow)

This document outlines a solution developed by Anthropic and adapted by the video creator to solve the primary issue with AI coding agents: Context Window Limitations.

🔴 The Problem

When AI agents (like Claude) attempt to “one-shot” large applications or complex features, they encounter two specific failure modes:

  1. Context Compaction: As the task grows, the conversation is compacted to save tokens. The AI loses the specific details of what it previously did, leading to half-implemented features or hallucinations about progress.
  2. False Completion: Without strict testing protocols, agents tend to mark features as “complete” even if they are untested or buggy.

🟢 The Solution: A Two-Agent Workflow

Inspired by real-world engineering teams, this workflow splits the process into an Initializer Agent and a Coding Agent working incrementally.

Phase 1: The Initializer Agent

The first agent session focuses solely on setting up the environment and documentation. It does not write application code yet. It creates the following:

  • **CLAUDE.md**: A project overview file containing architecture, tech stack, and commands.
  • **init.sh**: A script to initialize the dev server.
  • **features_list.json**: A JSON file listing every feature and its testing steps.
    • Note: JSON is used instead of Markdown because models are less likely to corrupt the structure.
    • All features start with "passes": false.
  • **claude-progress.md**: A file to track what agents have done.
  • Tooling Setup: Installs tools like Puppeteer so the agent can drive a browser against localhost and visually test the app (since it cannot otherwise “see” the running UI).
  • Initial Git Commit: Commits the scaffolding to create a baseline.
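The video does not show the exact schema of features_list.json, but a minimal sketch might look like the following (the field names and sample features are assumptions; the key point is that every feature begins with "passes": false):

```python
import json

# Hypothetical initial feature list created by the Initializer Agent.
# Every feature starts untested ("passes": False) so the Coding Agent
# cannot claim completion without actually running the test steps.
features = [
    {
        "id": 1,
        "description": "User can create a new note",
        "test_steps": [
            "Open http://localhost:3000",
            "Click 'New Note', type text, click 'Save'",
            "Verify the note appears in the list",
        ],
        "passes": False,
    },
    {
        "id": 2,
        "description": "User can delete a note",
        "test_steps": [
            "Open an existing note",
            "Click 'Delete'",
            "Verify the note is gone from the list",
        ],
        "passes": False,
    },
]

# JSON (rather than Markdown) keeps the structure machine-checkable,
# which is why the video prefers it for this file.
with open("features_list.json", "w") as f:
    json.dump(features, f, indent=2)
```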

Phase 2: The Coding Agent (Incremental Loop)

The second agent picks up where the first left off, but follows a strict loop to manage context:

  1. Read State: The agent reads the Git logs and the progress files to understand the project state (rather than relying on chat history).
  2. Implement Single Feature: It selects the highest priority feature from features_list.json.
  3. Test: It uses Puppeteer to verify the feature works end-to-end.
  4. Update Documentation:
    • Updates features_list.json (sets "passes": true).
    • Updates claude-progress.md with the completed task.
  5. Git Commit: It commits the changes with a descriptive message.
    • Critical: By committing continuously, the “Save State” is in the Git history, not the AI’s context window.
  6. Repeat: The process repeats for the next feature.
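The loop above can be sketched in a few helper functions. This is an illustrative sketch, not the video's actual implementation; the function names and the flat JSON schema are assumptions:

```python
import json
import subprocess

def next_failing_feature(path="features_list.json"):
    """Step 2: pick the first feature whose 'passes' flag is still false."""
    with open(path) as f:
        features = json.load(f)
    for feature in features:
        if not feature["passes"]:
            return feature
    return None  # every feature passes; the project is done

def mark_feature_passed(feature_id, path="features_list.json"):
    """Step 4: flip 'passes' to true only after the end-to-end test succeeds."""
    with open(path) as f:
        features = json.load(f)
    for feature in features:
        if feature["id"] == feature_id:
            feature["passes"] = True
    with open(path, "w") as f:
        json.dump(features, f, indent=2)

def commit_progress(message):
    """Step 5: persist the save state in Git history, not the context window."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
```

Each iteration then reads state from disk, implements and tests one feature, and commits, so nothing important lives only in the chat history.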

🔑 Key Advantages

  • Resumability: Because the state is stored in Git and files (not the chat), if the session crashes or compacts, a new agent can run git log and cat claude-progress.md to pick up exactly where the previous one stopped.
  • Testing Compliance: The workflow forces every feature to start as “failed,” preventing the agent from claiming completion without actually running the Puppeteer tests.
  • Context Efficiency: By focusing on one feature at a time, context consumption grows slowly (e.g., reaching 84% usage only after many features), whereas methods like BMAD may hit the limit faster.
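The resumability step can be sketched as follows. The file name claude-progress.md comes from the workflow above; the helper function itself is hypothetical:

```python
import subprocess

def recover_state(progress_file="claude-progress.md", log_entries=10):
    """Rebuild project state from Git history and the progress file,
    so a fresh agent session never depends on the previous chat context."""
    log = subprocess.run(
        ["git", "log", "--oneline", f"-{log_entries}"],
        capture_output=True, text=True, check=True,
    ).stdout
    try:
        with open(progress_file) as f:
            progress = f.read()
    except FileNotFoundError:
        progress = "(no progress file yet)"
    return log, progress
```

A new session would run something like this first, equivalent to the `git log` and `cat claude-progress.md` commands mentioned above, before touching any code.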

🆚 Comparison to BMAD Method

While similar to the BMAD (Breakthrough Method for Agile Development) workflow, this specific Claude workflow is:

  • Integrated: Context utilization is more efficient.
  • Git-Centric: Relies heavily on Git logs for memory rather than just file summaries.
  • Simpler: Does not require calling agents separately; it creates a natural loop.