AI can perform worse with CLAUDE.md and AGENTS.md files (channel: Theo)



https://www.youtube.com/watch?v=GcNu6wrLTJc

This is a summary of the video on the effectiveness of AGENTS.md and CLAUDE.md context files.


📜 Study Summary: Are Repository-Level Context Files Helpful?

A recent empirical study (February 2026) titled “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” by researchers at ETH Zurich challenged the common industry practice of using context files to guide AI coding agents.

📊 Key Research Findings

The study tested models like Claude 3.5 Sonnet, GPT-5.2, and Qwen 2.5 across two benchmarks (SWE-bench Lite and AGENTBENCH). The results contradicted popular developer advice:

  • Success Rates: LLM-generated context files (like those created via /init) actually decreased success rates by 0.5–2%.
  • Inference Costs: Providing these files increased token usage and operational costs by over 20%.
  • Human-Written Files: Manually authored files showed only a marginal 4% improvement in success rates, but still resulted in a 19% cost increase.
  • The Redundancy Problem: Agents are already proficient at exploring codebases. Adding a context file often provides redundant information that distracts the model rather than helping it.

🏗️ The LLM Context Hierarchy

To understand why these files often fail, we must look at how instructions are layered when an agent processes a request. The “hierarchy of precedence” is generally as follows:

  1. Provider Instructions: Hardcoded safety and behavioral guardrails set by OpenAI or Anthropic (e.g., “Don’t help make nukes”).
  2. System Prompt: The “Identity” layer (e.g., “You are a world-class coding assistant”).
  3. Developer Prompt (**AGENTS.md** / **CLAUDE.md**): This is where repository-specific rules live.
  4. User Message: Your specific prompt or task.

The Priority Conflict: Instructions higher in the hierarchy often override those below them. However, adding too much “noise” at the Developer Prompt level can lead to the “Pink Elephant Problem”: telling a model not to do something (like “don’t use legacy patterns”) makes the model focus on that pattern more, leading to errors.
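The layering above can be sketched as a message list in the common chat-completions style. This is an illustrative sketch only: the role names and the helper function are assumptions for clarity, and the provider-level layer is shown as a comment because it is injected server-side and is not settable by callers.

```python
# Sketch of the instruction hierarchy as a chat-style message list.
# Messages are ordered from highest to lowest precedence; the
# provider layer (level 1) sits above all of these and is hardcoded
# by the model vendor, so it never appears in the request payload.

def build_messages(system_prompt: str, agents_md: str, user_task: str) -> list[dict]:
    """Assemble the request messages in descending order of precedence."""
    return [
        {"role": "system", "content": system_prompt},  # 2. identity layer
        {"role": "developer", "content": agents_md},   # 3. repo-specific rules
        {"role": "user", "content": user_task},        # 4. the actual task
    ]

messages = build_messages(
    "You are a world-class coding assistant.",
    "Run the type-checker before declaring a task done.",  # hypothetical rule
    "Fix the failing login test.",
)
for m in messages:
    print(f'{m["role"]:>9}: {m["content"]}')
```

Because earlier entries take precedence, a bloated developer-level file competes with the user message below it for the model's attention, which is one mechanism behind the "noise" problem described above.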


🔬 Live Experiment: With vs. Without Context

In a real-world test on a video review project (“Lawn”), the speaker compared two identical tasks using Claude Code:

| Metric | Without CLAUDE.md | With CLAUDE.md (LLM-Generated) |
| --- | --- | --- |
| Execution Time | 1 minute 11 seconds | 1 minute 29 seconds |
| Tokens Used | Lower | ~20% higher |
| Result | Successful and concise | Successful, but slower and more verbose |

Conclusion: The agent performed better when allowed to explore the codebase natively rather than being “steered” by an auto-generated context file.


🛠️ Best Practices for AGENTS.md

If you choose to use these files, follow these “Minimalist” rules:

  • Don’t Auto-Generate: Never use /init to let an LLM write your context file. It will fill it with obvious information the model can find itself (like tech stacks or file structures).
  • Focus on “The Invisible”: Only include information that is not in the code (e.g., “We are intentionally using X instead of Y because of a hardware bug”).
  • The “Band-Aid” Approach: Use the file only to fix consistent mistakes. If the agent keeps forgetting to run type-checks, add that specific instruction.
  • The Three-Step Hack: If an agent fails at Step 2 of a process, tell it to “Perform Step 3.” It will often unblock itself on Step 2 in the process of trying to reach the further goal.
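Applied together, these rules produce a context file that is short and limited to facts the agent cannot discover on its own. A hypothetical minimal AGENTS.md following this advice might look like:

```markdown
# AGENTS.md (hypothetical example)

- Run the project type-check before declaring any task complete.
- We intentionally use library X instead of Y because of a hardware
  bug in the target device; do not "modernize" this dependency.
- Never commit directly to `main`; open a branch per task.
```

Note what is absent: no tech-stack listing, no file-tree description, no style rules the linter already enforces. Each line exists only because it fixes a consistent mistake or records a decision invisible in the code.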

🚀 Sponsor Spotlight: Daytona

The video was supported by Daytona, a platform designed to provide secure, isolated execution environments for AI agents.

  • Secure Sandboxing: Run AI-generated code safely without risking your local infrastructure.
  • Multi-OS Support: Sandboxes available for Linux, Windows, and macOS.
  • High Performance: Create a sandbox from code to execution in sub-90ms.
  • Cost Effective: Approximately $0.016/hour for memory.

💡 Final Takeaway

“Are you really an AI engineer if you haven’t put a ton of time into your AGENTS.md?” Actually, yes. The best AI engineers focus on building better unit tests, type-checks, and clean code architecture rather than massive “rule files” that eventually go out of date and mislead the agent.