AGENTBENCH

A benchmark for evaluating AI coding agents, with a focus on context management techniques.

Key Findings from ETH Zurich Study (2026)

  • A recent empirical study, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (ETH Zurich, February 2026), found that repository-level context files (AGENTS.md/CLAUDE.md) decrease agent performance.
  • This challenges the common industry practice of using these files to guide agents.
  • The baseline for comparison was running agents with no context file at all.
  • The study suggests that context files may introduce noise or misdirection into agent reasoning.

Implications

  • Avoid adding AGENTS.md/CLAUDE.md to repositories that AI agents will work in.
  • Context management strategies in agent development need to be re-evaluated.
  • Minimal context may outperform structured context files for coding agents.
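
One way to re-evaluate this on your own repository is to run the same tasks with and without a context file and compare success rates. The sketch below is a minimal illustration under stated assumptions, not the study's methodology: the RunResult record, both function names, and the sample outcomes are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one agent attempt at one task (hypothetical record type)."""
    task_id: str
    with_context_file: bool  # True if AGENTS.md/CLAUDE.md was present
    passed: bool             # True if the task's checks passed


def success_rate(results, with_context_file):
    """Fraction of passed runs for one condition; None if it has no runs."""
    subset = [r for r in results if r.with_context_file == with_context_file]
    if not subset:
        return None
    return sum(r.passed for r in subset) / len(subset)


def context_file_delta(results):
    """Success rate with the context file minus success rate without it."""
    with_rate = success_rate(results, True)
    without_rate = success_rate(results, False)
    if with_rate is None or without_rate is None:
        raise ValueError("need at least one run in each condition")
    return with_rate - without_rate


# Placeholder outcomes for illustration only; these are NOT data from the study.
example_runs = [
    RunResult("task-1", True, True),
    RunResult("task-1", False, True),
    RunResult("task-2", True, False),
    RunResult("task-2", False, True),
]

delta = context_file_delta(example_runs)
print(f"success-rate delta (with minus without): {delta:+.2f}")
```

A negative delta means agents did worse with the context file present. With only a handful of tasks the number is mostly noise, so a real comparison would need many tasks and repeated runs per condition.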

Reference

2026-04-14. “AI can work worse with CLAUDE.md and AGENTS.md files.” Channel: Theo.