NemoClaw Knowledge Wiki

Jul 11, 2026

ai-agents
benchmarking
context-management
eth-zurich
coding-assistance

🗂️ AI & Agents

AGENTBENCH

A benchmark for evaluating AI coding agents, with a particular focus on context management techniques.

Key Findings from ETH Zurich Study (2026)

A recent empirical study, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (ETH Zurich, February 2026), found that repository-level context files (AGENTS.md/CLAUDE.md) decrease agent performance: agents performed worse with these files than with no context file at all.
This challenges the common industry practice of using these files to guide agents.
The study suggests that context files may introduce noise or misdirection into agent reasoning.

Implications

Avoid adding AGENTS.md/CLAUDE.md files to repositories intended for AI agent interaction.
Context management strategies in agent development need to be reevaluated.
Minimal context may outperform structured context files for coding agents.
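
One way to act on the reevaluation point is to measure the effect in your own repository rather than assume it. The sketch below is a minimal illustration, not the study's methodology: `RunResult`, `success_rate`, and `context_file_delta` are hypothetical names, and it assumes you have already run the same tasks with and without a context file and recorded whether each run was resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one agent attempt on one task (hypothetical record shape)."""
    task_id: str
    with_context_file: bool  # True if AGENTS.md/CLAUDE.md was present for this run
    resolved: bool           # True if the task counted as solved (e.g. tests passed)


def success_rate(results: list[RunResult], with_context_file: bool) -> float:
    """Fraction of resolved runs in one condition; 0.0 if that condition has no runs."""
    subset = [r for r in results if r.with_context_file == with_context_file]
    if not subset:
        return 0.0
    return sum(r.resolved for r in subset) / len(subset)


def context_file_delta(results: list[RunResult]) -> float:
    """Success rate with a context file minus success rate without one.

    A negative value means the agent did worse when the file was present.
    """
    return success_rate(results, True) - success_rate(results, False)
```

Feed it one `RunResult` per task and condition from your own runs; a consistently negative `context_file_delta` across a reasonable number of tasks would match the direction the study reports, while a small sample says little either way.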

Reference

Study summary video: “AI can work worse with CLAUDE.md and AGENTS.md files” (Channel Theo, 2026-04-14)
