SWE-bench Lite

SWE-bench Lite is a lightweight benchmark for evaluating AI agents' software engineering capabilities, focusing on code generation and issue resolution on tasks drawn from GitHub issues and pull requests. It serves as a streamlined subset of the full SWE-bench suite, prioritizing efficient agent evaluation.

Key Insights

  • A 2026 ETH Zurich study (“Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?”) demonstrated that repository-level context files (specifically AGENTS.md and [[concepts/claude-ai|CLAUDE]].md) can decrease agent performance by introducing misleading or redundant context.
  • The common industry practice of adding these files to guide agents is at odds with these empirical findings, since the files may confuse agents with irrelevant repository metadata.
  • This finding directly impacts SWE-bench Lite’s evaluation setup, suggesting context files should be avoided in benchmark environments.
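As a minimal sketch of what avoiding context files in a benchmark environment could look like, the Python snippet below scans a checked-out repository for AGENTS.md and CLAUDE.md and optionally deletes them before an agent run. The function names, the `CONTEXT_FILE_NAMES` set, and the dry-run default are assumptions made for illustration; they are not part of the SWE-bench harness or of the study's method.

```python
from pathlib import Path

# Assumed list of context-file names (hypothetical; extend as needed).
CONTEXT_FILE_NAMES = {"AGENTS.md", "CLAUDE.md"}


def find_context_files(repo_root):
    """Return sorted paths of context files anywhere under repo_root."""
    root = Path(repo_root)
    return sorted(
        p for p in root.rglob("*.md")
        if p.is_file() and p.name in CONTEXT_FILE_NAMES
    )


def strip_context_files(repo_root, dry_run=True):
    """Report context files under repo_root; delete them unless dry_run."""
    found = find_context_files(repo_root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling `strip_context_files(checkout_dir)` with the default `dry_run=True` only reports what would be removed, which makes it easier to confirm the match list before changing a benchmark checkout.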

2026-04-14: AI can work worse with CLAUDE.md and AGENTS.md files (Channel: Theo)

Source Notes