SWE-bench Lite

SWE-bench Lite is a lightweight benchmark for evaluating AI agents’ software engineering capabilities. Each task asks an agent to resolve a real GitHub issue by producing a patch, which is then validated against the repository’s test suite. As a 300-instance subset of the full SWE-bench suite, it serves as a streamlined alternative that prioritizes efficient agent evaluation.
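A rough sketch of what a benchmark instance and its resolution criterion look like. The field names follow the published SWE-bench schema, but the dataclass and helper below are illustrative, not the official harness:

```python
from dataclasses import dataclass

@dataclass
class SWEBenchInstance:
    # Illustrative shape of one benchmark instance (field names per the
    # SWE-bench schema; the class itself is a sketch, not the real loader).
    instance_id: str        # e.g. "astropy__astropy-12907"
    repo: str               # "owner/name" of the source repository
    base_commit: str        # commit the agent's patch is applied against
    problem_statement: str  # the GitHub issue text shown to the agent
    patch: str              # gold patch from the resolving pull request
    test_patch: str         # tests added by the PR, used for validation

def is_resolved(fail_to_pass: list[bool], pass_to_pass: list[bool]) -> bool:
    """An instance counts as resolved only if every FAIL_TO_PASS test now
    passes AND no previously passing (PASS_TO_PASS) test regresses."""
    return all(fail_to_pass) and all(pass_to_pass)
```

The two test categories (FAIL_TO_PASS, PASS_TO_PASS) are how SWE-bench distinguishes fixing the reported issue from breaking existing behavior.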

Key Insights

  • A 2026 ETH Zurich study (“Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?”) demonstrated that repository-level context files (specifically AGENTS.md and CLAUDE.md) can decrease agent performance by introducing misleading or redundant context.
  • The common industry practice of using these files to guide agents is at odds with the study’s empirical findings: the files can confuse agents with irrelevant or outdated repository metadata.
  • This finding directly impacts SWE-bench Lite’s evaluation setup, suggesting context files should be avoided in benchmark environments.
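If context files are to be excluded from a benchmark environment, the cleanup is simple to automate. A minimal sketch (the helper name and file list are assumptions, not part of any official harness):

```python
from pathlib import Path

# Repository-level context files the ETH Zurich study flags as potentially
# harmful to agent performance; extend the tuple as needed.
CONTEXT_FILES = ("AGENTS.md", "CLAUDE.md")

def strip_context_files(repo_root: str) -> list[str]:
    """Delete context files anywhere under repo_root before an evaluation
    run; returns the relative paths that were removed."""
    root = Path(repo_root)
    removed = []
    for name in CONTEXT_FILES:
        # Context files may live in subdirectories, not just the repo root.
        for path in sorted(root.rglob(name)):
            path.unlink()
            removed.append(str(path.relative_to(root)))
    return removed
```

Running this against each task repository before the agent starts would keep the evaluation setup consistent with the study’s recommendation.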

2026-04-14: “AI can work worse with CLAUDE.md and AGENTS.md files” (Theo, YouTube channel)

Source Notes

  • 2026-04-23: https://www.youtube.com/watch?v=GcNu6wrLTJc — summary of the video covering the study “Are Repository-Level Context Files Helpful?” and the effectiveness of AGENTS.md and CLAUDE.md context files.