SWE-bench Lite

SWE-bench Lite is a lightweight benchmark for evaluating AI agents’ software engineering capabilities. Each task asks an agent to resolve a real GitHub issue by producing a patch, which is then validated against the repository’s test suite. As a 300-instance subset of the full SWE-bench suite, it serves as a streamlined alternative that prioritizes efficient agent evaluation.
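A rough sketch of what a benchmark instance and its resolution criterion look like. The field names follow the published SWE-bench schema, but the dataclass and helper below are illustrative, not the official harness:

```python
from dataclasses import dataclass

@dataclass
class SWEBenchInstance:
    # Illustrative shape of one benchmark instance (field names per the
    # SWE-bench schema; the class itself is a sketch, not the real loader).
    instance_id: str        # e.g. "astropy__astropy-12907"
    repo: str               # "owner/name" of the source repository
    base_commit: str        # commit the agent's patch is applied against
    problem_statement: str  # the GitHub issue text shown to the agent
    patch: str              # gold patch from the resolving pull request
    test_patch: str         # tests added by the PR, used for validation

def is_resolved(fail_to_pass: list[bool], pass_to_pass: list[bool]) -> bool:
    """An instance counts as resolved only if every FAIL_TO_PASS test now
    passes AND no previously passing (PASS_TO_PASS) test regresses."""
    return all(fail_to_pass) and all(pass_to_pass)
```

The two test categories (FAIL_TO_PASS, PASS_TO_PASS) are how SWE-bench distinguishes fixing the reported issue from breaking existing behavior.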

Key Insights

  • A 2026 ETH Zurich study (“Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?”) demonstrated that repository-level context files (specifically AGENTS.md and CLAUDE.md) can decrease agent performance by introducing misleading or redundant context.
  • The common industry practice of using these files to guide agents is at odds with the study’s empirical findings: the files can confuse agents with irrelevant or outdated repository metadata.
  • This finding directly impacts SWE-bench Lite’s evaluation setup, suggesting context files should be avoided in benchmark environments.
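If context files are to be excluded from a benchmark environment, the cleanup is simple to automate. A minimal sketch (the helper name and file list are assumptions, not part of any official harness):

```python
from pathlib import Path

# Repository-level context files the ETH Zurich study flags as potentially
# harmful to agent performance; extend the tuple as needed.
CONTEXT_FILES = ("AGENTS.md", "CLAUDE.md")

def strip_context_files(repo_root: str) -> list[str]:
    """Delete context files anywhere under repo_root before an evaluation
    run; returns the relative paths that were removed."""
    root = Path(repo_root)
    removed = []
    for name in CONTEXT_FILES:
        # Context files may live in subdirectories, not just the repo root.
        for path in sorted(root.rglob(name)):
            path.unlink()
            removed.append(str(path.relative_to(root)))
    return removed
```

Running this against each task repository before the agent starts would keep the evaluation setup consistent with the study’s recommendation.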

2026-04-14: “AI can work worse with CLAUDE.md and AGENTS.md files” (Theo, YouTube channel)

Source Notes

  • 2026-04-23: https://www.youtube.com/watch?v=GcNu6wrLTJc — summary of the video covering the study “Are Repository-Level Context Files Helpful?” and the effectiveness of AGENTS.md and CLAUDE.md context files.