
SWE-Bench

SWE-Bench is a benchmark dataset for evaluating large language models on real-world software engineering tasks. Unlike synthetic coding challenges or isolated algorithmic problems, it draws its tasks from actual issues and pull requests in open-source repositories. To do well, a model must understand an existing codebase, grasp project-specific context, and generate a solution that integrates with the established code structure and dependencies.
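To make the task format concrete, here is a minimal sketch of how one benchmark task and its pass/fail check could be modeled. It assumes the commonly described setup in which a task pairs an issue with tests that the fix must make pass, plus tests that must keep passing. The class, the field names, the repository name, and the test identifiers are all hypothetical simplifications, not the official dataset schema or evaluation harness.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Simplified, illustrative model of one task (not the official schema)."""
    repo: str                    # repository the issue was filed against
    issue_text: str              # the problem statement shown to the model
    fail_to_pass: list[str] = field(default_factory=list)  # tests the fix must turn green
    pass_to_pass: list[str] = field(default_factory=list)  # tests that must stay green


def is_resolved(task: TaskInstance, test_results: dict[str, bool]) -> bool:
    """Return True only if every required test passed after applying a patch.

    A test missing from test_results is treated as a failure.
    """
    required = task.fail_to_pass + task.pass_to_pass
    return all(test_results.get(name, False) for name in required)


# Hypothetical example data, invented purely for illustration.
task = TaskInstance(
    repo="example-org/example-lib",
    issue_text="parse_date() raises ValueError on ISO strings with a timezone",
    fail_to_pass=["tests/test_dates.py::test_parse_iso_with_tz"],
    pass_to_pass=["tests/test_dates.py::test_parse_plain_date"],
)

# A patch that fixes the issue without breaking existing behavior.
good_patch_results = {
    "tests/test_dates.py::test_parse_iso_with_tz": True,
    "tests/test_dates.py::test_parse_plain_date": True,
}

# A patch that fixes the issue but introduces a regression.
regressing_patch_results = {
    "tests/test_dates.py::test_parse_iso_with_tz": True,
    "tests/test_dates.py::test_parse_plain_date": False,
}

print(is_resolved(task, good_patch_results))
print(is_resolved(task, regressing_patch_results))
```

The all-or-nothing check reflects why integration with the existing codebase matters: a patch that addresses the issue but breaks previously passing behavior does not count as a solution in this sketch.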

SWE-Bench Verified

SWE-Bench Verified is a curated subset of the original SWE-Bench dataset that has been manually verified. The review improves data quality by confirming that each issue is reproducible, that its solution is correct, and that its test cases accurately judge whether a model's fix works. The result is a more reliable basis for evaluating how well language models handle realistic software engineering scenarios.

Source Notes

2026-04-12: MiniMax M27 Open Source LLM Technical Overview and Deployment Summary
