AutoResearch: Autonomous AI Agent Self-Improvement Through Code Iteration

Clip title: The only AutoResearch tutorial you’ll ever need
Author / channel: David Ondrej
URL: https://www.youtube.com/watch?v=uBWuKh1nZ2Y

Summary

This video provides a clear explanation of AutoResearch, an open-source tool developed by Andrej Karpathy, a renowned AI researcher and co-founder of OpenAI. The central idea behind AutoResearch is to enable AI agents to improve themselves autonomously. The tool runs a continuous loop: an AI agent formulates a hypothesis, modifies its code accordingly, trains or runs the updated code for a fixed time (typically 5 minutes per experiment), and evaluates the results. It then decides whether to “keep” the changes (committing them to version control) or “discard” them (reverting to the previous state). This automated, iterative process aims to accelerate research and optimization significantly.
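The hypothesize → modify → run → evaluate → keep/discard loop described above can be sketched in a few lines. This is an illustrative toy, not AutoResearch’s actual implementation: the objective function, parameter perturbation, and function names are all invented, and where the real tool commits or reverts code via version control, this sketch simply keeps or drops a candidate parameter list.

```python
import random

def run_experiment(params):
    """Stand-in for running the agent-modified train.py for a fixed
    time budget. Toy objective (lower is better): squared distance
    from an unknown optimum at 3.0, plus a little evaluation noise."""
    return sum((p - 3.0) ** 2 for p in params) + random.uniform(0.0, 0.1)

def autoresearch_loop(initial_params, iterations=50, seed=1):
    """One full self-improvement loop: propose a change, run it under
    the same budget as every other trial, then keep it only if the
    objective score improves."""
    random.seed(seed)
    best = list(initial_params)
    best_score = run_experiment(best)
    for _ in range(iterations):
        # hypothesis: a small change (the agent editing its code)
        candidate = [p + random.uniform(-1.0, 1.0) for p in best]
        score = run_experiment(candidate)  # same fixed budget every trial
        if score < best_score:             # keep: commit the change
            best, best_score = candidate, score
        # else discard: revert to the previous state (best unchanged)
    return best, best_score
```

Because rejected candidates are thrown away entirely, the loop can only ratchet toward better scores, which is what makes unattended operation safe.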

A key aspect of AutoResearch’s methodology is the “fixed time budget” for each experiment. By limiting every experiment to, for instance, 5 minutes, all results become directly comparable, regardless of what changes the agent made. This prevents the AI from simply “cheating” by training longer and ensures that only genuinely better ideas or modifications lead to improvement. The system is structured around three critical files: program.md (where a human defines the overall goal, constraints, and rules for the agent), train.py (the only file the AI agent is allowed to modify), and prepare.py (which contains the objective metric and evaluation logic, and cannot be touched by the AI to prevent it from manipulating the success criteria).
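The fixed-budget idea can be made concrete with a small in-process sketch. This is a hypothetical illustration, not code from the tool (which presumably enforces the budget at the process level, e.g. by terminating `train.py` when time is up): every trial runs against the same wall-clock deadline, so no candidate can improve its score just by running longer.

```python
import time

def train_with_budget(step_fn, state, budget_s):
    """Run training steps until a fixed wall-clock budget expires.
    Every experiment gets the same budget, so scores stay directly
    comparable and the agent cannot 'cheat' by training longer."""
    deadline = time.monotonic() + budget_s
    steps = 0
    while time.monotonic() < deadline:
        state = step_fn(state)
        steps += 1
    return state, steps
```

The same deadline applies whether the agent’s change made each step faster or slower, which is exactly why a change that speeds up individual steps shows up as a genuinely better result.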

The video emphasizes that the implications of AutoResearch extend far beyond just training AI models. This recursive self-improvement loop can be applied to nearly any domain where a clear, objective outcome can be measured. Practical use cases demonstrated include optimizing website loading speeds, backtesting and refining trading strategies using objective metrics like the Sharpe ratio, automating marketing A/B tests (e.g., for emails, ad creatives, headlines), and enhancing software development processes by improving code performance or fine-tuning open-source AI models for local deployment. The fundamental principle is: “If you can score it, you can autoresearch it.”
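Two of the domains above reduce to simple scoring functions, which is all prepare.py needs to contain. The function names below are illustrative, not from the tool; the Sharpe ratio here is the textbook form (mean excess return over return volatility), and the load-time score is just a median over repeated measurements.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Objective score for a trading strategy: mean excess return
    divided by the volatility (sample std dev) of returns.
    Higher is better."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

def load_time_score(samples_ms):
    """Objective score for website speed: median page-load time in
    milliseconds across repeated measurements. Lower is better."""
    return statistics.median(samples_ms)
```

Either function could serve as the frozen metric the agent optimizes against, which is the practical meaning of “if you can score it, you can autoresearch it.”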

However, AutoResearch is not a universal solution. It fails in scenarios where “better” is subjective (e.g., brand design, complex UX), or where the evaluation loop is too slow or requires human intervention, negating the “auto” aspect. The core skill in this new paradigm therefore shifts from merely executing tasks to “knowing what to measure”: picking the right objective metric and setting appropriate constraints.

Karpathy envisions a future akin to the SETI@home project, but for AI research: thousands of AI agents distributed across numerous machines, working autonomously to advance various fields simultaneously and effectively emulating an entire research community. The video concludes with a live demonstration of setting up an AutoResearch loop to optimize a website’s loading speed, quickly achieving significant performance improvements.