https://www.youtube.com/watch?v=4GiqzUHD5AA

This video provides a comprehensive overview of Context Engineering for Agents: defining the concept, explaining why it’s crucial for agents, outlining common strategies, and demonstrating how LangGraph supports these approaches.

1. Context Engineering Defined (0:00 - 1:26)
Context engineering is defined as “the art and science of filling the context window with just the right information at each step of an agent’s trajectory.” This includes instructions, external knowledge, and tool feedback.
- Origin: The term gained traction from figures like Tobi Lütke (Shopify) and Andrej Karpathy, who preferred it over “prompt engineering” as it better describes the core skill.
- Analogy: Karpathy drew an analogy comparing LLMs to a CPU, and the context window to RAM or working memory with limited capacity. Context engineering is akin to an operating system curating what data fits into RAM at any given time.
2. Types of Context & Challenges for Agents (1:27 - 3:23) Context engineering is an umbrella discipline encompassing:
- Instructions: Prompts, memories, few-shot examples, tool descriptions.
- Knowledge: Facts, memories.
- Tools: Feedback from tool calls (APIs, calculators, etc.).
It’s particularly challenging for agents because:
- Long-running tasks & accumulating feedback: Agent interactions, especially with tool calls, lead to a rapid accumulation of tokens in the context window.
- Increased token usage: This can lead to various “longer context failures,” as outlined by Drew Breunig:
  - Context Poisoning: Hallucinations making their way into the context.
  - Context Distraction: The overwhelming context confusing the LLM.
  - Context Confusion: Superfluous information influencing the response.
  - Context Clash: Conflicting pieces of information in the context degrading the response.

This makes context engineering a critical skill for building robust AI agents.
3. Common Strategies & Examples (3:24 - 14:11) The video groups context engineering strategies into four main categories:
- Write Context: Saving information outside the context window to help an agent perform a task.
  - Scratchpads: Persist information within a single agent session (e.g., Anthropic’s multi-agent researcher saving its plan to memory). This can be implemented by writing to a file or to a runtime state object.
  - Memories: Persist information across multiple agent sessions. Examples include Generative Agents synthesizing memories from past feedback, and features in ChatGPT, Cursor, and Windsurf that auto-generate memories based on user interactions. The intuition is to integrate new context with existing memories and write updated memories back.
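The scratchpad-plus-memory pattern above can be sketched without any framework. This is a library-free illustration, not a specific product’s API: the state dict, `write_scratchpad`, `write_memory`, and the `agent_memory.json` path are all hypothetical names.

```python
# Library-free sketch of the "write context" pattern: a scratchpad that
# lives in the run's state object (single session), and a memory store
# persisted to disk (across sessions). All names here are illustrative.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical persistence location

def write_scratchpad(state: dict, key: str, value: str) -> dict:
    """Save intermediate work (e.g., a plan) outside the prompt itself."""
    state.setdefault("scratchpad", {})[key] = value
    return state

def write_memory(new_fact: str) -> None:
    """Integrate a new fact with existing memories and write them back."""
    memories = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    if new_fact not in memories:          # merge with what is already stored
        memories.append(new_fact)
    MEMORY_FILE.write_text(json.dumps(memories))

state = write_scratchpad({}, "plan", "1) search docs 2) summarize 3) answer")
print(state["scratchpad"]["plan"])
```

The key design point is that neither the plan nor the memories occupy prompt tokens until a later step chooses to read them back in.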
- Select Context: Pulling relevant information into the context window.
  - Scratchpads: Agents can reference previously written information via tool calls or direct state reads.
  - Memories (Long-Term): Different memory types can be selectively pulled:
    - Semantic Memories: Facts (e.g., facts about a user), often managed via RAG (Retrieval-Augmented Generation) using embedding-based similarity search or knowledge graphs.
    - Episodic Memories: Experiences (e.g., few-shot examples, past agent actions).
    - Procedural Memories: Instructions (e.g., system prompts, rules files like CLAUDE.md for style guidelines or tool usage).
  - Tools: Agents struggle with large tool collections. RAG over tool descriptions (embedding tool descriptions and using semantic similarity search) has been shown to significantly improve tool selection. For large codebases, embedding search needs to be combined with AST parsing for meaningful chunking, file search, and re-ranking.
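RAG over tool descriptions can be sketched with a toy "embedding": a bag-of-words vector and cosine similarity stand in for a learned embedding model. The `TOOLS` registry and `select_tools` helper are illustrative, not any library’s API.

```python
# Toy sketch of RAG over tool descriptions: "embed" each description
# (here, a bag-of-words Counter instead of a real embedding model) and
# select the tools most similar to the user's query.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

TOOLS = {  # hypothetical tool registry: name -> description
    "get_weather": "fetch the current weather forecast for a city",
    "calculator": "evaluate arithmetic math expressions",
    "web_search": "search the web for pages matching a query",
}

def select_tools(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(TOOLS, key=lambda n: cosine(q, embed(TOOLS[n])), reverse=True)
    return ranked[:k]

print(select_tools("what is the weather forecast in Paris"))  # ['get_weather']
```

Only the selected tools’ descriptions would then be placed in the prompt, rather than the full catalog.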
- Compress Context: Retaining only the tokens required to perform a task.
  - Summarization: Condensing long conversations or completed work sections. Examples include Claude Code’s “auto compact” feature (summarizing conversation history) and Anthropic’s multi-agent researcher summarizing completed work sections. Summarization is also applied when passing context between linear sub-agents in a hierarchical setup.
  - Trimming: More selective removal of irrelevant tokens, using heuristics (e.g., keeping only the most recent messages) or learned approaches like “Provence” for robust context pruning.
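A minimal sketch of the summarization idea: replace older turns with a single summary message while keeping the most recent turns verbatim. Here `summarize` is a placeholder for an LLM call; the function names and the `keep_recent` cutoff are assumptions for illustration.

```python
# Sketch of context compression: keep a running summary plus only the
# most recent messages. `summarize` is a stand-in for an LLM call.
def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would ask an LLM to condense these turns.
    return f"[summary of {len(messages)} earlier messages]"

def compress(history: list[str], keep_recent: int = 3) -> list[str]:
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(10)]
print(compress(history))
# ['[summary of 7 earlier messages]', 'turn 7', 'turn 8', 'turn 9']
```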
- Isolate Context: Splitting up context to manage different pieces independently.
  - Multi-Agent: Assigning separate context windows and tools to different agents in a team (e.g., OpenAI’s Swarm, Anthropic’s multi-agent research system). This allows parallel computation and expands the total token-processing capacity of the system.
  - Environment (Sandbox): Executing code and tools in an isolated sandbox where token-heavy objects (like images or large documents) can reside. Only selected return values, standard output, or variable names are passed back to the LLM, preventing context-window bloat (e.g., Hugging Face’s DeepResearch).
  - State: Using a structured state object (e.g., Pydantic models) with different fields. Certain fields (like message history) can always be exposed to the LLM, while other token-heavy information is stored in separate fields and only selectively “fished out” and passed to the LLM when needed.
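The state-based isolation pattern can be sketched with a stdlib dataclass standing in for the Pydantic model the video mentions. `AgentState`, its field names, and `prompt_context` are illustrative choices, not a framework API.

```python
# Sketch of context isolation via a structured state object: only some
# fields are rendered into the LLM prompt, while token-heavy fields stay
# in state until explicitly "fished out" by a node that needs them.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    messages: list[str] = field(default_factory=list)            # always shown to the LLM
    raw_documents: dict[str, str] = field(default_factory=dict)  # isolated, never auto-included

    def prompt_context(self) -> str:
        # Expose only the message history, not the heavy documents.
        return "\n".join(self.messages)

state = AgentState(messages=["user: summarize the report"])
state.raw_documents["report"] = "..." * 10_000  # token-heavy, kept out of the prompt
print(state.prompt_context())                   # only 'user: summarize the report'
```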
4. Context Engineering + LangGraph (14:12 - 20:28) LangGraph, a low-level orchestration framework for building agents, is designed to support all these context engineering techniques.
- Prerequisites: Effective context engineering requires tracing (e.g., LangSmith) to track token usage and evaluation to measure the impact of engineering efforts on agent behavior.
- Write Context in LangGraph:
  - Scratchpad: LangGraph’s central state object allows checkpointing agent state across a session. Any node can access and write to this state, effectively serving as a scratchpad.
  - Memory: LangGraph natively supports long-term memory to persist context across many sessions, allowing agents to learn preferences over time (e.g., as shown in DeepLearning.AI’s course on agentic memory).
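The node-writes-to-checkpointed-state flow can be illustrated without the library. This library-free sketch only mirrors the shape of the idea; LangGraph’s actual API (`StateGraph`, checkpointers) differs, and every name below is made up for illustration.

```python
# Library-free sketch of the checkpointed-scratchpad idea: each node is a
# function that reads/writes a shared state dict, and the state is
# checkpointed (deep-copied here) after every step so it persists across
# the session. This mirrors, but is not, LangGraph's real API.
import copy

def plan_node(state: dict) -> dict:
    state["plan"] = "search, then summarize"   # write to the scratchpad
    return state

def act_node(state: dict) -> dict:
    state["result"] = f"executed: {state['plan']}"  # read it back later
    return state

def run(nodes, state):
    checkpoints = []
    for node in nodes:
        state = node(state)
        checkpoints.append(copy.deepcopy(state))  # checkpoint after each node
    return state, checkpoints

final, checkpoints = run([plan_node, act_node], {})
print(final["result"])  # executed: search, then summarize
```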
- Select Context in LangGraph:
  - Scratchpad: Retrieve from the state object in any node.
  - Memory: Retrieve from long-term memory in any node. LangGraph enables agentic RAG for knowledge retrieval and includes pre-built tools like langgraph-bigtool for effective tool selection across large collections using embedding-based similarity search.
- Compress Context in LangGraph:
  - Summarization & Trimming: LangGraph provides utilities for summarizing and trimming message history. Its low-level nature offers the flexibility to define custom logic within nodes, enabling post-processing steps after tool execution to compress or filter information.
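Trimming by a token budget, as such utilities typically do, can be sketched as follows. The crude word-count tokenizer and the `trim_to_budget` helper are illustrative stand-ins, not LangGraph functions.

```python
# Sketch of message trimming by token budget: keep the newest messages
# that fit, drop the rest. Token counting here is a crude word count,
# not a real tokenizer.
def count_tokens(message: str) -> int:
    return len(message.split())

def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):        # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

msgs = ["a b c", "d e", "f g h i", "j"]
print(trim_to_budget(msgs, budget=5))     # ['f g h i', 'j'] -> 5 tokens kept
```

In a real graph this would run as a post-processing node after each tool call, before the next LLM invocation.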
- Isolate Context in LangGraph:
  - Multi-Agent: LangGraph has implementations of supervisor and swarm multi-agent architectures, facilitating separation of concerns and parallel processing.
  - Environment (Sandbox): LangGraph integrates with sandboxed execution environments like E2B and Pyodide, allowing agents to perform code execution and manage token-heavy outputs without flooding the LLM’s context.
  - State: LangGraph’s state object can be defined with a schema (e.g., a Pydantic model) with multiple fields. This allows partitioning context within the state, exposing only relevant fields to the LLM while others remain isolated until specifically accessed.
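The supervisor pattern’s isolation benefit can be shown in a library-free sketch: each sub-agent keeps its own history, and only a compact result crosses the boundary back to the supervisor. The agent functions and their names are hypothetical.

```python
# Library-free sketch of multi-agent context isolation: a supervisor
# routes work to sub-agents, each of which keeps its OWN message history,
# so no single context window has to hold everything.
def research_agent(task: str, history: list[str]) -> str:
    history.append(f"researching: {task}")   # stays in this agent's context
    return f"notes on {task}"

def writer_agent(task: str, notes: str, history: list[str]) -> str:
    history.append(f"writing with: {notes}")  # stays in this agent's context
    return f"report: {task}"

def supervisor(task: str) -> str:
    research_ctx: list[str] = []   # separate context window per agent
    writer_ctx: list[str] = []
    notes = research_agent(task, research_ctx)
    report = writer_agent(task, notes, writer_ctx)
    # Only the final report returns; each agent's history stays isolated.
    return report

print(supervisor("context engineering"))  # report: context engineering
```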
In summary, context engineering is a dynamic and essential field for building advanced AI agents, and LangGraph provides a flexible framework that natively supports the key strategies of writing, selecting, compressing, and isolating context to overcome token limitations and improve agent performance.