Optimizing LLM Agent Token Usage with MCP and Code Execution
Generated: 2026-04-29 · API: Gemini 2.5 Flash · Modes: Summary
Clip title: Save 98% on AI Agent Tokens With This One Trick Author / channel: Prompt Engineering URL: https://www.youtube.com/watch?v=rU6IYiQ1SdQ
Summary
The video addresses excessive token usage in Large Language Model (LLM) agents, particularly those interacting with tools via the Model Context Protocol (MCP). The core problem is that tool definitions can fill a substantial portion of an agent’s context window before any meaningful interaction begins, leading to higher costs, slower performance, and reduced accuracy. To combat this, the video presents 10 techniques, ranging from simple configuration adjustments to advanced architectural patterns, that achieve token reductions of up to 98%.
Among the most impactful advanced techniques discussed is Code Execution with MCP, championed by Anthropic and Cloudflare (as “Code Mode”). This approach treats the MCP server as a file system, allowing the agent to read and load only the specific tools it needs for a given task within a sandbox environment. This progressive disclosure mechanism can dramatically reduce token usage—for instance, an example showed a 98% reduction from 150,000 to 2,000 tokens for a document transfer task. It also offers secondary benefits like filtering large datasets in code, executing loops and conditionals without round trips to the model, and enhancing data privacy by keeping sensitive intermediate results out of the context window. Closely related is Programmatic Tool Calling, where the LLM writes code to call tools as Python functions, further ensuring only final outputs enter the context, thereby unlocking significant performance gains in complex agentic search benchmarks.
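The code-execution pattern described above can be sketched in a few lines. This is a self-contained illustration, not a real MCP client: the tool catalog, tool names, and `load_tool` helper are all stand-ins for whatever the sandbox actually exposes. The point is that tool wrappers are loaded on demand, and filtering happens in code so only the final result re-enters the model's context.

```python
# Stand-in for an MCP server exposed as a loadable tool catalog.
# In the real pattern the agent reads tool files from a sandboxed
# filesystem; here a dict simulates that progressive disclosure.
TOOL_CATALOG = {
    "gdrive.get_document": lambda doc_id: "Q1 notes\nQ4 revenue up\nQ4 costs flat",
    "salesforce.update_record": lambda rec_id, value: f"updated {rec_id}",
    # ...hundreds more tools whose definitions never enter the prompt
}

def load_tool(name):
    """Fetch one tool wrapper on demand instead of preloading all definitions."""
    return TOOL_CATALOG[name]

def agent_script():
    # The agent's generated code loads only the two tools this task needs.
    get_document = load_tool("gdrive.get_document")
    update_record = load_tool("salesforce.update_record")

    doc = get_document("abc123")
    # Loops and filtering run in the sandbox, with no model round trips;
    # the full document never consumes context-window tokens.
    q4_lines = [line for line in doc.splitlines() if line.startswith("Q4")]
    update_record("xyz789", "\n".join(q4_lines))
    # Only this short summary string is returned to the model.
    return f"Copied {len(q4_lines)} Q4 lines"

print(agent_script())  # prints "Copied 2 Q4 lines"
```

Because intermediate data stays inside the sandbox, the pattern also delivers the privacy benefit mentioned above: the model sees the two-line summary, never the source document.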
Other effective strategies focus on optimizing tool discovery and managing tool sets. The Tool Search Tool allows agents to dynamically search a catalog of thousands of tools, loading only relevant definitions on demand. This can reduce tool definitions in the prompt from tens of thousands to just a handful, yielding over 85% token savings and improving tool selection accuracy. For simpler configurations, Tool Groups enable developers to categorize tools by function (e.g., e-commerce, finance) and load only the necessary group for a session, directly controlling token cost. An even more granular approach, Surgical Selection, lets developers specify exact tool names to load, ideal for highly specialized production agents. Furthermore, Dynamic Context Loading introduces a tiered disclosure system, providing the agent with progressively more detailed tool information (server descriptions, tool summaries, full schema) only when it commits to needing it, thus keeping the context window lean and relevant.
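Tool Groups and Surgical Selection amount to filtering which tool names a session loads. The sketch below uses an invented registry and `select_tools` helper to show the idea; real MCP servers expose equivalent controls through their own configuration rather than this exact API.

```python
# Hypothetical registry mapping tool groups to tool names.
TOOL_REGISTRY = {
    "ecommerce": ["search_products", "get_product", "checkout"],
    "finance": ["get_quote", "get_statement"],
    "web": ["scrape_page", "search_engine"],
}

def select_tools(groups=(), names=()):
    """Return only the tool names a session should load.

    - Tool Groups: pass group keys to load a whole category.
    - Surgical Selection: pass exact names for a specialized agent.
    """
    selected = [t for g in groups for t in TOOL_REGISTRY.get(g, [])]
    selected += [n for n in names
                 if any(n in tools for tools in TOOL_REGISTRY.values())]
    return sorted(set(selected))

# Load the web group plus a single finance tool, instead of all 8 tools.
session_tools = select_tools(groups=["web"], names=["get_quote"])
print(session_tools)  # prints ['get_quote', 'scrape_page', 'search_engine']
```

Dynamic Context Loading layers on top of this: the session would first receive only group descriptions, then tool summaries, and only fetch a tool's full schema once the agent commits to calling it.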
Finally, the video explores output optimization and architectural design patterns. Techniques like Output Stripping (e.g., removing Markdown, ads, or related searches from web results) ensure that only plain, essential text is passed back to the LLM, preventing the model from processing unnecessary formatting. TOON (Token-Oriented Object Notation) is presented as a specialized format to minimize token count in structured data by declaring field names once and streaming data values, achieving 40-60% reductions over standard JSON for flat tabular data. For very large-scale, multi-team environments, a Layered MCP Design separates the orchestrator LLM from underlying tools via sub-agents for discovery, planning, and execution, allowing the top-level agent’s context to remain pristine. The overarching takeaway is to stack these techniques, combining multiple approaches to compound savings and unlock the full potential of efficient, accurate, and cost-effective LLM agents. Many of the discussed tools, including Bright Data’s Web MCP server, are open-source and MIT-licensed, encouraging broader adoption and experimentation.
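The TOON savings come from declaring field names once instead of repeating every key per record, as JSON does. The encoder below is a simplified sketch of that idea (the real TOON syntax may differ in details such as quoting and nesting); it handles only flat tabular data, which is exactly where the video says the format shines.

```python
import json

rows = [
    {"id": 1, "name": "Alice", "plan": "pro"},
    {"id": 2, "name": "Bob", "plan": "free"},
    {"id": 3, "name": "Cara", "plan": "pro"},
]

def to_toon(name, records):
    """Encode flat records TOON-style: header declares the array length
    and field names once, then each row streams only the values."""
    fields = list(records[0])
    header = f"{name}[{len(records)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([header] + ["  " + line for line in lines])

toon = to_toon("users", rows)
print(toon)
# users[3]{id,name,plan}:
#   1,Alice,pro
#   2,Bob,free
#   3,Cara,pro

# Compare sizes: the JSON version repeats "id", "name", "plan" per record.
print(f"chars: JSON={len(json.dumps(rows))} TOON={len(toon)}")
```

The gap widens with row count, since each extra JSON record repeats every key while each extra TOON row adds only values and commas.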
Video Description & Links
Description
Thanks to BrightData for sponsoring this video. Check out their new MCP server here: https://github.com/brightdata/brightdata-mcp
MCP servers can burn through half your context window on tool definitions alone — sometimes 150K tokens before your agent sends a single message. This video walks through 10 techniques (including Anthropic’s new Code Execution and Tool Search) that cut MCP token usage by up to 98%.
https://docs.brightdata.com/ai/mcp-server/tools
https://www.anthropic.com/news/model-context-protocol
https://www.anthropic.com/engineering/code-execution-with-mcp
https://modelcontextprotocol.io/specification/2025-11-25
https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
My voice to text App: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0
Let’s Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Tags
MCP, Model Context Protocol, MCP server, MCP optimization, MCP tokens, Claude MCP, Anthropic MCP, code execution MCP, Anthropic code execution, code mode, MCP tool search, tool calling, TOON, Token Oriented Object Notation, Claude Agent SDK, Anthropic Agent SDK, Claude Skills, Agent Skills, Skill.md, Bright Data MCP, Bright Data, token optimization, context window, Anthropic, Claude, Claude Code, AI agents, prompt engineering, MCP tutorial, scoped tools
URLs
- https://github.com/brightdata/brightdata-mcp
- https://docs.brightdata.com/ai/mcp-server/tools
- https://www.anthropic.com/news/model-context-protocol
- https://www.anthropic.com/engineering/code-execution-with-mcp
- https://modelcontextprotocol.io/specification/2025-11-25
- https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
- https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
- https://engineerprompt.ai/
- https://prompt-s-site.thinkific.com/courses/rag
- https://tally.so/r/3y9bb0
- https://discord.com/invite/t4eYQRUcXB
- https://ko-fi.com/promptengineering
- https://www.patreon.com/PromptEngineering
- https://calendly.com/engineerprompt/consulting-call
- http://tinyurl.com/y5h28s6h
- https://bit.ly/localGPT
Related Concepts
- LLM Agents — Wikipedia
- Token Usage Optimization — Wikipedia
- Model Context Protocol (MCP) — Wikipedia
- Tool Definitions — Wikipedia
- Context Window Management — Wikipedia
- Code Execution — Wikipedia
- Progressive Disclosure — Wikipedia
- Programmatic Tool Calling — Wikipedia
- Tool Discovery — Wikipedia
- Tool Groups — Wikipedia
- Surgical Selection — Wikipedia
- Dynamic Context Loading — Wikipedia
- Output Stripping — Wikipedia
- Token-Oriented Object Notation (TOON) — Wikipedia
- Layered MCP Design — Wikipedia
- Agentic Search — Wikipedia
- Sandbox Environments — Wikipedia
- Structured Data Optimization — Wikipedia
Related Entities
- Prompt Engineering — Wikipedia
- Anthropic — Wikipedia
- Cloudflare — Wikipedia
- Gemini 2.5 Flash — Wikipedia