Optimizing LLM Agent Token Usage with MCP and Code Execution
Generated: 2026-04-29 · API: Gemini 2.5 Flash · Modes: Summary
Clip title: Save 98% on AI Agent Tokens With This One Trick Author / channel: Prompt Engineering URL: https://www.youtube.com/watch?v=rU6IYiQ1SdQ
Summary
The video addresses excessive token usage in Large Language Model (LLM) agents, particularly those interacting with tools via the Model Context Protocol (MCP). The core problem is that tool definitions can fill a substantial portion of an agent’s context window before any meaningful interaction begins, leading to higher costs, slower performance, and reduced accuracy. To combat this, the video presents 10 techniques, ranging from simple configuration adjustments to advanced architectural patterns, that achieve token reductions of up to 98%.
Among the most impactful advanced techniques discussed is Code Execution with MCP, championed by Anthropic and Cloudflare (as “Code Mode”). This approach treats the MCP server as a file system, allowing the agent to read and load only the specific tools it needs for a given task within a sandbox environment. This progressive disclosure mechanism can dramatically reduce token usage—for instance, an example showed a 98% reduction from 150,000 to 2,000 tokens for a document transfer task. It also offers secondary benefits like filtering large datasets in code, executing loops and conditionals without round trips to the model, and enhancing data privacy by keeping sensitive intermediate results out of the context window. Closely related is Programmatic Tool Calling, where the LLM writes code to call tools as Python functions, further ensuring only final outputs enter the context, thereby unlocking significant performance gains in complex agentic search benchmarks.
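The code-execution pattern described above can be sketched in a few lines. This is a self-contained illustration, not a real MCP client: the tool catalog, tool names, and `load_tool` helper are all stand-ins for whatever the sandbox actually exposes. The point is that tool wrappers are loaded on demand, and filtering happens in code so only the final result re-enters the model's context.

```python
# Stand-in for an MCP server exposed as a loadable tool catalog.
# In the real pattern the agent reads tool files from a sandboxed
# filesystem; here a dict simulates that progressive disclosure.
TOOL_CATALOG = {
    "gdrive.get_document": lambda doc_id: "Q1 notes\nQ4 revenue up\nQ4 costs flat",
    "salesforce.update_record": lambda rec_id, value: f"updated {rec_id}",
    # ...hundreds more tools whose definitions never enter the prompt
}

def load_tool(name):
    """Fetch one tool wrapper on demand instead of preloading all definitions."""
    return TOOL_CATALOG[name]

def agent_script():
    # The agent's generated code loads only the two tools this task needs.
    get_document = load_tool("gdrive.get_document")
    update_record = load_tool("salesforce.update_record")

    doc = get_document("abc123")
    # Loops and filtering run in the sandbox, with no model round trips;
    # the full document never consumes context-window tokens.
    q4_lines = [line for line in doc.splitlines() if line.startswith("Q4")]
    update_record("xyz789", "\n".join(q4_lines))
    # Only this short summary string is returned to the model.
    return f"Copied {len(q4_lines)} Q4 lines"

print(agent_script())  # prints "Copied 2 Q4 lines"
```

Because intermediate data stays inside the sandbox, the pattern also delivers the privacy benefit mentioned above: the model sees the two-line summary, never the source document.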
Other effective strategies focus on optimizing tool discovery and managing tool sets. The Tool Search Tool allows agents to dynamically search a catalog of thousands of tools, loading only relevant definitions on demand. This can reduce tool definitions in the prompt from tens of thousands to just a handful, yielding over 85% token savings and improving tool selection accuracy. For simpler configurations, Tool Groups enable developers to categorize tools by function (e.g., e-commerce, finance) and load only the necessary group for a session, directly controlling token cost. An even more granular approach, Surgical Selection, lets developers specify exact tool names to load, ideal for highly specialized production agents. Furthermore, Dynamic Context Loading introduces a tiered disclosure system, providing the agent with progressively more detailed tool information (server descriptions, tool summaries, full schema) only when it commits to needing it, thus keeping the context window lean and relevant.
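Tool Groups and Surgical Selection amount to filtering which tool names a session loads. The sketch below uses an invented registry and `select_tools` helper to show the idea; real MCP servers expose equivalent controls through their own configuration rather than this exact API.

```python
# Hypothetical registry mapping tool groups to tool names.
TOOL_REGISTRY = {
    "ecommerce": ["search_products", "get_product", "checkout"],
    "finance": ["get_quote", "get_statement"],
    "web": ["scrape_page", "search_engine"],
}

def select_tools(groups=(), names=()):
    """Return only the tool names a session should load.

    - Tool Groups: pass group keys to load a whole category.
    - Surgical Selection: pass exact names for a specialized agent.
    """
    selected = [t for g in groups for t in TOOL_REGISTRY.get(g, [])]
    selected += [n for n in names
                 if any(n in tools for tools in TOOL_REGISTRY.values())]
    return sorted(set(selected))

# Load the web group plus a single finance tool, instead of all 8 tools.
session_tools = select_tools(groups=["web"], names=["get_quote"])
print(session_tools)  # prints ['get_quote', 'scrape_page', 'search_engine']
```

Dynamic Context Loading layers on top of this: the session would first receive only group descriptions, then tool summaries, and only fetch a tool's full schema once the agent commits to calling it.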
Finally, the video explores output optimization and architectural design patterns. Techniques like Output Stripping (e.g., removing Markdown, ads, or related searches from web results) ensure that only plain, essential text is passed back to the LLM, preventing the model from processing unnecessary formatting. TOON (Token-Oriented Object Notation) is presented as a specialized format to minimize token count in structured data by declaring field names once and streaming data values, achieving 40-60% reductions over standard JSON for flat tabular data. For very large-scale, multi-team environments, a Layered MCP Design separates the orchestrator LLM from underlying tools via sub-agents for discovery, planning, and execution, allowing the top-level agent’s context to remain pristine. The overarching takeaway is to stack these techniques, combining multiple approaches to compound savings and unlock the full potential of efficient, accurate, and cost-effective LLM agents. Many of the discussed tools, including Bright Data’s Web MCP server, are open-source and MIT-licensed, encouraging broader adoption and experimentation.
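The TOON savings come from declaring field names once instead of repeating every key per record, as JSON does. The encoder below is a simplified sketch of that idea (the real TOON syntax may differ in details such as quoting and nesting); it handles only flat tabular data, which is exactly where the video says the format shines.

```python
import json

rows = [
    {"id": 1, "name": "Alice", "plan": "pro"},
    {"id": 2, "name": "Bob", "plan": "free"},
    {"id": 3, "name": "Cara", "plan": "pro"},
]

def to_toon(name, records):
    """Encode flat records TOON-style: header declares the array length
    and field names once, then each row streams only the values."""
    fields = list(records[0])
    header = f"{name}[{len(records)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([header] + ["  " + line for line in lines])

toon = to_toon("users", rows)
print(toon)
# users[3]{id,name,plan}:
#   1,Alice,pro
#   2,Bob,free
#   3,Cara,pro

# Compare sizes: the JSON version repeats "id", "name", "plan" per record.
print(f"chars: JSON={len(json.dumps(rows))} TOON={len(toon)}")
```

The gap widens with row count, since each extra JSON record repeats every key while each extra TOON row adds only values and commas.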
Video Description & Links
Description
Thanks to BrightData for sponsoring this video. Check out their new MCP server here: https://github.com/brightdata/brightdata-mcp
MCP servers can burn through half your context window on tool definitions alone — sometimes 150K tokens before your agent sends a single message. This video walks through 10 techniques (including Anthropic’s new Code Execution and Tool Search) that cut MCP token usage by up to 98%.
https://docs.brightdata.com/ai/mcp-server/tools
https://www.anthropic.com/news/model-context-protocol
https://www.anthropic.com/engineering/code-execution-with-mcp
https://modelcontextprotocol.io/specification/2025-11-25
https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
My voice to text App: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0
Let’s Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Tags
MCP, Model Context Protocol, MCP server, MCP optimization, MCP tokens, Claude MCP, Anthropic MCP, code execution MCP, Anthropic code execution, code mode, MCP tool search, tool calling, TOON, Token Oriented Object Notation, Claude Agent SDK, Anthropic Agent SDK, Claude Skills, Agent Skills, Skill.md, Bright Data MCP, Bright Data, token optimization, context window, Anthropic, Claude, Claude Code, AI agents, prompt engineering, MCP tutorial, scoped tools
URLs
- https://github.com/brightdata/brightdata-mcp
- https://docs.brightdata.com/ai/mcp-server/tools
- https://www.anthropic.com/news/model-context-protocol
- https://www.anthropic.com/engineering/code-execution-with-mcp
- https://modelcontextprotocol.io/specification/2025-11-25
- https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
- https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
- https://engineerprompt.ai/
- https://prompt-s-site.thinkific.com/courses/rag
- https://tally.so/r/3y9bb0
- https://discord.com/invite/t4eYQRUcXB
- https://ko-fi.com/promptengineering
- https://www.patreon.com/PromptEngineering
- https://calendly.com/engineerprompt/consulting-call
- http://tinyurl.com/y5h28s6h
- https://bit.ly/localGPT
Related Concepts
- LLM Agents — Wikipedia
- Token Usage Optimization — Wikipedia
- Model Context Protocol (MCP) — Wikipedia
- Tool Definitions — Wikipedia
- Context Window Management — Wikipedia
- Code Execution — Wikipedia
- Progressive Disclosure — Wikipedia
- Programmatic Tool Calling — Wikipedia
- Tool Discovery — Wikipedia
- Tool Groups — Wikipedia
- Surgical Selection — Wikipedia
- Dynamic Context Loading — Wikipedia
- Output Stripping — Wikipedia
- Token-Oriented Object Notation (TOON) — Wikipedia
- Layered MCP Design — Wikipedia
- Agentic Search — Wikipedia
- Sandbox Environments — Wikipedia
- Structured Data Optimization — Wikipedia
Related Entities
- Prompt Engineering — Wikipedia
- Anthropic — Wikipedia
- Cloudflare — Wikipedia
- Gemini 2.5 Flash — Wikipedia