Token pricing
The pricing mechanism that large language model (LLM) providers use to charge for inference, typically calculated from the number of tokens processed.
Core Components
- Input Tokens: Tokens in the prompt and any provided context sent to the model, billed at the input rate.
- Output Tokens: Tokens in the text generated by the model, usually billed at a higher rate than input tokens.
- Model Tiering: Prices vary with model capability, ranging from high-reasoning models (e.g., gemini-3-pro) to efficiency-optimized models.
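The way these components combine into a bill can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's billing logic: the names `TokenRates` and `request_cost` are hypothetical, and the rates in the example are placeholders rather than real prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class TokenRates:
    """Per-million-token prices in USD (hypothetical container, values set by the caller)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Return the USD cost of one request as input cost plus output cost."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * rates.input_per_million / TOKENS_PER_MILLION
    output_cost = output_tokens * rates.output_per_million / TOKENS_PER_MILLION
    return input_cost + output_cost


# Placeholder rates for illustration only, not any provider's real prices.
example_rates = TokenRates(input_per_million=1.00, output_per_million=4.00)
example_cost = request_cost(input_tokens=20_000, output_tokens=2_000, rates=example_rates)
print(f"${example_cost:.3f}")
```

Model tiering fits the same sketch: each tier would simply carry its own `TokenRates` value, so comparing tiers means calling `request_cost` with the same token counts and different rates.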
Current Market Benchmarks
- Gemini 3 Flash: A model optimized for speed, efficiency, and low-cost deployment.
  - Price: $0.50 per million input tokens.
  - Efficiency: High performance-to-cost ratio, scoring 78% on SWE-bench Verified.
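To make the quoted input price concrete, here is a small sketch that converts between token counts and dollars. It uses only the $0.50 per million input tokens figure from this note; output-token cost is left out because no output price is recorded here, and the function names are hypothetical.

```python
# Input price quoted in this note for Gemini 3 Flash (USD per million input tokens).
INPUT_PRICE_PER_MILLION_USD = 0.50


def input_cost_usd(input_tokens: int) -> float:
    """USD cost of the input side only; output tokens are not included."""
    return input_tokens * INPUT_PRICE_PER_MILLION_USD / 1_000_000


def input_tokens_for_budget(budget_usd: float) -> int:
    """Whole number of input tokens a budget covers at the quoted input price."""
    return int(budget_usd / INPUT_PRICE_PER_MILLION_USD * 1_000_000)


print(input_cost_usd(200_000))        # a 200,000-token prompt
print(input_tokens_for_budget(10.0))  # input tokens covered by $10
```

Under these assumptions a 200,000-token prompt costs $0.10 on the input side, and $10 covers 20 million input tokens. A real bill would be higher once output tokens are added.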
Related Concepts
- Inference Costs
- LLM Economics
- Context Window
- Tokenization
Backlinks
- 2026 04 14 Mathew Berman Gemini Flash 3 and Nvidia Nematron 3
Source Notes
- 2026-04-14: “But OpenClaw is expensive…”
- 2026-04-07: NemoClaw vs. OpenClaw: NVIDIA
- 2026-04-10: Qwen 36 Plus Open Source AIs Agentic Capabilities and Frontier
- 2026-04-18: Anthropic Claude Opus 47 Agentic Coding Multimodal and Memory Advancem
- 2026-04-24: DeepSeek