Token pricing
The pricing mechanism used by large language model (LLM) providers to charge for inference, typically calculated from the volume of text processed, measured in tokens.
Core Components
- Input Tokens: The cost applied to the prompt and any provided context sent to the model.
- Output Tokens: The cost applied to the text generated by the model (usually priced higher than input tokens).
- Model Tiering: Cost variance based on model complexity, ranging from high-reasoning models (e.g., gemini-3-pro) to efficiency-optimized models.
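The components above combine into a simple linear cost formula. A minimal sketch, assuming per-million-token rates (the function name and example rates are hypothetical; only the $0.50/M input figure comes from this note):

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float) -> float:
    """Dollar cost of one request under per-million-token pricing.

    Rates are expressed in dollars per million tokens, with output
    tokens usually priced higher than input tokens.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 100,000 input tokens at $0.50/M plus 20,000 output tokens
# at a hypothetical $1.50/M output rate.
cost = inference_cost(100_000, 20_000, 0.50, 1.50)
print(f"${cost:.4f}")  # → $0.0800
```

Model tiering changes only the rates passed in, so comparing tiers is a matter of re-running the same arithmetic with each tier's published prices.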
Current Market Benchmarks
- Gemini 3 Flash: A model optimized for speed, efficiency, and low-cost deployment.
  - Price: $0.50 per million input tokens.
  - Efficiency: High performance-to-cost ratio, scoring 78% on SWE-bench Verified.
Related Concepts
- Inference Costs
- LLM Economics
- context-window
- Tokenization
Backlinks
- 2026 04 14 Mathew Berman Gemini Flash 3 and Nvidia Nematron 3
Source Notes
- 2026-04-14: [[lab-notes/2026-04-14-Optimizing-AI-Costs-and-Privacy-with-Local-Open-Source-Models-and-Hybr|“But OpenClaw is expensive…“]]