Token pricing

The pricing mechanism used by large language model (LLM) providers to charge for inference, typically calculated from the volume of text processed, measured in tokens.

Core Components

  • Input Tokens: The cost applied to the prompt and any provided context sent to the model.
  • Output Tokens: The cost applied to the text generated by the model (usually priced higher than input tokens).
  • Model Tiering: Cost variance based on model complexity, ranging from high-reasoning models (e.g., gemini-3-pro) to efficiency-optimized models.
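The components above combine into a simple linear cost formula: total cost = input tokens × input rate + output tokens × output rate, with rates varying by model tier. A minimal sketch, using hypothetical per-million-token rates (the tier names and prices here are illustrative, not any provider's actual pricing):

```python
# Hypothetical USD rates per million tokens; real provider rates vary.
PRICING = {
    "high-reasoning": {"input": 2.50, "output": 10.00},
    "efficiency": {"input": 0.10, "output": 0.40},
}

def inference_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens times the tier's rate, quoted per 1M tokens."""
    rates = PRICING[tier]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token reply on each tier:
print(inference_cost("high-reasoning", 2000, 500))  # 0.01
print(inference_cost("efficiency", 2000, 500))      # 0.0004
```

Note the asymmetry: because output rates are usually several times higher than input rates, a short reply to a long prompt can still dominate the bill.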

Current Market Benchmarks

  • 2026-04-14: Matthew Berman on Gemini Flash 3 and Nvidia Nemotron 3

Source Notes

  • 2026-04-14: [[lab-notes/2026-04-14-Optimizing-AI-Costs-and-Privacy-with-Local-Open-Source-Models-and-Hybr|“But OpenClaw is expensive…“]]