Chat Template Bug

A chat template bug occurs when the formatting logic that turns multi-turn conversation history (system, user, and assistant messages) into a single input string for an LLM fails to preserve semantic boundaries or token integrity. This often results in hallucinated responses, instruction-following failures, or context window corruption.

Common Manifestations

  • Token Leakage: Special tokens (e.g., <bos>, <eos>, <start_of_turn>) are omitted, duplicated, or misplaced, causing the model to misinterpret role transitions.
  • Role Confusion: The model fails to distinguish between system instructions and user queries due to missing delimiters.
  • Truncation Errors: Truncating the rendered prompt to fit the context window, without regard to template boundaries, cuts off critical system prompts in long contexts.
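The first two manifestations can often be spotted by inspecting the rendered prompt string directly. The following is a minimal sketch, not a definitive checker: the function name find_template_issues is hypothetical, the <end_of_turn> delimiter and the set of role names are assumptions made for illustration, and real models define their own special tokens.

```python
import re

# Illustrative delimiters only; each model family defines its own.
BOS = "<bos>"
START = "<start_of_turn>"
END = "<end_of_turn>"  # assumption: this closing token is not named in the article
KNOWN_ROLES = {"system", "user", "assistant"}  # assumption: role names vary by model


def find_template_issues(prompt: str) -> list[str]:
    """Return descriptions of suspicious patterns in a rendered prompt (hypothetical helper)."""
    issues = []

    # Token leakage: <bos> omitted, duplicated, or misplaced.
    bos_count = prompt.count(BOS)
    if bos_count == 0:
        issues.append("missing <bos>")
    elif bos_count > 1:
        issues.append(f"duplicated <bos> ({bos_count} occurrences)")
    elif not prompt.startswith(BOS):
        issues.append("<bos> is not at the start of the prompt")

    # Role confusion: no delimiters at all, or openers and closers out of balance.
    # The final turn may be left open so the model can generate into it.
    starts = prompt.count(START)
    ends = prompt.count(END)
    if starts == 0:
        issues.append("no turn delimiters found")
    elif starts - ends not in (0, 1):
        issues.append(f"unbalanced turn delimiters ({starts} starts, {ends} ends)")

    # Each turn opener should be followed immediately by a known role name.
    for role in re.findall(re.escape(START) + r"(\w*)", prompt):
        if role not in KNOWN_ROLES:
            issues.append(f"unknown or missing role after turn opener: {role!r}")

    return issues


good = "<bos><start_of_turn>user\nHi<end_of_turn>\n<start_of_turn>assistant\n"
doubled = "<bos>" + good                 # e.g. template and tokenizer both add <bos>
no_delims = "<bos>You are helpful. Hi"   # naive string concatenation

for name, prompt in [("good", good), ("doubled", doubled), ("no_delims", no_delims)]:
    print(name, find_template_issues(prompt))
```

A check like this only catches structural problems in the string; it cannot tell whether the delimiters match what a given model was trained on.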

Known Instances

Gemma 4 Agent Mode Failure

Mitigation Strategies

  1. Explicit Template Verification: Always validate the raw string output of the chat template before sending it to the model API.
  2. Unit Testing Templates: Create regression tests that check for specific delimiter patterns (<start_of_turn>, etc.) across various turn counts.
  3. Use Canonical Libraries: Rely on officially maintained tokenizers (e.g., Hugging Face transformers, official SDKs) rather than custom string concatenation.
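The first two strategies can be combined into a small regression test. The sketch below is one possible shape, under stated assumptions: render_chat is a hypothetical stand-in renderer written only for this example, the <end_of_turn> delimiter and the "assistant" generation prompt are illustrative, and the turn counts are arbitrary.

```python
import unittest

# Illustrative delimiters only; each model family defines its own.
BOS = "<bos>"
START = "<start_of_turn>"
END = "<end_of_turn>"  # assumption: this closing token is not named in the article


def render_chat(messages, add_generation_prompt=True):
    """Hypothetical renderer: joins role-tagged messages into one prompt string."""
    parts = [BOS]
    for msg in messages:
        parts.append(f"{START}{msg['role']}\n{msg['content']}{END}\n")
    if add_generation_prompt:
        # Leave the final turn open so the model generates the reply.
        parts.append(f"{START}assistant\n")
    return "".join(parts)


def make_conversation(n_turns):
    """Build a system prompt, n_turns user/assistant exchanges, and a final user message."""
    messages = [{"role": "system", "content": "Be concise."}]
    for i in range(n_turns):
        messages.append({"role": "user", "content": f"question {i}"})
        messages.append({"role": "assistant", "content": f"answer {i}"})
    messages.append({"role": "user", "content": "final question"})
    return messages


class TemplateRegressionTest(unittest.TestCase):
    def test_delimiters_across_turn_counts(self):
        for n_turns in (0, 1, 5, 50):
            with self.subTest(n_turns=n_turns):
                messages = make_conversation(n_turns)
                prompt = render_chat(messages)
                # Exactly one <bos>, and it comes first.
                self.assertTrue(prompt.startswith(BOS))
                self.assertEqual(prompt.count(BOS), 1)
                # One opener per message plus the open generation turn.
                self.assertEqual(prompt.count(START), len(messages) + 1)
                self.assertEqual(prompt.count(END), len(messages))
                self.assertTrue(prompt.endswith(f"{START}assistant\n"))

    def test_system_prompt_survives_long_history(self):
        prompt = render_chat(make_conversation(50))
        self.assertIn("Be concise.", prompt)
```

Saved as a test module, this can be run with python -m unittest. When a canonical library is used instead of a hand-written renderer, the same assertions can be pointed at its output; with Hugging Face transformers, for example, the rendered string is available from tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True), and the expected delimiters should then be taken from that model's documentation.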