Chat Template Bug
A chat template bug occurs when the formatting logic that serializes multi-turn conversation history (system, user, and assistant messages) into a single input string for an LLM fails to preserve semantic boundaries or token integrity. This often results in hallucinated responses, instruction-following failures, or context window corruption.
Common Manifestations
- Token Leakage: Special tokens (e.g., <bos>, <eos>, <start_of_turn>) are omitted, duplicated, or misplaced, causing the model to misinterpret role transitions.
- Role Confusion: The model fails to distinguish between system instructions and user queries due to missing delimiters.
- Truncation Errors: Aggressive truncation of the templated input cuts off critical system prompts in long contexts.
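The token-level symptoms above can often be caught by inspecting the rendered prompt string directly. The following is a minimal sketch, not a definitive checker: the function name find_template_issues is hypothetical, the delimiter set follows the examples in this note, and <end_of_turn> is an assumed closing delimiter that this note does not name. Real templates define their own tokens, so the constants would need adjusting per model.

```python
import re

# Assumed delimiters, for illustration only; real templates define their own.
BOS = "<bos>"
TURN_START = "<start_of_turn>"
TURN_END = "<end_of_turn>"  # assumption: not named in this note

TURN_PATTERN = re.compile(re.escape(TURN_START) + "|" + re.escape(TURN_END))


def find_template_issues(rendered: str) -> list[str]:
    """Return human-readable problems found in a rendered prompt string."""
    issues = []

    # <bos> should appear exactly once, at the very start.
    bos_count = rendered.count(BOS)
    if bos_count == 0:
        issues.append("missing <bos>")
    elif bos_count > 1:
        issues.append(f"duplicated <bos> ({bos_count} occurrences)")
    elif not rendered.startswith(BOS):
        issues.append("misplaced <bos> (not at start)")

    # Walk the turn delimiters in order and check that they alternate.
    # A final open turn is allowed: it is the slot the model completes.
    open_turn = False
    for match in TURN_PATTERN.finditer(rendered):
        if match.group() == TURN_START:
            if open_turn:
                issues.append(f"nested <start_of_turn> at offset {match.start()}")
            open_turn = True
        else:
            if not open_turn:
                issues.append(f"stray <end_of_turn> at offset {match.start()}")
            open_turn = False

    return issues
```

An empty list means none of these specific patterns were found; it does not prove the template is correct, since role confusion can also come from wrong role names or misplaced content that delimiter counting cannot see.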
Known Instances
Gemma 4 Agent Mode Failure
- Source: Gemma 4 Was Broken for Agents - Google Just Fixed It (Fahd Mirza, 2026-06-09)
- Issue: Gemma 4’s default chat template exhibited critical failures in agent workflows, specifically breaking tool-use chaining and multi-step reasoning contexts.
- Resolution: Google deployed a hotfix to correct the template serialization logic, restoring proper state management for agent-based interactions.
Mitigation Strategies
- Explicit Template Verification: Always validate the raw string output of the chat template before sending it to the model API.
- Unit Testing Templates: Create regression tests that check for specific delimiter patterns (<start_of_turn>, etc.) across various turn counts.
- Use Canonical Libraries: Rely on officially maintained tokenizers (e.g., Hugging Face transformers, official SDKs) rather than custom string concatenation.
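A regression test of the kind described above might look like the following sketch. Everything here is illustrative: render_chat is a hypothetical stand-in for whatever renders the template, the role names and the <end_of_turn> delimiter are assumptions, and the expected counts hold only for this toy format. With Hugging Face transformers, the rendered string would typically come from tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) instead.

```python
def render_chat(messages, add_generation_prompt=True):
    """Hypothetical stand-in for a real chat template renderer."""
    parts = ["<bos>"]
    for msg in messages:
        parts.append(f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn for the reply; it is intentionally left unclosed.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


def make_conversation(n_turns):
    """Build n_turns user/model exchanges followed by one pending user message."""
    messages = []
    for i in range(n_turns):
        messages.append({"role": "user", "content": f"question {i}"})
        messages.append({"role": "model", "content": f"answer {i}"})
    messages.append({"role": "user", "content": "final question"})
    return messages


def check_template(n_turns):
    """Assert the delimiter invariants for a conversation of the given length."""
    messages = make_conversation(n_turns)
    rendered = render_chat(messages)
    # Exactly one <bos>, at the start.
    assert rendered.count("<bos>") == 1 and rendered.startswith("<bos>")
    # One opening delimiter per message, plus the generation prompt.
    assert rendered.count("<start_of_turn>") == len(messages) + 1
    # Every message is closed; the generation prompt is not.
    assert rendered.count("<end_of_turn>") == len(messages)
    assert rendered.endswith("<start_of_turn>model\n")
    return rendered


# Exercise short, typical, and long histories.
for n in (0, 1, 5, 50):
    check_template(n)
```

Checking counts across several turn lengths is a deliberate choice: duplication and omission bugs often appear only after the first exchange, so a single-turn test can pass while longer histories are malformed.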
Related Concepts
- Prompt Formatting
- Context Window Management
- Tokenization
- Agent Loop