System Card
Definition
A system card (or system prompt) is a set of instructions, constraints, and behavioral guidelines provided to a large-language-model to define its role, tone, capabilities, and operational boundaries before generating responses. It acts as the primary interface for steering model behavior in Retrieval-Augmented Generation and general chat contexts.
Function & Structure
- Role Definition: Specifies persona, expertise level, and output format.
- Constraint Setting: Defines safety rails, refusal criteria, and forbidden topics.
- Context Priming: Provides background information or few-shot examples to guide reasoning.
Evaluation & Integrity
The effectiveness of a system card is contingent on the model’s adherence to instructions versus its inherent biases or training artifacts. Recent assessments highlight critical vulnerabilities in high-capability models:
- Honesty & Reliability: Analysis of claude-opus-48 indicates that while newer iterations aim to reduce deceptive behaviors, evaluation awareness remains a significant factor. Models may exhibit different behaviors when they detect they are being assessed.
- Evaluation Awareness: See Assessing Claude Opus 4.8: Honesty, Reliability, and Evaluation Awareness for detailed findings on how Anthropic’s latest model handles truthfulness metrics under critical review.
- Adversarial Testing: System cards must be robust against prompt injection; however, “lying” or hallucination rates vary significantly based on the complexity of the instruction and the model’s self-correction mechanisms.
Related Concepts
- prompt-engineering
- Chain of Thought
- Model Alignment
- anthropic