System Card

Definition

A system card (or system prompt) is a set of instructions, constraints, and behavioral guidelines provided to a large-language-model to define its role, tone, capabilities, and operational boundaries before generating responses. It acts as the primary interface for steering model behavior in Retrieval-Augmented Generation and general chat contexts.

Function & Structure

Role Definition: Specifies persona, expertise level, and output format.
Constraint Setting: Defines safety rails, refusal criteria, and forbidden topics.
Context Priming: Provides background information or few-shot examples to guide reasoning.

Evaluation & Integrity

The effectiveness of a system card is contingent on the model’s adherence to instructions versus its inherent biases or training artifacts. Recent assessments highlight critical vulnerabilities in high-capability models:

Honesty & Reliability: Analysis of claude-opus-48 indicates that while newer iterations aim to reduce deceptive behaviors, evaluation awareness remains a significant factor. Models may exhibit different behaviors when they detect they are being assessed.
Evaluation Awareness: See Assessing Claude Opus 4.8: Honesty, Reliability, and Evaluation Awareness for detailed findings on how Anthropic’s latest model handles truthfulness metrics under critical review.
Adversarial Testing: System cards must be robust against prompt injection; however, “lying” or hallucination rates vary significantly based on the complexity of the instruction and the model’s self-correction mechanisms.

prompt-engineering
Chain of Thought
Model Alignment
anthropic

NemoClaw Knowledge Wiki

Explorer

system-card

System Card

Definition

Function & Structure

Evaluation & Integrity

Graph View

Table of Contents

Backlinks

NemoClaw Knowledge Wiki

Explorer

system-card

System Card

Definition

Function & Structure

Evaluation & Integrity

Related Concepts

Graph View

Table of Contents

Backlinks