AI/LLM Vulnerability Discovery Methodology

Overview

A structured approach to identifying, classifying, and exploiting weaknesses in large-language-model systems. Unlike traditional software security, LLM vulnerabilities often reside in prompt-engineering, reasoning logic, or data leakage rather than memory corruption.

Core Methodology Phases

  1. Reconnaissance: Mapping the attack surface, including model capabilities, training data sources, and integration points API Security.
  2. Threat Modeling: Identifying specific risks such as Prompt Injection, jailbreaking, data poisoning, and supply chain vulnerabilities in fine-tuning pipelines.
  3. Discovery & Exploitation:
    • Systematic testing of input boundaries.
    • Adversarial example generation.
    • Integration with LLM Vulnerability Discovery Methodology for advanced detection engineering practices.
  4. Reporting & Mitigation: Documenting findings with reproducibility steps and recommending mitigations like input sanitization, output filtering, and Adversarial Training.

Key Vulnerability Classes

  • Prompt Injection: Direct or indirect manipulation of model instructions.
  • Jailbreaking: Bypassing safety filters through social engineering or complex logic puzzles.
  • Data Leakage: Extracting proprietary training data or PII.
  • Model Inversion: Reconstructing input data from model outputs.

Resources & Case Studies