promptfoo
LLM red teaming and security testing — automated vulnerability scanning for AI agents, RAGs, and LLM pipelines. Covers prompt injection, jailbreaks, data leaks, PII exposure, and 50+ vulnerability types.
Best use case
promptfoo is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
LLM red teaming and security testing — automated vulnerability scanning for AI agents, RAGs, and LLM pipelines. Covers prompt injection, jailbreaks, data leaks, PII exposure, and 50+ vulnerability types.
Teams using promptfoo should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/promptfoo/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How promptfoo Compares
| Feature / Agent | promptfoo | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
LLM red teaming and security testing — automated vulnerability scanning for AI agents, RAGs, and LLM pipelines. Covers prompt injection, jailbreaks, data leaks, PII exposure, and 50+ vulnerability types.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Promptfoo — LLM Security Testing
> Automated red teaming for AI agents, RAG pipelines, and LLM-powered apps.
> Source: https://github.com/promptfoo/promptfoo | https://www.promptfoo.dev/
## Quick Start
```bash
# Run red team setup (interactive)
npx promptfoo@latest redteam setup
# Run eval against a target
npx promptfoo@latest redteam run
# View results
npx promptfoo@latest redteam report
```
## Core Concepts
**Red Teaming**: Automatically generates adversarial prompts to probe:
- Prompt injection / jailbreaks
- Data exfiltration / PII leaks
- Harmful content generation
- Business logic violations
- RAG poisoning / context stuffing
**Configuration** (`promptfooconfig.yaml`):
```yaml
targets:
- id: openai:gpt-4o
config:
systemPrompt: "You are a helpful assistant."
redteam:
purpose: "Customer support chatbot"
numTests: 50
plugins:
- id: harmful:hate
- id: pii:direct
- id: prompt-injection
- id: jailbreak
- id: harmful:violent-crimes
strategies:
- jailbreak
- prompt-injection
```
## Key Plugins (50+ vulnerability types)
| Category | Plugin IDs |
|----------|-----------|
| Harmful content | `harmful:hate`, `harmful:violent-crimes`, `harmful:cybercrime` |
| PII | `pii:direct`, `pii:session`, `pii:api-db` |
| Injection | `prompt-injection`, `indirect-prompt-injection` |
| Jailbreaks | `jailbreak`, `jailbreak:tree` |
| Business | `policy`, `overreliance`, `excessive-agency` |
| RAG-specific | `rag-poisoning`, `context-length-exceeded` |
## Integration with UltraThink
**Test an agent endpoint:**
```bash
npx promptfoo@latest eval --config promptfooconfig.yaml
```
**CI/CD integration** (GitHub Actions):
```yaml
- name: LLM Security Scan
run: npx promptfoo@latest redteam run --ci
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
**Scan the UltraThink memory API:**
```yaml
targets:
- id: http
config:
url: http://localhost:3333/api/memory
method: POST
body: '{"query": "{{prompt}}"}'
```
## When to Use
- Before shipping any LLM feature to production
- After major prompt/system changes
- As part of CI/CD for AI-powered endpoints
- To audit RAG pipelines for data leakage
- When adding new tool use / function callingRelated Skills
ultrathink
UltraThink Workflow OS — 4-layer skill mesh with persistent memory and privacy hooks for complex engineering tasks. Routes prompts through intent detection to activate the right domain skills automatically.
ultrathink_review
Multi-pass code review powered by UltraThink's quality gate — checks correctness, security (OWASP), performance, readability, and project conventions in a single structured pass.
ultrathink_memory
Persistent memory system for UltraThink — search, save, and recall project context, decisions, and patterns across sessions using Postgres-backed fuzzy search with synonym expansion.
ui-design
Comprehensive UI design system: 230+ font pairings, 48 themes, 65 design systems, 23 design languages, 30 UX laws, 14 color systems, Swiss grid, Gestalt principles, Pencil.dev workflow. Inherits ui-ux-pro-max (99 UX rules) + impeccable-frontend-design (anti-AI-slop). Triggers on any design, UI, layout, typography, color, theme, or styling task.
Zod
> TypeScript-first schema validation with static type inference.
webinar-registration-page
Build a webinar or live event registration page as a self-contained HTML file with countdown timer, speaker bio, agenda, and registration form. Triggers on: "build a webinar registration page", "create a webinar sign-up page", "event registration landing page", "live training registration page", "workshop sign-up page", "create a webinar page", "build an event page", "free webinar landing page", "live demo registration page", "online event page", "create a registration page for my webinar", "build a training event page".
webhooks
Webhook design patterns — delivery, retry with exponential backoff, HMAC signature verification, payload validation, idempotency keys
web-workers
Offload heavy computation from the main thread using Web Workers, SharedWorkers, and Comlink — structured messaging, transferable objects, and off-main-thread architecture patterns
web-vitals
Core Web Vitals monitoring (LCP, FID, CLS, INP, TTFB), measurement with web-vitals library, reporting to analytics, and optimization strategies for Next.js
web-components
Native Web Components, custom elements API, Shadow DOM, HTML templates, slots, lifecycle callbacks, and framework-agnostic design patterns
wasm
WebAssembly integration — Rust to WASM with wasm-pack/wasm-bindgen, WASI, browser usage, server-side WASM, and performance considerations
vue
Vue 3 Composition API, Nuxt patterns, reactivity system, component architecture, and production development practices