promptfoo

LLM red teaming and security testing — automated vulnerability scanning for AI agents, RAGs, and LLM pipelines. Covers prompt injection, jailbreaks, data leaks, PII exposure, and 50+ vulnerability types.

39 stars

byInugamiDev

View on GitHub Installation ↓

Best use case

promptfoo is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using promptfoo should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/promptfoo/SKILL.md --create-dirs "https://raw.githubusercontent.com/InugamiDev/ultrathink-oss/main/.claude/skills/promptfoo/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/promptfoo/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How promptfoo Compares

Feature / Agent	promptfoo	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Promptfoo — LLM Security Testing

> Automated red teaming for AI agents, RAG pipelines, and LLM-powered apps.
> Source: https://github.com/promptfoo/promptfoo | https://www.promptfoo.dev/

## Quick Start

```bash
# Run red team setup (interactive)
npx promptfoo@latest redteam setup

# Run eval against a target
npx promptfoo@latest redteam run

# View results
npx promptfoo@latest redteam report
```

## Core Concepts

**Red Teaming**: Automatically generates adversarial prompts to probe:
- Prompt injection / jailbreaks
- Data exfiltration / PII leaks
- Harmful content generation
- Business logic violations
- RAG poisoning / context stuffing

**Configuration** (`promptfooconfig.yaml`):
```yaml
targets:
  - id: openai:gpt-4o
    config:
      systemPrompt: "You are a helpful assistant."

redteam:
  purpose: "Customer support chatbot"
  numTests: 50
  plugins:
    - id: harmful:hate
    - id: pii:direct
    - id: prompt-injection
    - id: jailbreak
    - id: harmful:violent-crimes
  strategies:
    - jailbreak
    - prompt-injection
```

## Key Plugins (50+ vulnerability types)

| Category | Plugin IDs |
|----------|-----------|
| Harmful content | `harmful:hate`, `harmful:violent-crimes`, `harmful:cybercrime` |
| PII | `pii:direct`, `pii:session`, `pii:api-db` |
| Injection | `prompt-injection`, `indirect-prompt-injection` |
| Jailbreaks | `jailbreak`, `jailbreak:tree` |
| Business | `policy`, `overreliance`, `excessive-agency` |
| RAG-specific | `rag-poisoning`, `context-length-exceeded` |

## Integration with UltraThink

**Test an agent endpoint:**
```bash
npx promptfoo@latest eval --config promptfooconfig.yaml
```

**CI/CD integration** (GitHub Actions):
```yaml
- name: LLM Security Scan
  run: npx promptfoo@latest redteam run --ci
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

**Scan the UltraThink memory API:**
```yaml
targets:
  - id: http
    config:
      url: http://localhost:3333/api/memory
      method: POST
      body: '{"query": "{{prompt}}"}'
```

## When to Use

- Before shipping any LLM feature to production
- After major prompt/system changes
- As part of CI/CD for AI-powered endpoints
- To audit RAG pipelines for data leakage
- When adding new tool use / function calling

Related Skills

ultrathink

from InugamiDev/ultrathink-oss

UltraThink Workflow OS — 4-layer skill mesh with persistent memory and privacy hooks for complex engineering tasks. Routes prompts through intent detection to activate the right domain skills automatically.

ultrathink_review

from InugamiDev/ultrathink-oss

Multi-pass code review powered by UltraThink's quality gate — checks correctness, security (OWASP), performance, readability, and project conventions in a single structured pass.

ultrathink_memory

from InugamiDev/ultrathink-oss

Persistent memory system for UltraThink — search, save, and recall project context, decisions, and patterns across sessions using Postgres-backed fuzzy search with synonym expansion.

ui-design

from InugamiDev/ultrathink-oss

Comprehensive UI design system: 230+ font pairings, 48 themes, 65 design systems, 23 design languages, 30 UX laws, 14 color systems, Swiss grid, Gestalt principles, Pencil.dev workflow. Inherits ui-ux-pro-max (99 UX rules) + impeccable-frontend-design (anti-AI-slop). Triggers on any design, UI, layout, typography, color, theme, or styling task.

Zod

from InugamiDev/ultrathink-oss

> TypeScript-first schema validation with static type inference.

webinar-registration-page

from InugamiDev/ultrathink-oss

Build a webinar or live event registration page as a self-contained HTML file with countdown timer, speaker bio, agenda, and registration form. Triggers on: "build a webinar registration page", "create a webinar sign-up page", "event registration landing page", "live training registration page", "workshop sign-up page", "create a webinar page", "build an event page", "free webinar landing page", "live demo registration page", "online event page", "create a registration page for my webinar", "build a training event page".

webhooks

from InugamiDev/ultrathink-oss

Webhook design patterns — delivery, retry with exponential backoff, HMAC signature verification, payload validation, idempotency keys

web-workers

from InugamiDev/ultrathink-oss

Offload heavy computation from the main thread using Web Workers, SharedWorkers, and Comlink — structured messaging, transferable objects, and off-main-thread architecture patterns

web-vitals

from InugamiDev/ultrathink-oss

Core Web Vitals monitoring (LCP, FID, CLS, INP, TTFB), measurement with web-vitals library, reporting to analytics, and optimization strategies for Next.js

web-components

from InugamiDev/ultrathink-oss

Native Web Components, custom elements API, Shadow DOM, HTML templates, slots, lifecycle callbacks, and framework-agnostic design patterns

wasm

from InugamiDev/ultrathink-oss

WebAssembly integration — Rust to WASM with wasm-pack/wasm-bindgen, WASI, browser usage, server-side WASM, and performance considerations

vue

from InugamiDev/ultrathink-oss

Vue 3 Composition API, Nuxt patterns, reactivity system, component architecture, and production development practices