agent-safety
Ensure agent safety - guardrails, content filtering, monitoring, and compliance
Best use case
agent-safety is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Ensure agent safety - guardrails, content filtering, monitoring, and compliance
Teams using agent-safety should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/agent-safety/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How agent-safety Compares
| Feature / Agent | agent-safety | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Ensure agent safety - guardrails, content filtering, monitoring, and compliance
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Agent Safety
Implement safety systems for responsible AI agent deployment.
## When to Use This Skill
Invoke this skill when:
- Adding input/output guardrails
- Implementing content filtering
- Setting up rate limiting
- Ensuring compliance (GDPR, SOC2)
## Parameter Schema
| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| `task` | string | Yes | Safety goal | - |
| `risk_level` | enum | No | `strict`, `moderate`, `permissive` | `strict` |
| `filters` | list | No | Filter types to enable | `["injection", "pii", "toxicity"]` |
## Quick Start
```python
from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter
guard = Guard.from_validators([
ToxicLanguage(threshold=0.8, on_fail="exception"),
PIIFilter(on_fail="fix")
])
# Validate output
validated = guard.validate(llm_response)
```
## Guardrail Types
### Input Guardrails
```python
# Prompt injection detection
INJECTION_PATTERNS = [
r"ignore (previous|all) instructions",
r"you are now",
r"forget everything"
]
```
### Output Guardrails
```python
# Content filtering
filters = [
ToxicityFilter(),
PIIRedactor(),
HallucinationDetector()
]
```
## Rate Limiting
```python
class RateLimiter:
def __init__(self, rpm=60, tpm=100000):
self.rpm = rpm
self.tpm = tpm
def check(self, user_id, tokens):
# Token bucket algorithm
pass
```
## Troubleshooting
| Issue | Solution |
|-------|----------|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |
## Best Practices
- Defense in depth (multiple layers)
- Fail-safe defaults (deny by default)
- Audit everything
- Regular red team testing
## Compliance Checklist
- [ ] Input validation active
- [ ] Output filtering enabled
- [ ] Audit logging configured
- [ ] Rate limits set
- [ ] PII handling compliant
## Related Skills
- `tool-calling` - Input validation
- `llm-integration` - API security
- `multi-agent` - Per-agent permissions
## References
- [OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [Guardrails AI](https://docs.guardrailsai.com/)Related Skills
type-safety-validation
End-to-end type safety with Zod, tRPC, Prisma, and TypeScript 5.7+ patterns. Use when creating Zod schemas, setting up tRPC, validating input, implementing exhaustive switch statements, branded types, or type checking with ty.
safety
Git, command, Kubernetes, data, workspace, and temporary files safety rules. Use when committing, pushing, using kubectl, handling multi-repo workspaces, or performing destructive operations.
memory-safety-patterns
Implement memory-safe programming with RAII, ownership, smart pointers, and resource management across Rust, C++, and C. Use when writing safe systems code, managing resources, or preventing memory...
azure-ai-contentsafety-ts
Analyze text and images for harmful content using Azure AI Content Safety (@azure-rest/ai-content-safety). Use when moderating user-generated content, detecting hate speech, violence, sexual conten...
azure-ai-contentsafety-py
Azure AI Content Safety SDK for Python. Use for detecting harmful content in text and images with multi-severity classification.
azure-ai-contentsafety-java
Build content moderation applications with Azure AI Content Safety SDK for Java. Use when implementing text/image analysis, blocklist management, or harm detection for hate, violence, sexual conten...
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
ui-ux-pro-max
UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 9 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient. Integrations: Component search and examples.
ui ux
UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 9 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient. Integrations: shadcn/ui MCP for component search and examples.
ui-ux-design
UI/UX design reference database. 50+ styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient.
ui-skills
Opinionated constraints for building better interfaces with agents.
ui-patterns
Plaited UI patterns for templates, behavioral elements, and styling. Use when creating bElements or FunctionalTemplates, writing stories for testing, using createStyles, building form controls, or coordinating cross-island communication.