agent-safety

Ensure agent safety - guardrails, content filtering, monitoring, and compliance


by diegosouzapw


Best use case

agent-safety is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using agent-safety should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/agent-safety/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/ai-agents/agent-safety/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agent-safety/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How agent-safety Compares

| Feature | agent-safety | Standard approach |
|---------|--------------|-------------------|
| Platform support | Not specified | Limited / varies |
| Context awareness | High | Baseline |
| Installation complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It helps you implement agent safety measures: input/output guardrails, content filtering, monitoring, and compliance.

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/ai-agents/agent-safety/SKILL.md.

SKILL.md Source

# Agent Safety

Implement safety systems for responsible AI agent deployment.

## When to Use This Skill

Invoke this skill when:
- Adding input/output guardrails
- Implementing content filtering
- Setting up rate limiting
- Ensuring compliance (GDPR, SOC 2)

## Parameter Schema

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| `task` | string | Yes | Safety goal | - |
| `risk_level` | enum | No | `strict`, `moderate`, `permissive` | `strict` |
| `filters` | list | No | Filter types to enable | `["injection", "pii", "toxicity"]` |
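
If you drive this skill from code, the parameter table can be mirrored in a typed config object so that bad values fail early. This is a minimal sketch assuming Python 3.9+; `SafetyParams` and `RiskLevel` are hypothetical names, not part of the skill, and the defaults are copied from the table.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

RiskLevel = Literal["strict", "moderate", "permissive"]

@dataclass
class SafetyParams:
    """Hypothetical config object mirroring the skill's parameter table."""
    task: str                          # required: the safety goal
    risk_level: RiskLevel = "strict"   # strictest level by default (fail-safe)
    filters: list[str] = field(
        default_factory=lambda: ["injection", "pii", "toxicity"]
    )

    def __post_init__(self) -> None:
        # Literal types are not enforced at runtime, so validate explicitly
        if not self.task:
            raise ValueError("task is required")
        if self.risk_level not in get_args(RiskLevel):
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")

params = SafetyParams(task="Block prompt injection in the support bot")
```

`default_factory` gives each instance its own filter list, so mutating one config never leaks into another.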

## Quick Start

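```python
# Assumes guardrails-ai with Hub validators installed (0.5-style API; check your version):
#   guardrails hub install hub://guardrails/toxic_language
#   guardrails hub install hub://guardrails/detect_pii
from guardrails import Guard
from guardrails.hub import DetectPII, ToxicLanguage

guard = Guard().use_many(
    ToxicLanguage(threshold=0.8, validation_method="sentence", on_fail="exception"),
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix"),
)

# Validate output: raises on toxic text, redacts detected PII
outcome = guard.validate(llm_response)  # llm_response: raw model text
validated = outcome.validated_output
```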

## Guardrail Types

### Input Guardrails
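```python
import re

# Prompt injection detection: cheap regex heuristics that paraphrasing easily
# bypasses, so use them as a first layer only
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?(previous|prior)?\s*instructions",
    r"you are now",
    r"forget everything",
]
INJECTION_RE = re.compile("|".join(INJECTION_PATTERNS), re.IGNORECASE)

def looks_like_injection(prompt: str) -> bool:
    return INJECTION_RE.search(prompt) is not None
```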

### Output Guardrails
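```python
# Content filtering. The class names below are illustrative placeholders,
# not a specific library's API: substitute the validators you actually deploy.
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector(),
]
```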
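
Output filters are easiest to compose as plain functions that take text and return cleaned text (or raise to block it). The sketch below shows that pattern with a deliberately crude regex redactor standing in for a role like `PIIRedactor`. `redact_pii`, `run_output_filters`, and both regexes are hypothetical; production PII detection usually needs an NER-based detector and far broader coverage (names, addresses, national IDs).

```python
import re
from typing import Callable

# Hypothetical, intentionally simple patterns for two common PII types
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def run_output_filters(text: str, filters: list[Callable[[str], str]]) -> str:
    """Apply filters in order; each returns cleaned text or raises to block it."""
    for f in filters:
        text = f(text)
    return text

print(run_output_filters("Mail jane.doe@example.com or call +1 (555) 010-9999.", [redact_pii]))
```

Ordering matters: emails are redacted before phone numbers so the looser phone pattern never sees digits inside an address.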

## Rate Limiting

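```python
import time

class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm, self.tpm = rpm, tpm
        self.state = {}  # user_id -> (requests_left, tokens_left, last_check_time)

    def check(self, user_id, tokens):
        # Token bucket algorithm: refill both buckets pro rata, capped at capacity
        now = time.monotonic()
        reqs, toks, last = self.state.get(user_id, (self.rpm, self.tpm, now))
        minutes = (now - last) / 60  # time elapsed since the last check
        reqs = min(self.rpm, reqs + minutes * self.rpm)
        toks = min(self.tpm, toks + minutes * self.tpm)
        allowed = reqs >= 1 and toks >= tokens
        self.state[user_id] = (reqs - 1, toks - tokens, now) if allowed else (reqs, toks, now)
        return allowed
```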

## Troubleshooting

| Issue | Solution |
|-------|----------|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |
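
For the "Cache filter results" fix, memoizing an expensive check on the exact input string is the simplest option. A minimal sketch, assuming the filter is a pure function of its text; `score_toxicity` is a hypothetical stand-in for a real classifier or moderation API call.

```python
from functools import lru_cache

def score_toxicity(text: str) -> float:
    """Hypothetical stand-in for a slow classifier or moderation API call."""
    return 1.0 if "idiot" in text.lower() else 0.0

@lru_cache(maxsize=10_000)
def cached_toxicity(text: str) -> float:
    # Exact-match cache: repeated strings skip the expensive call entirely
    return score_toxicity(text)

cached_toxicity("You idiot")
cached_toxicity("You idiot")  # served from cache
print(cached_toxicity.cache_info())
```

Call `cached_toxicity.cache_clear()` whenever the model or thresholds change, and keep in mind that the cache holds raw text in process memory, which may itself contain PII.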

## Best Practices

- Defense in depth (multiple layers)
- Fail-safe defaults (deny by default)
- Audit everything
- Regular red team testing
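
"Fail-safe defaults" and "Audit everything" combine naturally in a wrapper around each check: anything other than an explicit allow, including a crash, counts as a denial, and every decision is logged. This is a sketch only; `fail_closed` and the `agent.audit` logger name are hypothetical.

```python
import logging
from typing import Callable

audit_log = logging.getLogger("agent.audit")

def fail_closed(check: Callable[[str], bool], text: str, user_id: str) -> bool:
    """Allow `text` only if `check` explicitly returns True; deny on any error."""
    name = getattr(check, "__name__", repr(check))
    try:
        allowed = check(text) is True
    except Exception:
        # Deny by default: a crashed or unreachable check must not let input through
        audit_log.exception("check=%s user=%s allowed=False (check raised)", name, user_id)
        return False
    # Audit every decision; log identifiers, not raw text, to keep PII out of the log
    audit_log.info("check=%s user=%s allowed=%s", name, user_id, allowed)
    return allowed
```

Requiring `is True` rather than truthiness means a check that accidentally returns `None` or a non-empty string also denies.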

## Compliance Checklist

- [ ] Input validation active
- [ ] Output filtering enabled
- [ ] Audit logging configured
- [ ] Rate limits set
- [ ] PII handling compliant

## Related Skills

- `tool-calling` - Input validation
- `llm-integration` - API security
- `multi-agent` - Per-agent permissions

## References

- [OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [Guardrails AI](https://docs.guardrailsai.com/)
