DevOps & Infrastructure
CI/CD, deployment, Docker, Kubernetes, and cloud infrastructure.
Skills in this category
View all in directory →linux-shell-scripting
Provide production-ready shell script templates for common Linux system administration tasks including backups, monitoring, user management, log analysis, and automation. These scripts serve as building blocks for security operations and penetration testing environments.
iterate-pr
Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.
istio-traffic-management
Comprehensive guide to Istio traffic management for production service mesh deployments.
incident-runbook-templates
Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication.
incident-response-smart-fix
[Extended thinking: This workflow implements a sophisticated debugging and resolution pipeline that leverages AI-assisted debugging tools and observability platforms to systematically diagnose and res
incident-responder
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management.
expo-cicd-workflows
Helps understand and write EAS workflow YAML files for Expo projects. Use this skill when the user asks about CI/CD or workflows in an Expo or EAS context, mentions .eas/workflows/, or wants help with EAS build pipelines or deployment automation.
error-diagnostics-error-trace
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging,
error-debugging-error-trace
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
error-debugging-error-analysis
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
docker-expert
You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.
devops-troubleshooter
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.
devops-deploy
DevOps e deploy de aplicacoes — Docker, CI/CD com GitHub Actions, AWS Lambda, SAM, Terraform, infraestrutura como codigo e monitoramento.
deployment-procedures
Production deployment principles and decision-making. Safe deployment workflows, rollback strategies, and verification. Teaches thinking, not scripts.
deployment-pipeline-design
Architecture patterns for multi-stage CI/CD pipelines with approval gates and deployment strategies.
bitbucket-automation
Automate Bitbucket repositories, pull requests, branches, issues, and workspace management via Rube MCP (Composio). Always search tools first for current schemas.
bash-pro
Master of defensive Bash scripting for production automation, CI/CD pipelines, and system utilities. Expert in safe, portable, and testable shell scripts.
kubernetes-deployment
Kubernetes deployment workflow for container orchestration, Helm charts, service mesh, and production-ready K8s configurations.
botlearn-healthcheck
botlearn-healthcheck — BotLearn autonomous health inspector for OpenClaw instances across 5 domains (hardware, config, security, skills, autonomy); triggers on system check, health report, diagnostics, or scheduled heartbeat inspection.
Incident Postmortem Generator
Generate blameless incident postmortems from raw notes, Slack threads, or bullet points.
Post-Mortem & Incident Review Framework
Run structured post-mortems that actually prevent repeat failures. Blameless analysis, root cause identification, and action tracking.
afrexai-performance-engineering
Complete performance engineering system — profiling, optimization, load testing, capacity planning, and performance culture. Use when diagnosing slow applications, optimizing code/queries/infrastructure, load testing before launch, planning capacity, or building performance into CI/CD. Covers Node.js, Python, Go, Java, databases, APIs, and frontend.
OpenClaw Mastery — The Complete Agent Engineering & Operations System
> Built by AfrexAI — the team that runs 9+ production agents 24/7 on OpenClaw.
Legacy System Modernization Engine
Complete methodology for assessing, planning, and executing legacy system modernization — from monolith decomposition to cloud migration. Works for any tech stack, any scale.
Incident Response Playbook
Structured incident response for business and IT teams. Guides you through detection, triage, containment, resolution, and post-mortem — with auto-generated timelines and action items.
Git Engineering & Repository Strategy
You are a Git Engineering expert. You help teams design branching strategies, implement code review workflows, manage monorepos, automate releases, and maintain healthy repository practices at scale.
Django Production Engineering
Complete methodology for building, scaling, and operating production Django applications. From project structure to deployment, security to performance — every decision framework a Django team needs.
IT Disaster Recovery Plan Generator
Build production-ready disaster recovery plans that actually get followed when things break.
afrexai-api-architect
Design, build, test, document, and secure production-grade APIs. Covers the full lifecycle from schema design through deployment, monitoring, and versioning. Use when designing new APIs, reviewing existing ones, generating OpenAPI specs, building test suites, or debugging production issues.
Agent Ops Runbook
Generate a production-ready operations runbook for deploying AI agents. Covers pre-deployment checklists, shadow mode → supervised → autonomous rollout stages, monitoring dashboards, rollback procedures, cost management, and incident response templates.
node-red-manager
Manage Node-RED instances via Admin API or CLI. Automate flow deployment, install nodes, and troubleshoot issues. Use when user wants to "build automation", "connect devices", or "fix node-red".
newman
Automated API testing with Postman collections via Newman CLI. Use when user requests API testing, collection execution, automated testing, CI/CD integration, or mentions "Postman", "Newman", "API tests", "run collection", or "automated testing".
cloudflare-manager
Manage Cloudflare DNS records, Tunnels (cloudflared), and Zero Trust policies. Use for pointing domains, exposing local services via tunnels, and updating ingress rules.
openclaw-safe-change-flow
Safe OpenClaw config change workflow with backup, minimal edits, validation, health checks, and rollback. Single-instance first; secondary instance optional.
jqopenclaw-node-invoker
统一通过 Gateway 的 node.invoke 调用 JQOpenClawNode 能力(file.read、file.write、process.exec、process.manage、system.run、process.which、system.info、system.screenshot、system.notify、system.clipboard、system.input、node.selfUpdate)。当用户需要远程文件读写、文件移动/删除、目录创建/删除、进程管理(列表/搜索/终止)、远程进程执行、命令可执行性探测、系统信息采集、截图采集、系统弹窗、系统剪贴板读写、输入控制(鼠标/键盘)、节点自更新、节点命令可用性排查或修复 node.invoke 参数错误时使用。
天翼云 DMS 数据管理服务 (客户端)
提供 DMS 客户端数据管理能力,包括实例管理、SQL查询、工单创建、团队管理、用户管理。用于用户请求数据库实例添加、数据查询、提交工单、团队配置等操作时调用。
github-tools
Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries.
GitMap Skill
Version control for ArcGIS web maps — exposed as native OpenClaw tools.
terminal-command-execution
Execute terminal commands safely and reliably with clear pre-checks, output validation, and recovery steps. Use when users ask to run shell/CLI commands, inspect system state, manage files, install dependencies, start services, debug command failures, or automate command-line workflows.
Nightly Build 🌙
An automation skill that runs maintenance tasks while you sleep and delivers a morning briefing.
virtualbox
Control and manage VirtualBox virtual machines directly from openclaw. Start, stop, snapshot, clone, configure and monitor VMs using VBoxManage CLI. Supports full lifecycle management including VM creation, network configuration, shared folders, and performance monitoring.
cursor2api
cursor2api proxy service management tool that converts Cursor IDE's free AI conversations into Anthropic Messages API / OpenAI Chat Completions API format. Supports Docker deployment, environment configuration, token refresh, and complete uninstallation. Use when user asks to: (1) Install or deploy cursor2api, (2) Configure cursor2api for OpenClaw/Claude Code/CC Switch, (3) Refresh or retrieve Cursor Session Token, (4) Uninstall cursor2api.
claude-to-im
Bridge THIS Claude Code or Codex session to Telegram, Discord, Feishu/Lark, QQ, or WeChat so the user can chat with Claude from their phone. Use for: setting up, starting, stopping, or diagnosing the claude-to-im bridge daemon; forwarding Claude replies to a messaging app; any phrase like "claude-to-im", "bridge", "消息推送", "消息转发", "桥接", "连上飞书", "手机上看claude", "启动后台服务", "诊断", "查看日志", "配置". Subcommands: setup, start, stop, status, logs, reconfigure, doctor. Do NOT use for: building standalone bots, webhook integrations, or coding with IM platform SDKs — those are regular programming tasks.
wt
Manage LlamaFarm worktrees for isolated parallel development. Create, start, stop, and clean up worktrees.
azure-quotas
Check/manage Azure quotas and usage across providers. For deployment planning, capacity validation, region selection. WHEN: "check quotas", "service limits", "current usage", "request quota increase", "quota exceeded", "validate capacity", "regional availability", "provisioning limits", "vCPU limit", "how many vCPUs available in my subscription".
modal-deployment
Run Python code in the cloud with serverless containers, GPUs, and autoscaling using Modal. This skill enables agents to generate code for deploying ML models, running batch jobs, serving APIs, and scaling compute-intensive workloads.
grail-miner
This skill assists in setting up, managing, and optimizing Grail miners on Bittensor Subnet 81, handling tasks like environment configuration, R2 storage, model checkpoint management, and performance tuning.
docker-local-dev
Generate Docker Compose and Dockerfile configurations for local development through interactive Q&A. Supports PHP/Laravel, WordPress, Drupal, Joomla, Node.js, and Python stacks with Nginx, Supervisor/PM2, databases, Redis, and email testing. Always asks clarifying questions before generating configurations.
SKILL: Azure Kusto Spark Connector — Release Process
## Identity
aspire
Aspire orchestration for cloud-native distributed applications in any language (C#, Python, Node.js, Go). Handles dependency management, local dev with Docker, Azure deployment, service discovery, and observability dashboards. Use when setting up microservices, containerized apps, or polyglot distributed systems.
tbdflow
Manage Trunk-Based Development workflows using the tbdflow CLI. Use this skill to create short-lived branches, make standardised commits, sync with trunk, merge completed work, and generate changelogs.
tilt
This AI agent skill empowers efficient management and monitoring of local development environments using Tilt. It allows for quick checks of deployment health, investigation of errors, retrieval of logs, and control over Tilt-managed resources.
vercel-github-actions-deploy
Set up GitHub Actions to deploy any Vercel project using the Git Author Override method, enabling teammates to deploy on the free Hobby plan. Use when the user asks about Vercel deployment via GitHub Actions, CI/CD for Vercel, letting teammates deploy on Vercel free plan, bypassing Vercel's Hobby plan deploy restrictions, or automating Vercel production deploys. Covers workflow setup, GitHub Secrets configuration, and package manager variants (bun, npm, pnpm).
gastown-operator
This skill enables an AI agent to orchestrate and manage Gas Town multi-agent systems on Kubernetes, using `kubectl-gt` commands to create, dispatch, and monitor resources like Rigs, Polecats, and Convoys.
code-mode
This skill guides an AI agent to help users implement 'code mode' on their MCP servers, enabling LLMs to write processing scripts for large API responses within a sandboxed runtime, significantly reducing token usage.
opentelemetry-skill
An AI skill transforming your agent into an expert Principal Observability Engineer, offering deep guidance on OpenTelemetry collector configuration, pipeline design, application instrumentation, and production observability architecture. It provides technically rigorous, production-ready advice for managing telemetry data at scale.
swe-cli-skills
Empower your AI agent with senior engineer CLI expertise, covering complex workflows, safety guardrails, common pitfalls, and anti-patterns across cloud, IaC, containers, databases, and development tools.
PicoClaw Fleet
Orchestrate a fleet of remote PicoClaw workers over SSH for fast, ephemeral one-shot tasks, enabling parallel execution and management of distributed AI workloads.
multiagent
Manage multi-agent Telegram group collaboration for OpenClaw, automating the process of adding, removing, and configuring AI agents within a team.
openclaw-ops
A comprehensive operations guide for OpenClaw, covering installation, configuration, model management, migration, cron tasks, security checks, and debugging.
Add Feishu Channel
This skill adds Feishu (Lark) channel support to the NanoClaw AI agent framework, integrating the messaging platform for AI agent interactions and allowing for deterministic code changes and interactive setup.
Popular guides for DevOps & Infrastructure
Explore curated landing pages built around the strongest intents connected to this category.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
Top AI Agents for Productivity
See the top AI agent skills for productivity, workflow automation, operational systems, documentation, and everyday task execution.