managing-deployment-rollbacks
Use this skill when you need to work with deployment and CI/CD. It provides deployment automation and orchestration guidance. Trigger it with phrases like "deploy application", "create pipeline", or "automate deployment".
Best use case
managing-deployment-rollbacks is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using managing-deployment-rollbacks should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/managing-deployment-rollbacks/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How managing-deployment-rollbacks Compares
| Feature / Agent | managing-deployment-rollbacks | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Use this skill when you need to work with deployment and CI/CD. It provides deployment automation and orchestration guidance. Trigger it with phrases like "deploy application", "create pipeline", or "automate deployment".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Managing Deployment Rollbacks

## Overview

Implement and execute deployment rollback procedures for Kubernetes, ECS, Lambda, and cloud VM deployments. Detect failed deployments via health checks and error rate monitoring, then automatically or manually revert to the last known good version with minimal downtime and data integrity preservation.

## Prerequisites

- `kubectl` configured with cluster access and permission to manage deployments
- Deployment history retained (Kubernetes `revisionHistoryLimit`, ECS task definition versions)
- Monitoring system tracking error rate, latency, and health check status (Prometheus, Datadog, CloudWatch)
- Previous deployment artifacts (container images, task definitions) still available in the registry
- Database migration strategy that supports backward compatibility (expand-contract pattern)

## Instructions

1. Detect deployment failure: monitor error rate, P99 latency, pod restart count, and health check responses for 5-10 minutes post-deploy
2. Assess rollback scope: determine if the issue is application code, configuration, or infrastructure
3. For Kubernetes: execute `kubectl rollout undo deployment/<name>` to revert to the previous revision
4. For ECS: update the service to use the previous task definition revision
5. For Lambda: point the alias back to the previous function version
6. Verify database compatibility: ensure the previous application version works with the current database schema (no forward-only migrations were applied)
7. Confirm rollback success: verify health checks pass, error rate returns to baseline, and user-facing functionality is restored
8. Generate a post-incident report: document what failed, when rollback was triggered, time to recovery, and root cause
9. Create automated rollback rules: configure Kubernetes readiness probes, Argo Rollouts analysis, or CloudWatch alarms to trigger rollback without manual intervention

## Output

- Rollback scripts for each deployment target (Kubernetes, ECS, Lambda)
- Automated rollback configuration (Kubernetes probes, Argo Rollouts AnalysisTemplate)
- Post-incident report template with timeline and root cause sections
- Monitoring dashboard with rollback trigger indicators
- Database migration rollback procedures (down migrations, backward-compatible schemas)

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| `no rollout history found` | Revision history limit set to 0 or deployment was created fresh | Increase `revisionHistoryLimit` in deployment spec; manually specify the target image tag |
| `Rollback succeeded but errors persist` | Issue is in configuration or external dependency, not application code | Check ConfigMaps, Secrets, and external service health; rollback configuration changes separately |
| `Database schema incompatible after rollback` | Forward-only migration applied during failed deployment | Apply a down migration or use expand-contract pattern; never deploy breaking schema changes alongside code |
| `Old image no longer in registry` | Lifecycle policy deleted the previous image | Restore from backup or rebuild from the git tag; extend image retention for production tags |
| `Rollback causes service disruption` | Insufficient replicas during rollback transition | Set `maxUnavailable: 0` in rolling update strategy to ensure zero-downtime rollback |

## Examples

- "Roll back the production Kubernetes deployment to the previous revision after detecting a spike in 5xx errors."
- "Create an automated rollback policy using Argo Rollouts that reverts if error rate exceeds 1% during the first 10 minutes after deploy."
- "Generate a rollback runbook for an ECS service that includes steps to revert task definition, validate health, and notify the team via Slack."

## Resources

- Kubernetes rollout management: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment
- Argo Rollouts analysis: https://argoproj.github.io/argo-rollouts/features/analysis/
- AWS ECS rolling updates: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html
- Database migration patterns: https://martinfowler.com/articles/evodb.html
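
The failure detection and Kubernetes rollback steps described in the SKILL.md instructions could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own implementation: `DeployMetrics`, `Thresholds`, `should_roll_back`, and `kubectl_undo_command` are hypothetical names, the 1% error-rate limit is borrowed from the Argo Rollouts example prompt, and the latency and restart limits are placeholder assumptions to tune per service. The sketch only builds the `kubectl rollout undo` argument list; it does not execute it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeployMetrics:
    """One post-deploy sample from the monitoring system (hypothetical shape)."""
    error_rate: float        # fraction of failed requests, e.g. 0.02 means 2%
    p99_latency_ms: float
    pod_restarts: int
    health_check_ok: bool


@dataclass(frozen=True)
class Thresholds:
    """Rollback triggers; all defaults are assumptions, not values from the skill."""
    max_error_rate: float = 0.01         # 1%, as in the example prompt
    max_p99_latency_ms: float = 1500.0   # placeholder
    max_pod_restarts: int = 3            # placeholder


def should_roll_back(samples: List[DeployMetrics],
                     limits: Thresholds = Thresholds()) -> bool:
    """Return True if any sample in the observation window breaches a limit."""
    return any(
        (not s.health_check_ok)
        or s.error_rate > limits.max_error_rate
        or s.p99_latency_ms > limits.max_p99_latency_ms
        or s.pod_restarts > limits.max_pod_restarts
        for s in samples
    )


def kubectl_undo_command(deployment: str,
                         namespace: str = "default",
                         to_revision: Optional[int] = None) -> List[str]:
    """Build the argument list for `kubectl rollout undo` without running it."""
    cmd = ["kubectl", "rollout", "undo", f"deployment/{deployment}",
           "--namespace", namespace]
    if to_revision is not None:
        # Pin a specific revision instead of the immediately previous one.
        cmd.append(f"--to-revision={to_revision}")
    return cmd


if __name__ == "__main__":
    window = [
        DeployMetrics(0.002, 310.0, 0, True),
        DeployMetrics(0.040, 290.0, 0, True),  # error-rate spike
    ]
    if should_roll_back(window):
        print(" ".join(kubectl_undo_command("web", namespace="prod")))
```

Keeping the decision logic separate from command construction makes the trigger thresholds easy to unit-test; the resulting list can be passed to `subprocess.run` once the decision has been reviewed.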
Related Skills
managing-test-environments
This skill enables Claude to manage isolated test environments using Docker Compose, Testcontainers, and environment variables. It is used to create consistent, reproducible testing environments for software projects. Claude should use this skill when the user needs to set up a test environment with specific configurations, manage Docker Compose files for test infrastructure, set up programmatic container management with Testcontainers, manage environment variables for tests, or ensure cleanup after tests. Trigger terms include "test environment", "docker compose", "testcontainers", "environment variables", "isolated environment", "env-setup", and "test setup".
managing-autonomous-development
Enables Claude to manage Sugar's autonomous development workflows. It allows Claude to create tasks, view the status of the system, review pending tasks, and start autonomous execution mode. Use this skill when the user asks to create a new development task using `/sugar-task`, check the system status with `/sugar-status`, review pending tasks via `/sugar-review`, or initiate autonomous development using `/sugar-run`. It provides a comprehensive interface for interacting with the Sugar autonomous development system.
managing-ssltls-certificates
This skill enables Claude to manage and monitor SSL/TLS certificates using the ssl-certificate-manager plugin. It is activated when the user requests actions related to SSL certificates, such as checking certificate expiry, renewing certificates, or listing installed certificates. Use this skill when the user mentions "SSL certificate", "TLS certificate", "certificate expiry", "renew certificate", or similar phrases related to SSL/TLS certificate management. The plugin can list, check, and renew certificates, providing vital information for maintaining secure connections.
managing-snapshot-tests
This skill enables Claude to manage and update snapshot tests using intelligent diff analysis and selective updates. It is triggered when the user asks to analyze snapshot failures, update snapshots, or manage snapshot tests in general. It helps distinguish intentional changes from regressions, selectively update snapshots, and validate snapshot integrity. Use this when the user mentions "snapshot tests", "update snapshots", "snapshot failures", or requests to run "/snapshot-manager" or "/sm". It supports Jest, Vitest, Playwright, and Storybook frameworks.
orchestrating-deployment-pipelines
Use this skill when you need to work with deployment and CI/CD. It provides deployment automation and orchestration guidance. Trigger it with phrases like "deploy application", "create pipeline", or "automate deployment".
managing-network-policies
This skill enables Claude to manage Kubernetes network policies and firewall rules. It allows Claude to generate configurations and setup code based on specific requirements and infrastructure. Use this skill when the user requests to create, modify, or analyze network policies for Kubernetes, or when the user mentions "network-policy", "firewall rules", or "Kubernetes security". This skill is useful for implementing best practices and production-ready configurations for network security in a Kubernetes environment.
managing-environment-configurations
Implement environment and configuration management with comprehensive guidance and automation. Use when you need to work with environment configuration. Trigger with phrases like "manage environments", "configure environments", or "sync configurations".
managing-database-sharding
Use this skill when you need to work with database sharding. It provides horizontal sharding strategies with comprehensive guidance and automation. Trigger it with phrases like "implement sharding", "shard database", or "distribute data".
managing-database-replication
Use this skill when you need to work with database scalability. It provides replication and sharding with comprehensive guidance and automation. Trigger it with phrases like "set up replication", "implement sharding", or "scale database".
managing-database-recovery
Use this skill when you need to work with database operations. It provides database management and optimization with comprehensive guidance and automation. Trigger it with phrases like "manage database", "optimize database", or "configure database".
managing-database-partitions
Use this skill when you need to work with database partitioning. It provides table partitioning strategies with comprehensive guidance and automation. Trigger it with phrases like "partition tables", "implement partitioning", or "optimize large tables".
managing-database-migrations
Use this skill when you need to work with database migrations. It provides schema migration management with comprehensive guidance and automation. Trigger it with phrases like "create migration", "run migrations", or "manage schema versions".