managing-deployment-rollbacks

Use this skill when you need to work with deployments and CI/CD. It provides deployment automation and orchestration guidance. Trigger with phrases like "deploy application", "create pipeline", or "automate deployment".

1,868 stars

Best use case

managing-deployment-rollbacks is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using managing-deployment-rollbacks should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/managing-deployment-rollbacks/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/devops/deployment-rollback-manager/skills/managing-deployment-rollbacks/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/managing-deployment-rollbacks/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How managing-deployment-rollbacks Compares

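| Feature / Agent | managing-deployment-rollbacks | Standard Approach |
|-----------------|-------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |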

Frequently Asked Questions

What does this skill do?

It guides an AI agent through detecting failed deployments and rolling back Kubernetes, ECS, Lambda, and cloud VM deployments to the last known good version, including database compatibility checks, post-incident reporting, and automated rollback rules.

Where can I find the source code?

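The skill is published in the jeremylongshore/claude-code-plugins-plus-skills repository on GitHub, at plugins/devops/deployment-rollback-manager/skills/managing-deployment-rollbacks/SKILL.md (the same file the installation command downloads).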


SKILL.md Source

# Managing Deployment Rollbacks

## Overview

Implement and execute deployment rollback procedures for Kubernetes, ECS, Lambda, and cloud VM deployments. Detect failed deployments via health checks and error rate monitoring, then automatically or manually revert to the last known good version with minimal downtime and data integrity preservation.
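The detection half of this overview can be sketched as a health gate that compares a post-deploy observation window against a pre-deploy baseline. This is a minimal sketch, assuming the metrics have already been pulled from your monitoring system; the `HealthSample` type, `should_roll_back`, and every threshold value are hypothetical placeholders, not part of any monitoring API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HealthSample:
    """One post-deploy observation (hypothetical shape; adapt to your metrics source)."""
    error_rate: float       # fraction of failed requests, e.g. 0.02 == 2%
    p99_latency_ms: float   # P99 latency in milliseconds
    pod_restarts: int       # cumulative restarts since the deploy started
    health_check_ok: bool   # result of the service health endpoint


def should_roll_back(
    samples: list[HealthSample],
    baseline_error_rate: float,
    baseline_p99_ms: float,
    error_factor: float = 2.0,      # assumed: roll back above 2x baseline errors
    error_floor: float = 0.01,      # assumed: ignore error rates below 1%
    latency_factor: float = 1.5,    # assumed: roll back above 1.5x baseline P99
    max_restarts: int = 3,          # assumed: tolerated restarts in the window
    max_failed_checks: int = 1,     # assumed: tolerated failed health checks
) -> tuple[bool, list[str]]:
    """Decide whether a post-deploy window warrants a rollback.

    Returns the decision plus human-readable reasons for the incident log.
    """
    reasons: list[str] = []
    if not samples:
        return False, reasons  # no data yet: keep observing instead of guessing

    avg_error = sum(s.error_rate for s in samples) / len(samples)
    worst_p99 = max(s.p99_latency_ms for s in samples)
    restarts = max(s.pod_restarts for s in samples)
    failed_checks = sum(1 for s in samples if not s.health_check_ok)

    error_limit = max(baseline_error_rate * error_factor, error_floor)
    if avg_error > error_limit:
        reasons.append(f"avg error rate {avg_error:.3f} > limit {error_limit:.3f}")
    latency_limit = baseline_p99_ms * latency_factor
    if worst_p99 > latency_limit:
        reasons.append(f"P99 {worst_p99:.0f} ms > limit {latency_limit:.0f} ms")
    if restarts > max_restarts:
        reasons.append(f"{restarts} pod restarts > {max_restarts}")
    if failed_checks > max_failed_checks:
        reasons.append(f"{failed_checks} failed health checks > {max_failed_checks}")

    return bool(reasons), reasons
```

Feed it the samples collected during the post-deploy watch window; a `True` decision is the signal to run the platform-specific rollback, and the reasons list can go straight into the incident timeline.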

## Prerequisites

- `kubectl` configured with cluster access and permission to manage deployments
- Deployment history retained (Kubernetes `revisionHistoryLimit`, ECS task definition versions)
- Monitoring system tracking error rate, latency, and health check status (Prometheus, Datadog, CloudWatch)
- Previous deployment artifacts (container images, task definitions) still available in the registry
- Database migration strategy that supports backward compatibility (expand-contract pattern)
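The expand-contract requirement can be made checkable before a rollback is attempted. This is a minimal sketch, assuming each migration applied since the last good release is tagged with the kind of schema change it makes; the `Migration` type, the operation names, and `rollback_blockers` are hypothetical, and treating additive changes as safe and destructive ones as unsafe is a simplification of the expand-contract pattern, not a complete rule set.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed classification: "expand" operations leave the schema usable by the
# previous application version; "contract" operations remove or reshape
# something the previous version may still read or write.
EXPAND_OPS = {"add_table", "add_nullable_column", "add_index"}
CONTRACT_OPS = {
    "drop_table",
    "drop_column",
    "rename_column",
    "change_column_type",
    "add_not_null_constraint",
}


@dataclass
class Migration:
    version: str
    operation: str  # one of the hypothetical operation names above


def rollback_blockers(applied_since_last_good: list[Migration]) -> list[str]:
    """List migrations that make rolling back the application code unsafe."""
    blockers: list[str] = []
    for m in applied_since_last_good:
        if m.operation in CONTRACT_OPS:
            blockers.append(f"{m.version}: {m.operation} is not backward compatible")
        elif m.operation not in EXPAND_OPS:
            # Unknown operations are treated as unsafe until a human reviews them.
            blockers.append(f"{m.version}: {m.operation} is unclassified; review manually")
    return blockers
```

An empty result means the previous application version should still work against the current schema; any entry points to a down migration or a manual fix before reverting the code.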

## Instructions

1. Detect deployment failure: monitor error rate, P99 latency, pod restart count, and health check responses for 5-10 minutes post-deploy
2. Assess rollback scope: determine if the issue is application code, configuration, or infrastructure
3. For Kubernetes: execute `kubectl rollout undo deployment/<name>` to revert to the previous revision
4. For ECS: update the service to use the previous task definition revision
5. For Lambda: point the alias back to the previous function version
6. Verify database compatibility: ensure the previous application version works with the current database schema (no forward-only migrations were applied)
7. Confirm rollback success: verify health checks pass, error rate returns to baseline, and user-facing functionality is restored
8. Generate a post-incident report: document what failed, when rollback was triggered, time to recovery, and root cause
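9. Create automated rollback rules: configure Argo Rollouts analysis or CloudWatch alarms (for example, ECS deployment alarms with rollback enabled) to trigger rollback without manual intervention; Kubernetes readiness probes and `progressDeadlineSeconds` stop a bad rollout from progressing and mark it failed, but a plain Deployment does not revert itself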
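The platform-specific rollback steps (3-5) and the rollout wait in step 7 map onto standard CLI calls. The sketch below only builds the argument lists so they can be logged or reviewed, and executes them with `subprocess.run` only when dry-run is switched off. `kubectl rollout undo`/`rollout status`, `aws ecs update-service`, `aws ecs wait services-stable`, and `aws lambda update-alias` and their flags are real; the helper names, the dry-run-by-default design, the 300s timeout, and the resource names in the usage note are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def k8s_rollback_cmds(deployment: str, namespace: str,
                      to_revision: int | None = None) -> list[list[str]]:
    """Revert a Deployment, then block until the rollout finishes."""
    undo = ["kubectl", "rollout", "undo", f"deployment/{deployment}", "-n", namespace]
    if to_revision is not None:
        undo.append(f"--to-revision={to_revision}")  # default: previous revision
    status = ["kubectl", "rollout", "status", f"deployment/{deployment}",
              "-n", namespace, "--timeout=300s"]
    return [undo, status]


def ecs_rollback_cmds(cluster: str, service: str,
                      task_definition: str) -> list[list[str]]:
    """Point the service at a known-good task definition ("family:revision")."""
    update = ["aws", "ecs", "update-service", "--cluster", cluster,
              "--service", service, "--task-definition", task_definition]
    wait = ["aws", "ecs", "wait", "services-stable", "--cluster", cluster,
            "--services", service]
    return [update, wait]


def lambda_rollback_cmds(function: str, alias: str, version: str) -> list[list[str]]:
    """Move the alias back to a previously published function version."""
    return [["aws", "lambda", "update-alias", "--function-name", function,
             "--name", alias, "--function-version", version]]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute only when dry_run is False."""
    for cmd in cmds:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command
```

For example, `run(k8s_rollback_cmds("checkout", "prod"))` prints `+ kubectl rollout undo deployment/checkout -n prod` followed by `+ kubectl rollout status deployment/checkout -n prod --timeout=300s` without touching the cluster. Defaulting to a dry run keeps a human in the loop for manual rollbacks.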

## Output

- Rollback scripts for each deployment target (Kubernetes, ECS, Lambda)
- Automated rollback configuration (Kubernetes probes, Argo Rollouts AnalysisTemplate)
- Post-incident report template with timeline and root cause sections
- Monitoring dashboard with rollback trigger indicators
- Database migration rollback procedures (down migrations, backward-compatible schemas)
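The post-incident report mostly needs consistent timestamps so that time to recovery is computed the same way for every incident. This is a minimal sketch, assuming all timestamps are recorded in the same timezone; the `RollbackIncident` fields and the Markdown layout are hypothetical, and measuring time to recovery from detection (rather than from deploy) is one convention among several.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class RollbackIncident:
    service: str
    bad_version: str
    restored_version: str
    deployed_at: datetime           # when the failing release went out
    detected_at: datetime           # when monitoring flagged the failure
    rollback_started_at: datetime
    recovered_at: datetime          # health checks and error rate back to baseline
    root_cause: str = "TBD"         # filled in after investigation

    @staticmethod
    def minutes(start: datetime, end: datetime) -> float:
        return (end - start).total_seconds() / 60

    def to_markdown(self) -> str:
        ttd = self.minutes(self.deployed_at, self.detected_at)
        ttr = self.minutes(self.detected_at, self.recovered_at)
        return "\n".join([
            f"# Rollback report: {self.service}",
            "",
            "## Timeline",
            f"- Deployed {self.bad_version}: {self.deployed_at.isoformat()}",
            f"- Failure detected: {self.detected_at.isoformat()} ({ttd:.1f} min after deploy)",
            f"- Rollback started: {self.rollback_started_at.isoformat()}",
            f"- Recovered on {self.restored_version}: {self.recovered_at.isoformat()}",
            f"- Time to recovery: {ttr:.1f} min from detection",
            "",
            "## Root cause",
            self.root_cause,
        ])
```

Keeping the root cause as a separate field lets the timeline be generated immediately after recovery while the investigation is still open.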

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| `no rollout history found` | Revision history limit set to 0 or deployment was created fresh | Increase `revisionHistoryLimit` in deployment spec; manually specify the target image tag |
| `Rollback succeeded but errors persist` | Issue is in configuration or external dependency, not application code | Check ConfigMaps, Secrets, and external service health; rollback configuration changes separately |
| `Database schema incompatible after rollback` | Forward-only migration applied during failed deployment | Apply a down migration or use expand-contract pattern; never deploy breaking schema changes alongside code |
| `Old image no longer in registry` | Lifecycle policy deleted the previous image | Restore from backup or rebuild from the git tag; extend image retention for production tags |
| `Rollback causes service disruption` | Insufficient replicas during rollback transition | Set `maxUnavailable: 0` with `maxSurge` of at least 1 in the rolling update strategy so capacity never drops below the desired replica count during the rollback |
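Two rows of this table, `no rollout history found` and rollback-induced disruption, come down to Deployment settings that can be checked before anything fails. This is a minimal sketch, assuming the manifest has already been parsed into a dict (for example from `kubectl get deployment <name> -o json`); `check_rollback_readiness` is a hypothetical helper. Kubernetes defaults `revisionHistoryLimit` to 10 and both rolling update fields to 25%, and rejects a strategy where `maxUnavailable` and `maxSurge` are both 0.

```python
from __future__ import annotations

from typing import Any

ZERO = (0, "0", "0%")  # ways a zero value can appear in a manifest


def check_rollback_readiness(deployment: dict[str, Any]) -> list[str]:
    """Return warnings for Deployment settings that undermine safe rollbacks."""
    warnings: list[str] = []
    spec = deployment.get("spec", {})

    # apps/v1 defaults revisionHistoryLimit to 10; 0 leaves nothing to undo to.
    history = spec.get("revisionHistoryLimit", 10)
    if history < 1:
        warnings.append(f"revisionHistoryLimit={history}: no old revision to roll back to")

    strategy = spec.get("strategy", {})
    if strategy.get("type", "RollingUpdate") == "Recreate":
        warnings.append("Recreate strategy stops all pods before starting the old version")
        return warnings

    rolling = strategy.get("rollingUpdate", {})
    max_unavailable = rolling.get("maxUnavailable", "25%")  # API default
    max_surge = rolling.get("maxSurge", "25%")              # API default
    if max_unavailable not in ZERO:
        warnings.append(f"maxUnavailable={max_unavailable}: capacity can drop during rollback")
    elif max_surge in ZERO:
        warnings.append("maxSurge=0 with maxUnavailable=0 is rejected by the API server")
    return warnings
```

Running it in CI against rendered manifests catches both problems before a bad deploy makes the rollback urgent.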

## Examples

- "Roll back the production Kubernetes deployment to the previous revision after detecting a spike in 5xx errors."
- "Create an automated rollback policy using Argo Rollouts that reverts if error rate exceeds 1% during the first 10 minutes after deploy."
- "Generate a rollback runbook for an ECS service that includes steps to revert task definition, validate health, and notify the team via Slack."
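The Argo Rollouts example above could start from an AnalysisTemplate like the one built below. It is a sketch, assuming a Prometheus metric named `http_requests_total` with `service` and `status` labels, a Prometheus address of `http://prometheus.monitoring:9090`, and a template name of `error-rate`; all of these are hypothetical. The `AnalysisTemplate` fields (`interval`, `count`, `failureLimit`, `successCondition`, `provider.prometheus`) follow the Argo Rollouts analysis spec. The revert itself happens in the Rollout that references this template (as background analysis or an analysis step), which aborts and returns to the stable version when the analysis fails; that wiring is not shown.

```python
from __future__ import annotations

import json


def error_rate_analysis_template(
    name: str = "error-rate",                                        # hypothetical
    prometheus_address: str = "http://prometheus.monitoring:9090",  # hypothetical
    max_error_rate: float = 0.01,   # 1% of requests returning 5xx
    interval: str = "1m",
    count: int = 10,                # 10 measurements, 1m apart: about 10 minutes
) -> dict:
    """Build an Argo Rollouts AnalysisTemplate that fails on a high 5xx ratio."""
    # Assumed metric and labels; {{args.service-name}} is filled in by the Rollout.
    query = (
        'sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[1m]))'
        " / "
        'sum(rate(http_requests_total{service="{{args.service-name}}"}[1m]))'
    )
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AnalysisTemplate",
        "metadata": {"name": name},
        "spec": {
            "args": [{"name": "service-name"}],
            "metrics": [{
                "name": "error-rate",
                "interval": interval,
                "count": count,
                "failureLimit": 0,  # a single failed measurement fails the analysis
                "successCondition": f"result[0] < {max_error_rate}",
                "provider": {
                    "prometheus": {"address": prometheus_address, "query": query},
                },
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g.: python template.py | kubectl apply -f -
    print(json.dumps(error_rate_analysis_template(), indent=2))
```

Generating the template from code keeps the threshold and observation window in one place when several services share the same rollback policy.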

## Resources

- Kubernetes rollout management: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment
- Argo Rollouts analysis: https://argoproj.github.io/argo-rollouts/features/analysis/
- AWS ECS rolling updates: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html
- Database migration patterns: https://martinfowler.com/articles/evodb.html

Related Skills

managing-test-environments

1868
from jeremylongshore/claude-code-plugins-plus-skills

Provision and manage isolated test environments with configuration and data. Use when performing specialized testing. Trigger with phrases like "manage test environment", "provision test env", or "setup test infrastructure".

managing-snapshot-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Create and validate component snapshots for UI regression testing. Use when performing specialized testing. Trigger with phrases like "update snapshots", "test UI snapshots", or "validate component snapshots".

managing-database-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test databases, including fixtures, transactions, and rollback management. Use when performing specialized testing. Trigger with phrases like "test the database", "run database tests", or "validate data integrity".

managing-ssltls-certificates

1868
from jeremylongshore/claude-code-plugins-plus-skills

This skill enables the AI assistant to manage and monitor SSL/TLS certificates using the ssl-certificate-manager plugin. It is activated when the user requests actions related to SSL certificates, such as checking certificate expiry, renewing certificates, ... Use when appropriate context is detected. Trigger with relevant phrases based on the skill's purpose.

managing-autonomous-development

1868
from jeremylongshore/claude-code-plugins-plus-skills

This skill enables the AI assistant to manage sugar's autonomous development workflows. It allows the AI assistant to create tasks, view the status of the system, review pending tasks, and start autonomous execution mode. Use this skill when the user asks to create a new develo... Use when appropriate context is detected. Trigger with relevant phrases based on the skill's purpose.

managing-network-policies

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use when managing Kubernetes network policies and firewall rules. Trigger with phrases like "create network policy", "configure firewall rules", "restrict pod communication", or "setup ingress/egress rules". Generates Kubernetes NetworkPolicy manifests following least-privilege and zero-trust principles.

creating-kubernetes-deployments

1868
from jeremylongshore/claude-code-plugins-plus-skills

Deploy applications to Kubernetes with production-ready manifests. Supports Deployments, Services, Ingress, HPA, ConfigMaps, Secrets, StatefulSets, and NetworkPolicies. Includes health checks, resource limits, auto-scaling, and TLS termination. Use when creating Kubernetes deployments. Trigger with 'creating', 'kubernetes', 'deployments'.

managing-environment-configurations

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement environment and configuration management with comprehensive guidance and automation. Use when you need to work with environment configuration. Trigger with phrases like "manage environments", "configure environments", or "sync configurations".

orchestrating-deployment-pipelines

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use this skill when you need to work with deployments and CI/CD. It provides deployment automation and orchestration guidance. Trigger with phrases like "deploy application", "create pipeline", or "automate deployment".

managing-container-registries

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use when you need to work with containerization. This skill provides container management and orchestration guidance and automation. Trigger with phrases like "containerize app", "manage containers", or "orchestrate deployment".

managing-database-sharding

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use when you need to work with database sharding. This skill provides horizontal sharding strategies with guidance and automation. Trigger with phrases like "implement sharding", "shard database", or "distribute data".

managing-database-replication

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use when you need to work with database scalability. This skill provides replication and sharding guidance and automation. Trigger with phrases like "set up replication", "implement sharding", or "scale database".